Improving the Quality of Web Search

Mohamed Salah Hamdi (University of Qatar, Qatar)
Copyright: © 2008 | Pages: 18
DOI: 10.4018/978-1-59904-847-5.ch026

Conventional Web search engines return long lists of ranked documents that users are forced to sift through to find relevant documents. The notoriously low precision of Web search engines, coupled with the ranked-list presentation, makes it hard for users to find the information they seek. Developing retrieval techniques that yield both high recall and high precision is desirable. Unfortunately, such techniques would impose additional resource demands on search engines, which are already under severe resource constraints. A more productive approach seems to be to enhance post-processing of the retrieved set. If such value-adding processes allow the user to easily identify relevant documents in a large retrieved set, queries that produce low-precision, high-recall results will become more acceptable. We propose improving the quality of Web search by combining meta-search and self-organizing maps. This combination can help users both locate interesting documents more easily and get an overview of the retrieved document set.

Key Terms in this Chapter

Search Result Ranking: Ranking, in general, is the process of positioning items such as individuals, groups, or businesses on an ordinal scale in relation to others. A list arranged in this way is said to be in rank order. Search engines rank Web pages depending on their relevance to a user’s query. Each major search engine is unique in how it determines page rank. There is a growing business in trying to trick search engines into giving a higher page rank to particular Web pages as a marketing tool. The makers of search engines, of course, strive to make sure that such tricks are ineffective. One way that they do this is by keeping their algorithmic details confidential. They also may play the spy versus spy game of watching for the use of such tricks and refining their ranking algorithms to circumvent the tricks. At the same time, some search companies try to play double agent by selling improved page rank (positioning in search results).

Information Overload: Historically, more information has almost always been a good thing. However, as the ability to collect information grew, the ability to process that information did not keep up. Today, we have large amounts of available information and a high rate of new information being added, but contradictions in the available information, a low signal-to-noise ratio (proportion of useful information found to all information found), and inefficient methods for comparing and processing different kinds of information characterize the situation. The result is the “information overload” of the user, that is, users have too much information to make a decision or remain informed about a topic.

Inverted Index: An inverted index maps each word that occurs in a set of documents to the documents that contain it. The index is accessed by some search method. Each index entry gives the word and a list of documents, possibly with locations within the documents, where the word occurs. The inverted index data structure is a central component of a typical search engine indexing algorithm. A goal of a search engine implementation is to optimize the speed of the query: find the documents where word X occurs. A forward index, which stores the list of words for each document, is developed first and then inverted to produce the inverted index. Querying the forward index would require iterating sequentially through each document and each of its words to verify a match. The time, memory, and processing resources needed for such a query are not always technically realistic. Instead of listing the words per document, as the forward index does, the inverted index lists the documents per word. With the inverted index created, the query can be resolved by jumping directly to the word ID (via random access) in the inverted index. Random access is generally regarded as being faster than sequential access.
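
The relationship between the forward index and the inverted index can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`build_forward_index`, `invert`, `lookup`), the whitespace tokenization, and the three sample documents are all assumptions made for illustration, and a real search engine would use far more elaborate tokenization and storage.

```python
from collections import defaultdict


def build_forward_index(docs):
    """Forward index: map each document ID to the list of its words."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def invert(forward_index):
    """Inverted index: map each word to {doc_id: [positions in that doc]}."""
    inverted = defaultdict(dict)
    for doc_id, words in forward_index.items():
        for position, word in enumerate(words):
            inverted[word].setdefault(doc_id, []).append(position)
    return dict(inverted)


def lookup(inverted_index, word):
    """Answer 'find the documents where word X occurs' with one dict access."""
    return sorted(inverted_index.get(word.lower(), {}))


# Hypothetical sample documents.
docs = {
    1: "web search engines rank documents",
    2: "meta search combines several engines",
    3: "self organizing maps cluster documents",
}

inverted_index = invert(build_forward_index(docs))
print(lookup(inverted_index, "search"))     # [1, 2]
print(lookup(inverted_index, "documents"))  # [1, 3]
```

The dictionary access in `lookup` plays the role of the random access described above: the query never has to scan every document word by word.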

Unsupervised Learning: Consider a system which receives some sequence of inputs x_1, x_2, x_3, …, where x_t is the sensory input at time t. This input, called the data, could correspond to an image on the retina, the pixels in a camera, or a sound waveform. It could also correspond to less obviously sensory data, for example, the words in a news story, or the list of items in a supermarket shopping basket. In unsupervised learning, the system simply receives inputs x_1, x_2, …, but obtains neither supervised target outputs, nor rewards from its environment. It may seem somewhat mysterious to imagine what the system could possibly learn, given that it does not get any feedback from its environment. However, it is possible to develop a formal framework for unsupervised learning based on the notion that the system’s goal is to build representations of the input that can be used for decision-making, predicting future inputs, efficiently communicating the inputs to another system, and so forth. In a sense, unsupervised learning can be thought of as finding patterns in the data above and beyond what would be considered pure, unstructured noise. Two very simple classic examples of unsupervised learning are clustering and dimensionality reduction.
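
As a small illustration of clustering without any target outputs, the following Python sketch runs plain k-means on one-dimensional data. K-means is chosen here only because it is compact; it is not the self-organizing map technique the chapter proposes, and the function name `k_means_1d`, the sample points, and the initial centers are assumptions made for this example.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around centers using plain k-means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(cluster) / len(cluster) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical inputs: no labels or rewards are supplied, only the data.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

The two groups emerge purely from the structure of the inputs, which is the sense in which the system finds patterns beyond unstructured noise.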

Quality of Web Search: Seen from a user’s perspective, this term is related to the notion of “user satisfaction”. The more satisfied a user is with the search results and the different aspects of searching, the higher the rating of the search system. Assessing the quality of a Web search system and the results that it produces is notoriously difficult. For search results, criteria for determining the good, the bad, and the ugly include: scope and depth of coverage, authority, currency, accuracy and reliability, motive and purpose, ease of use and design issues, and so forth. Web search systems are used by a heterogeneous user population for a wide variety of tasks: from finding a specific Web document that the user has seen before and can easily describe, to obtaining an overview of an unfamiliar topic, to exhaustively examining a large set of documents on a topic, and more. A search system will prove useful only in a subset of these cases.

Recall and Precision: Recall and precision are two retrieval evaluation measures for information retrieval systems. Precision describes the ability of the system to retrieve top-ranked documents that are mostly relevant. Recall describes the ability of the system to find all of the relevant items in the corpus. If I is an example information request (from a test reference collection), R is the set of relevant documents for I (provided by specialists), A is the document answer set for I generated by the system being evaluated, and Ra = R ∩ A is the set of relevant documents in the answer set, then recall = |Ra|/|R| and precision = |Ra|/|A|.
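
The two formulas can be checked on a toy case with a short Python sketch. The function name `recall_precision` and the document identifiers are hypothetical; returning 0.0 when R or A is empty is a convention assumed here, since the definitions above leave that case open.

```python
def recall_precision(relevant, answer):
    """Return (recall, precision) for relevant set R and answer set A."""
    relevant, answer = set(relevant), set(answer)
    relevant_retrieved = relevant & answer  # Ra = R ∩ A
    # Assumed convention: an empty denominator yields 0.0.
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    precision = len(relevant_retrieved) / len(answer) if answer else 0.0
    return recall, precision


# Hypothetical example: 4 relevant documents, 5 retrieved, 2 in common.
R = {"d1", "d2", "d3", "d4"}
A = {"d2", "d4", "d7", "d8", "d9"}

recall, precision = recall_precision(R, A)
print(recall)     # 0.5
print(precision)  # 0.4
```

Here |Ra| = 2, so recall = 2/4 and precision = 2/5: the system found half of the relevant documents, and fewer than half of what it returned was relevant.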

Browsing: The definition of browsing is to inspect, in a leisurely and casual way, a body of information, usually on the World Wide Web, based on the organization of the collections, without clearly-defined intentions. Hypertext is an appropriate conceptual model for organization. Usually, hypertext systems encourage browsing by stimulating the user to follow links. Today, most hypertext systems employ the point-and-click paradigm for user interaction; information is just one click (of the mouse button) away.

Information Customization (IC) Systems: IC systems are systems that customize information to the needs and interests of the user. They function proactively (take the initiative), continuously scan appropriate resources, analyze and compare content, select relevant information, and present it as visualizations or in a pruned format. Building software that can interact with the range and diversity of the online resources is a challenge, and the promise of IC systems is becoming highly attractive. Instead of users investing significant effort to find the right information, the right information should find the users. IC systems attempt to accomplish this by automating many functions of today’s information retrieval systems and providing features to optimally use information.
