This chapter surveys the technologies involved in a Web search engine, with an emphasis on performance analysis. The aspects of a general-purpose search engine covered include system architectures, the information retrieval theories that underlie Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurement. The objectives of the chapter are to review the theories and technologies pertaining to Web search and to help readers understand how Web search engines work and how to use them more effectively and efficiently.
Key Terms in this Chapter
Document Frequency: The number of documents containing a particular term.
Inverted Index: An indexing system in which each term points to the documents that contain it.
Term Frequency: The number of times that a term appears in a document.
Relevance Feedback: A mechanism in which an IR system generates a set of results for a given query and the user then sends feedback of some form to the system, which uses it to improve search accuracy.
Estimated Search Length (ESL): The average number of irrelevant documents that one has to examine in order to retrieve a given number of relevant documents.
Cosine Similarity: A measure used to evaluate the relevance between a query and a document in the vector space model; it is based on the cosine of the angle between the query vector and the document vector.
Rank: The order in which the retrieved documents are presented; the closer a document is to the beginning of the list, the more favored it is.
Averaged Search Length (ASL): The expected position of a relevant document in the ordered list of all documents.
Information Retrieval: A branch of science that deals with the representation, storage, organization of, and access to information, with the prime aim of retrieving information for a given set of queries.
Vector Space Model: A model in which each document is represented as a vector of weights, one for each term found in the documents.
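
Several of the terms above can be illustrated with a small sketch. The Python below is not taken from the chapter; it is a minimal illustration under stated assumptions: whitespace tokenization, raw term frequencies used as vector weights, and a single strictly ordered ranking with no ties. All function names (`build_index`, `document_frequency`, `cosine_similarity`, `rank`, `average_position_of_relevant`, `irrelevant_before_n_relevant`) are hypothetical. The last two functions compute, for one ranking, the quantities that ASL and ESL average; they are simplifications, not the full definitions of those measures.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index and per-document term frequencies.

    docs maps a document id to its text. Returns (inverted, term_freqs):
    inverted maps each term to the set of ids of documents containing it,
    term_freqs maps each document id to a Counter of term frequencies.
    """
    inverted = defaultdict(set)
    term_freqs = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        term_freqs[doc_id] = counts
        for term in counts:
            inverted[term].add(doc_id)
    return inverted, term_freqs


def document_frequency(inverted, term):
    # Number of documents that contain the term.
    return len(inverted.get(term, ()))


def cosine_similarity(a, b):
    # a and b are sparse vectors: dicts mapping term -> weight.
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, term_freqs):
    # Order documents by decreasing cosine similarity to the query;
    # ties are broken by document id so the order is deterministic.
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, tf), doc_id)
              for doc_id, tf in term_freqs.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


def average_position_of_relevant(ranking, relevant):
    # Mean 1-based position of the relevant documents in one ranking
    # (a single-ranking simplification of what ASL estimates).
    positions = [i for i, d in enumerate(ranking, start=1) if d in relevant]
    return sum(positions) / len(positions) if positions else None


def irrelevant_before_n_relevant(ranking, relevant, n):
    # Irrelevant documents examined before the n-th relevant one is reached
    # (the per-ranking count that ESL averages). None if fewer than n exist.
    found = irrelevant = 0
    for d in ranking:
        if d in relevant:
            found += 1
            if found == n:
                return irrelevant
        else:
            irrelevant += 1
    return None
```

In practice, the raw term frequencies used here as weights are often replaced by a weighting that also accounts for document frequency; the sketch keeps raw counts so that each glossary term maps to one function.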