A Study on Web Searching: Overlap and Distance of the Search Engine Results

Shanfeng Zhu (City University of Hong Kong, Hong Kong), Xiaotie Deng (City University of Hong Kong, Hong Kong), Qizhi Fang (Qingdao Ocean University, China) and Weimin Zhang (Tsinghua University, China)
Copyright: © 2004 | Pages: 18
DOI: 10.4018/978-1-59140-194-0.ch014

Web search engines are among the most popular services for helping users find useful information on the Web. Although many studies have estimated the size and overlap of general web search engines, such estimates may be of little benefit to ordinary users, who care more about the overlap of the top N (N = 10, 20 or 50) results returned for concrete queries than about the overlap of the engines' total index databases. In this study, we present experimental results comparing the overlap of the top N (N = 10, 20 or 50) search results from AlltheWeb, Google, AltaVista and WiseNut for the 58 most popular queries, as well as the distance of the overlapping results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide the 58 queries into three categories for further investigation. Through this in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the results for queries in different categories behave in dramatically different ways; on average, Google has the highest overlap among the four search engines; and each search engine tends to adopt a different ranking algorithm independently.
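To make the two measures concrete, here is a minimal sketch of how top-N overlap and the distance of overlapping results might be computed for one query. The abstract does not give the chapter's exact formulas, so both definitions below are assumptions: overlap is taken as |A_N ∩ B_N| / N, and "distance" is taken as the mean absolute rank difference of the URLs that both top-N lists share. The function names (`top_n_overlap`, `mean_rank_distance`, `pairwise_overlap`) and the toy engine names and URLs are hypothetical, not data from the study.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def top_n_overlap(a: Sequence[str], b: Sequence[str], n: int) -> float:
    """Share of the top-n slots that two result lists have in common.

    Assumed definition: |A_n ∩ B_n| / n, where A_n and B_n are the first
    n URLs returned by each engine for the same query.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return len(set(a[:n]) & set(b[:n])) / n


def mean_rank_distance(a: Sequence[str], b: Sequence[str], n: int) -> Optional[float]:
    """Hypothetical distance: mean |rank_a - rank_b| over URLs both top-n lists share.

    Ranks start at 1. Returns None when the two lists share no URL.
    Assumes neither list contains duplicate URLs.
    """
    rank_a = {url: pos for pos, url in enumerate(a[:n], start=1)}
    rank_b = {url: pos for pos, url in enumerate(b[:n], start=1)}
    shared = rank_a.keys() & rank_b.keys()  # dict key views support set intersection
    if not shared:
        return None
    return sum(abs(rank_a[u] - rank_b[u]) for u in shared) / len(shared)


def pairwise_overlap(results: Dict[str, List[str]], n: int) -> Dict[Tuple[str, str], float]:
    """Top-n overlap for every unordered pair of engines answering one query."""
    return {
        (x, y): top_n_overlap(results[x], results[y], n)
        for x, y in combinations(sorted(results), 2)
    }


if __name__ == "__main__":
    # Toy data with placeholder engine names and URLs (not results from the study).
    results = {
        "engine_a": ["u1", "u2", "u3", "u4"],
        "engine_b": ["u3", "u1", "u5", "u6"],
        "engine_c": ["u7", "u8", "u9", "u2"],
    }
    print(pairwise_overlap(results, n=4))
    print(mean_rank_distance(results["engine_a"], results["engine_b"], n=4))
```

With the toy lists, `engine_a` and `engine_b` share `u1` and `u3` in their top 4, giving an overlap of 0.5; those two URLs sit at ranks (1, 2) and (3, 1), so the mean rank distance is 1.5. Averaging such per-query values over a query set, or over each query category, would be one way to summarize results like those described above.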

Table of Contents
Masoud Mohammadian
Chapter 1
Potential Cases, Database Types, and Selection Methodologies for Searching Distributed Text Databases
Hui Yang, Minjie Zhang
The rapid proliferation of online textual databases on the Internet has made it difficult to effectively and efficiently search desired information...
Chapter 2
Computational Intelligence Techniques Driven Intelligent Agents for Web Data Mining and Information Retrieval
Masoud Mohammadian, Ric Jentzsch
The World Wide Web has added an abundance of data and information to the complexity of information for disseminators and users alike. With this...
Chapter 3
A Multi-Agent Approach to Collaborate Knowledge Production
Juan Manuel Dodero, Paloma Diaz, Ignacio Aedo
Knowledge creation or production in a distributed knowledge management system is a collaborative task that needs to be coordinated. A multi-agent...
Chapter 4
Customized Recommendation Mechanism Based on Web Data Mining and Case-Based Reasoning
Jin Sung Kim
One of the attractive topics in the field of Internet business is blending Artificial Intelligence (AI) techniques with the business process. In...
Chapter 5
Rule-Based Parsing for Web Data Extraction
David Camacho, Ricardo Aler, Juan Cuadrado
How to build intelligent robust applications that work with the information stored in the Web is a difficult problem for several reasons which arise...
Chapter 6
Multilingual Web Content Mining: A User-Oriented Approach
Rowena Chau, Chung-Hsing Yeh
This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The multilingual...
Chapter 7
A Textual Warehouse Approach: A Web Data Repository
Kaïs Khrouf, Chantal Soule-Dupuy
An enterprise memory must be able to be used as a basis for the processes of scientific or technical developments. Indeed, it was proven that...
Chapter 8
Text Processing by Binary Neural Networks
T. Beran, T. Macek
This chapter describes a rather less traditional technique of text processing. The technique is based on the binary neural network Correlation...
Chapter 9
Extracting Knowledge from Databases and ANNs with Genetic Programming: Iris Flower Classification Problem
Daniel Rivero, Juan R. Rabunal, Julián Dorado, Alejandro Pazos, Nieves Pedreira
In this chapter, we present an application of Genetic Programming (GP) in the field of data mining and extraction of Artificial Neural Networks...
Chapter 10
Social Coordination with Architecture for Ubiquitous Agents-CONSORTS
Koichi Kurumatani
We propose a social coordination mechanism that is realized with CONSORTS, a new kind of multi-agent architecture for ubiquitous agents. By social...
Chapter 11
Agent-Mediated Knowledge Acquisition for User Profiling
A. Andreevskaia, R. Abi-Aad, T. Radhakrishnan
This chapter presents a tool for knowledge acquisition for user profiling in electronic commerce. The knowledge acquisition in e-commerce is a...
Chapter 12
Development of Agent-Based Electronic Catalog Retrieval System
Shinichi Nagano, Yasuyuki Tahara, Tetsuo Hasegawa, Akihiko Ohsuga
Heavy electric machinery industry is currently developing electronic market places of product and parts. PLIB is the standard of dictionary model...
Chapter 13
Using Dynamically Acquired Background Knowledge for Information Extraction and Intelligent Search
Samhaa R. El-Beltagy, Ahmed Rafea, Yasser Abdelhamid
This chapter presents a simple framework for extracting information found in publications or documents that are issued in large volumes and which...
Chapter 14
A Study on Web Searching: Overlap and Distance of the Search Engine Results
Shanfeng Zhu, Xiaotie Deng, Qizhi Fang, Weimin Zhang
Web search engines are one of the most popular services to help users find useful information on the Web. Although many studies have been carried...
Chapter 15
Taxonomy Based Fuzzy Filtering of Search Results
S. Vrettos, A. Stafylopatis
Our work proposes the use of topic taxonomies as part of a filtering language. Given a taxonomy, we train classifiers for every topic of it. The...
Chapter 16
Generating and Adjusting Web Sub-Graph Displays for Web Navigation
Wei Lai, Maolin Huang, Kang Zhang
A graph can be used for web navigation. The whole of cyberspace can be regarded as one huge graph. To explore this huge graph, it is critical to...
Chapter 17
An Algorithm of Pattern Match Being Fit for Mining Association Rules
Hong Shi, Ji-Fu Zhang
There are frequent occurrences of pattern match involved in the process of counting the support count of candidates, which is one of the main...
Chapter 18
Networking E-Learning Hosts Using Mobile Agents
Jon T.S. Quah, Y. M. Chen, Winnie C.H. Leow
With the rapid evolution of the Internet, information overload is becoming a common phenomenon. It is necessary to have a tool to help users extract...
About the Authors