Similarity Web Pages Retrieval Technologies on the Internet
Rung Ching Chen (Chaoyang University of Technology, Taiwan), Ming Yung Tsai (Chaoyang University of Technology, Taiwan) and Chung Hsun Hsieh (Chaoyang University of Technology, Taiwan)
Copyright: © 2005
In recent years, due to the fast growth of the Internet, the services and information it provides are constantly expanding. Madria and Bhowmick (1999) and Baeza-Yates (2003) indicated that most large search engines need to comply to, on average, at least millions of hits daily in order to satisfy the users’ needs for information. Each search engine has its own sorting policy and the keyword format for the query term, but there are some critical problems. The searches may get more or less information. In the former, the user always gets buried in the information. Requiring only a little information, they always select some former items from the large amount of returned information. In the latter, the user always re-queries using another searching keyword to do searching work. The re-query operation also leads to retrieving information in a great amount, which leads to having a large amount of useless information. That is a bad cycle of information retrieval. The similarity Web page retrieval can help avoid browsing the useless information. The similarity Web page retrieval indicates a Web page, and then compares the page with the other Web pages from the searching results of search engines. The similarity Web page retrieval will allow users to save time by not browsing unrelated Web pages and reject non-similar Web pages, rank the similarity order of Web pages and cluster the similarity Web pages into the same classification.