Design of a Least Cost (LC) Vertical Search Engine based on Domain Specific Hidden Web Crawler

Sudhakar Ranjan (Department of Computer Science Engineering, Apeejay Stya University, Gurgaon, India) and Komal Kumar Bhatia (Department of Computer Engineering, YMCA University of Science & Technology, Faridabad, India)
Copyright: © 2017 | Pages: 15
DOI: 10.4018/IJIRR.2017040102


Nowadays, with the advent of internet technologies and e-commerce, the need for smart search engines is rising. Traditional search engines are neither intelligent nor smart, which leads to a rise in searching costs. In this paper, the architecture of a vertical search engine based on a domain-specific hidden web crawler is proposed. To build a least cost vertical search engine, improvements are suggested in the following techniques: searching, indexing, ranking, transaction, and query interface. The domain term analyzer filters out useless information to the maximum extent and provides users with high-precision information. The experimental results show that the system reduces access, computation, storage, and communication time and increases efficiency.
Article Preview

Shettar and Bhuptani (2008) build a vertical search engine based on a domain classifier from seven modules: crawler (spider), HTML parser, filter, domain classifier, page ranker, URL database, and search interface. Peshave (2005) observes that work on structured data on the web has focused mostly on providing users with access to the data, although significant value can also be obtained from analyzing collections of metadata on the web. Desa (2007) describes in detail the basic tasks a search engine performs, gives an overview of how the whole system of a search engine works, and implements a WebCrawler application in the Java programming language. Raghavan and Garcia-Molina (2001) note that a large amount of online information resides on the invisible web: web pages generated dynamically from databases and other data sources, hidden from current crawlers, which retrieve content only from the publicly indexable web. Specifically, such crawlers ignore the tremendous amount of high-quality content "hidden" behind search forms, as well as pages that require authorization or prior registration, in large searchable electronic databases.
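
To make the notion of content "hidden" behind search forms concrete, the following is a minimal sketch of how a crawler might recognize candidate search forms on a page. It is not the method of any of the cited works or of this article; the names `SearchFormFinder` and `find_search_forms`, and the heuristic itself (a form with a free-text field and no password field), are assumptions made purely for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class SearchFormFinder(HTMLParser):
    """Collect forms that have a free-text input and no password field.

    Hypothetical heuristic: such forms are treated as candidate entry
    points to hidden-web content; forms with a password field are assumed
    to be login forms and are skipped.
    """

    def __init__(self):
        super().__init__()
        self.forms = []        # finished forms that look searchable
        self._current = None   # form currently being parsed, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "text_fields": [],
                "needs_login": False,
            }
        elif tag == "input" and self._current is not None:
            # An <input> without a type attribute defaults to a text field.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search"):
                self._current["text_fields"].append(attrs.get("name") or "")
            elif kind == "password":
                self._current["needs_login"] = True

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            form = self._current
            if form["text_fields"] and not form["needs_login"]:
                self.forms.append(form)
            self._current = None


def find_search_forms(html):
    """Return a list of dicts describing candidate search forms in `html`."""
    finder = SearchFormFinder()
    finder.feed(html)
    finder.close()
    return finder.forms
```

Calling `find_search_forms(page_source)` on a fetched page returns one dictionary per candidate form, with its `action`, `method`, and the names of its text fields. A real hidden web crawler would additionally have to choose meaningful values for those fields and submit the form, which this sketch does not attempt.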
