2 Way Crawling: A Review

Mayuri Anantrao Deshmukh (MIT College of Engineering Aurangabad, Pune, India)
Copyright: © 2019 | Pages: 6
DOI: 10.4018/IJAEC.2019070105

Abstract

As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate and check deep web interfaces. It is therefore important to achieve wide coverage and high efficiency over the large volume of web resources. For this we propose a multistage framework, Smart crawler. Smart crawler is a two-stage crawler used to efficiently harvest deep web interfaces. In the first stage, the crawler performs site-based searching for center pages and avoids visiting non-relevant sites. In the second stage, an adaptive link ranking technique is used, which helps search within a relevant site by excavating the most relevant links. To eliminate bias in visiting highly relevant links that are hidden in web directories, a link tree data structure is designed to achieve wider coverage of a website. Experimental results on different domains show the agility and accuracy of the proposed framework, which retrieves deep-web interfaces from a large volume of sites and achieves higher harvest rates than other crawlers.
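
The link tree idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the framework's actual implementation: the function names `build_link_tree` and `select_balanced`, the one-level grouping by first path segment, and the round-robin selection are all assumptions made for illustration. The sketch only shows how grouping links by directory and picking across groups could keep one large directory from dominating the crawl.

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlparse


def build_link_tree(urls):
    """Group URLs by their first path segment (a one-level 'link tree').

    Hypothetical helper: the grouping rule is an assumption, not the
    data structure described in the article.
    """
    tree = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Pages directly under the site root share the "" bucket.
        top = segments[0] if len(segments) > 1 else ""
        tree[top].append(url)
    return dict(tree)


def select_balanced(tree, limit):
    """Pick up to `limit` links, taking one per directory in turn."""
    if limit <= 0:
        return []
    picked = []
    # zip_longest walks the directories "layer by layer", padding
    # exhausted directories with None.
    for layer in zip_longest(*tree.values()):
        for url in layer:
            if url is not None:
                picked.append(url)
                if len(picked) == limit:
                    return picked
    return picked


urls = [
    "http://example.com/books/a",
    "http://example.com/books/b",
    "http://example.com/books/c",
    "http://example.com/music/x",
    "http://example.com/about",
]
tree = build_link_tree(urls)
chosen = select_balanced(tree, 3)
```

With the example URLs above, the three chosen links come from three different directories rather than all from `books`, which is the kind of wider coverage the link tree is meant to encourage.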

2. Literature

To leverage the large volume of information buried in the deep web, previous work has proposed techniques that include deep web understanding, deep web crawlers, and web samplers. For all these approaches, the ability to crawl the deep web is a key challenge. One related paper is "Assessing relevance and trust of the deep web sources and results based on inter-source agreement."

2.1. Source Rank: Relevant Searching for Deep Web Sources for Giving Best Answers

Earlier work has addressed the problem of ranking database tuples for keyword search in databases. The focus of these papers is on relevance assessment of tuples for keyword search in a single database; the problems of trust and importance are not considered. Another approach improves web database search relevance by exploiting the search results from a surface web search engine. That paper considers relevance assessment for search in a single database and does not consider the trust problem. Further, it assumes the availability of high-quality web search results on the same topics as a reference (Raju & Subbarao, 2011, pp. 227-236).
