Design of a Migrating Crawler Based on a Novel URL Scheduling Mechanism using AHP

Deepika Punj, Ashutosh Dixit
Copyright: © 2017 | Volume: 4 | Issue: 1 | Pages: 16
ISSN: 2334-4598 | EISSN: 2334-4601 | EISBN13: 9781522515715 | DOI: 10.4018/IJRSDA.2017010106
Cite Article

MLA

Punj, Deepika, and Ashutosh Dixit. "Design of a Migrating Crawler Based on a Novel URL Scheduling Mechanism using AHP." International Journal of Rough Sets and Data Analysis (IJRSDA), vol. 4, no. 1, 2017, pp. 95-110. http://doi.org/10.4018/IJRSDA.2017010106

APA

Punj, D., & Dixit, A. (2017). Design of a Migrating Crawler Based on a Novel URL Scheduling Mechanism using AHP. International Journal of Rough Sets and Data Analysis (IJRSDA), 4(1), 95-110. http://doi.org/10.4018/IJRSDA.2017010106

Chicago

Punj, Deepika, and Ashutosh Dixit. "Design of a Migrating Crawler Based on a Novel URL Scheduling Mechanism using AHP." International Journal of Rough Sets and Data Analysis (IJRSDA) 4, no. 1 (2017): 95-110. http://doi.org/10.4018/IJRSDA.2017010106

Abstract

Crawlers play a significant role in managing the vast amount of information available on the web, so their operation should be optimized to retrieve as much unique information from the World Wide Web as possible. In this paper, the architecture of a migrating crawler is proposed that is based on URL ordering, URL scheduling, and document redundancy elimination mechanisms. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that each URL is assigned to the optimal agent for downloading; to achieve this, the characteristics of both the agents and the URLs are taken into consideration. Duplicate documents are also removed so that the database contains only unique documents. To reduce matching time, documents are compared on the basis of their meta information only. By ordering and scheduling URLs, the agents of the proposed migrating crawler work more efficiently than a traditional single crawler.
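The title names the Analytic Hierarchy Process (AHP) as the basis of URL scheduling, but the abstract does not give the criteria or judgments used. The following is a minimal, hypothetical sketch of how AHP can rank download agents: the criteria names (`bandwidth`, `spare_capacity`, `proximity`), the pairwise judgments, and the agent scores are all assumptions, not values from the paper. Criterion weights are derived with the row geometric-mean approximation of the principal eigenvector, and Saaty's consistency ratio is checked before the weights are used. The URL-side characteristics that the paper also considers are left out of this sketch.

```python
from math import prod

# Hypothetical agent-selection criteria (assumed, not from the paper).
# All agent scores are assumed normalized to [0, 1], higher is better.
CRITERIA = ["bandwidth", "spare_capacity", "proximity"]

# Hypothetical Saaty-scale pairwise comparison matrix: entry [i][j] states
# how much more important criterion i is than criterion j; the lower
# triangle holds the reciprocals.
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]

# Saaty's random consistency index, indexed by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate the AHP priority vector via row geometric means."""
    n = len(matrix)
    geo = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, estimating lambda_max from A.w / w."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def agent_score(agent, weights):
    """Weighted sum of an agent's normalized criterion scores."""
    return sum(w * agent[c] for c, w in zip(CRITERIA, weights))


def pick_agent(agents, weights):
    """Choose the agent with the highest AHP-weighted score."""
    return max(agents, key=lambda a: agent_score(a, weights))


if __name__ == "__main__":
    weights = ahp_weights(PAIRWISE)
    cr = consistency_ratio(PAIRWISE, weights)
    if cr >= 0.1:  # Saaty's usual acceptance threshold
        raise ValueError(f"pairwise judgments too inconsistent (CR={cr:.3f})")

    # Hypothetical agents with made-up normalized scores.
    agents = [
        {"name": "agent-A", "bandwidth": 0.9, "spare_capacity": 0.2, "proximity": 0.5},
        {"name": "agent-B", "bandwidth": 0.6, "spare_capacity": 0.9, "proximity": 0.9},
        {"name": "agent-C", "bandwidth": 0.4, "spare_capacity": 0.5, "proximity": 1.0},
    ]
    print(pick_agent(agents, weights)["name"])  # → agent-B
```

With these judgments the weights come out at roughly 0.64, 0.26, and 0.10, and the consistency ratio is about 0.03, well under the 0.1 threshold. `agent-B` is chosen even though `agent-A` has more bandwidth, because its spare capacity and proximity make up the difference. A full scheduler in the spirit of the abstract would also fold URL characteristics into the score before assigning each URL.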