Issues and Challenges in Web Crawling for Information Extraction

Subrata Paul, Anirban Mitra, Swagata Dey

Source Title: Bio-Inspired Computing for Information Retrieval Applications

ISBN13: 9781522523758 | ISBN10: 1522523758 | EISBN13: 9781522523765

DOI: 10.4018/978-1-5225-2375-8.ch004

MLA

Paul, Subrata, et al. "Issues and Challenges in Web Crawling for Information Extraction." Bio-Inspired Computing for Information Retrieval Applications, edited by D.P. Acharjya and Anirban Mitra, IGI Global, 2017, pp. 93-121. https://doi.org/10.4018/978-1-5225-2375-8.ch004

APA

Paul, S., Mitra, A., & Dey, S. (2017). Issues and Challenges in Web Crawling for Information Extraction. In D. Acharjya & A. Mitra (Eds.), Bio-Inspired Computing for Information Retrieval Applications (pp. 93-121). IGI Global. https://doi.org/10.4018/978-1-5225-2375-8.ch004

Chicago

Paul, Subrata, Anirban Mitra, and Swagata Dey. "Issues and Challenges in Web Crawling for Information Extraction." In Bio-Inspired Computing for Information Retrieval Applications, edited by D.P. Acharjya and Anirban Mitra, 93-121. Hershey, PA: IGI Global, 2017. https://doi.org/10.4018/978-1-5225-2375-8.ch004

Abstract

Computational biology and bio-inspired techniques are part of a larger revolution that is increasing the processing, storage, and retrieval of data in a major way. This revolution is driven by the generation and use of information in all forms and in enormous quantities, and it requires the development of intelligent systems for gathering, storing, and accessing information. This chapter describes the concepts, design, and implementation of a distributed web crawler that runs on a network of workstations and has been used for web information extraction. The crawler needs to scale to (at least) several hundred pages per second, be resilient against system crashes and other events, and be adaptable to various crawling applications. Further, this chapter focuses on various ways in which appropriate biological and bio-inspired tools can be used to implement, automatically locate, understand, and extract online data independent of the source, and to make it available to Semantic Web agents such as a web crawler.
