Issues and Challenges in Web Crawling for Information Extraction

Subrata Paul, Anirban Mitra, Swagata Dey
Copyright: © 2017 | Pages: 29
ISBN13: 9781522523758 | ISBN10: 1522523758 | EISBN13: 9781522523765
DOI: 10.4018/978-1-5225-2375-8.ch004
Cite Chapter

MLA

Paul, Subrata, et al. "Issues and Challenges in Web Crawling for Information Extraction." Bio-Inspired Computing for Information Retrieval Applications, edited by D.P. Acharjya and Anirban Mitra, IGI Global, 2017, pp. 93-121. https://doi.org/10.4018/978-1-5225-2375-8.ch004

APA

Paul, S., Mitra, A., & Dey, S. (2017). Issues and Challenges in Web Crawling for Information Extraction. In D. Acharjya & A. Mitra (Eds.), Bio-Inspired Computing for Information Retrieval Applications (pp. 93-121). IGI Global. https://doi.org/10.4018/978-1-5225-2375-8.ch004

Chicago

Paul, Subrata, Anirban Mitra, and Swagata Dey. "Issues and Challenges in Web Crawling for Information Extraction." In Bio-Inspired Computing for Information Retrieval Applications, edited by D.P. Acharjya and Anirban Mitra, 93-121. Hershey, PA: IGI Global, 2017. https://doi.org/10.4018/978-1-5225-2375-8.ch004

Abstract

Computational biology and bio-inspired techniques are part of a larger revolution that is greatly increasing the processing, storage, and retrieval of data. This revolution is driven by the generation and use of information in all forms and in enormous quantities, and it requires the development of intelligent systems for gathering, storing, and accessing information. This chapter describes the concepts, design, and implementation of a distributed web crawler that runs on a network of workstations and has been used for web information extraction. The crawler needs to scale to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications. Further, this chapter focuses on various ways in which appropriate biological and bio-inspired tools can be used to implement systems that automatically locate, understand, and extract online data independent of the source, and to make that data available to Semantic Web agents such as a web crawler.
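The abstract does not say how crawling work is divided among the workstations. One common approach in distributed crawlers is to partition the URL space by hashing each URL's host name, so that every host is crawled by exactly one node. This is offered only as an illustrative sketch and is not presented as the chapter's method. The function names `assign_node` and `partition`, and the use of MD5 as the partitioning hash, are assumptions made for this example.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to one of num_nodes crawler workstations (hypothetical helper).

    The host name, not the full URL, is hashed, so all pages of one site
    land on the same node.
    """
    # urlsplit(...).hostname is already lower-cased; it is None for host-less URLs.
    host = urlsplit(url).hostname or ""
    # MD5 is used only as a stable, well-spread hash (not for security), so every
    # workstation computes the same assignment for the same host.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group a batch of URLs into per-node queues (hypothetical helper)."""
    queues = {node: [] for node in range(num_nodes)}
    for url in urls:
        queues[assign_node(url, num_nodes)].append(url)
    return queues


if __name__ == "__main__":
    batch = ["http://a.org/1", "http://b.org/2", "http://a.org/3"]
    for node, queue in partition(batch, 3).items():
        print(node, queue)
```

Hashing the host rather than the full URL keeps each site on a single node, which makes per-host politeness delays easy to enforce locally. A stable hash such as MD5 is chosen instead of Python's built-in `hash()`, because `hash()` is randomized per process for strings and would give different assignments on different workstations.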