The Emerging Threats of Web Scrapping to Web Applications Security and Their Defense Mechanism

Rizwan Ur Rahman, Danish Wadhwa, Aakash Bali, Deepak Singh Tomar

Source Title: Encyclopedia of Criminal Activities and the Deep Web

DOI: 10.4018/978-1-5225-9715-5.ch053

Abstract

Web scraping is the technique of automatically obtaining particular information from web applications instead of copying it manually. The purpose of a web scraper is to search for a certain class of information, extract it, and aggregate it into a new database. More precisely, web scrapers transform unstructured web data and store it in structured databases. Web scraping is a continuing threat to web applications, because it can be used to steal sensitive data from a victim or from the web application itself. The key objective of this article is to examine to what extent web scraping threatens web application security. This article explores the classification of web scraping, such as content scraping, web scraping, price scraping, and database scraping, and presents the most widely used scraping tools, such as Web Content Extractor and Screen Scraper. Overall, the article aims to evaluate the vulnerabilities and threats that web scraping poses to web applications, along with effective measures to counter them.

Chapter Preview


Web Scraping

Web scraping, also known as web harvesting or web data extraction, is used to extract data from websites on the World Wide Web. In other words, it can be defined as the process of extracting and combining content gathered from the web in a systematic manner (Vargiu & Urru, 2012).

Web scraping software may access the World Wide Web directly through the Hypertext Transfer Protocol (HTTP) or through a web browser. Although a user can scrape manually, scraping is preferably automated with a bot or web crawler: a piece of software, also called a web robot, that mimics the way a human browses the web during a conventional traversal.
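As a minimal sketch of how a bot can access the web over HTTP while presenting itself much like a browser, the following code uses only Python's standard `urllib.request` module. The header values, including the `ExampleScraper/0.1` user-agent string and any `example.com` URLs, are illustrative assumptions rather than values from the chapter.

```python
import urllib.request

# Hypothetical headers; real browsers send more, and real bots choose their own values.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ExampleScraper/0.1; +https://example.com/bot)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build an HTTP GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_LIKE_HEADERS, method="GET")


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page over HTTP and decode it as text (requires network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Only `fetch` touches the network; `build_request` can be inspected offline to see exactly which headers the bot sends. Note that `urllib` stores header names with only the first letter capitalized (for example `User-agent`).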

Such a robot may gather data from as many websites as needed, parse their contents to find and fetch the required data easily, and store that data in whatever structures are desired.

Generally, web scraping is somewhat similar to copying: particular data is collected from the Internet and copied into a manageable and readable storage structure, such as a spreadsheet or a database.
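A minimal sketch of copying scraped data into a spreadsheet-friendly structure, assuming the scraped items are already available as Python dictionaries, writes them out as CSV text with the standard `csv` module. The product names and prices are hypothetical sample data.

```python
import csv
import io


def records_to_csv(records: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Copy scraped records into CSV text: a header row, then one row per record."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output identical across platforms.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


scraped = [
    {"product": "Laptop A", "price": "799.00"},
    {"product": "Laptop B", "price": "1,099.00"},
]
print(records_to_csv(scraped, ["product", "price"]), end="")
```

The `csv` module quotes `1,099.00` automatically because it contains the delimiter, so the resulting file opens cleanly in a spreadsheet program.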

In this process, the web page is first downloaded, or fetched (as happens whenever a browser opens a page), and saved for later use; the data is then extracted from it. Hence, web crawling is an important component of the process.
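The fetch-and-save step, together with crawling, can be sketched as a breadth-first crawler that downloads a page, saves its HTML, and queues the same-site links it finds. This is an illustrative sketch, not the chapter's method. The `fetch` parameter is a hypothetical callable (any function that returns a page's HTML for a URL), injected so the crawler can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> dict[str, str]:
    """Breadth-first crawl limited to the start URL's host.

    Returns {url: html}, the pages saved for the later extraction step.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    saved: dict[str, str] = {}
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        if url in saved:
            continue
        html = fetch(url)
        saved[url] = html  # save the raw page for later use
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts before queueing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in saved:
                queue.append(absolute)
    return saved
```

Keeping the crawl on one host and capping `max_pages` bounds how much the sketch downloads; a real crawler would also need politeness delays and error handling.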

In the second step, the content of the page is parsed, searched, or reformatted so that the required data can be copied into a spreadsheet or database. Often, the scraping software takes only the part of the page that is useful to the party running it, for some other purpose.
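The second step, parsing and reformatting page content and then inserting the result into a database, might look like the following sketch built on Python's `html.parser` and `sqlite3` modules. It assumes hypothetical markup in which each product name sits in an element with class `name`, followed by its price in an element with class `price`; real sites vary, and the `prices` table is likewise an illustrative assumption.

```python
import sqlite3
from html.parser import HTMLParser
from typing import Optional


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from elements with the hypothetical classes "name" and "price"."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[tuple[str, float]] = []
        self._field: Optional[str] = None  # field the next text node belongs to
        self._name: Optional[str] = None   # name still waiting for its price

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "name" in classes:
            self._field = "name"
        elif "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._name is not None:
            # Reformat display text such as "$1,099.00" into the number 1099.0.
            price = float(text.replace("$", "").replace(",", ""))
            self.rows.append((self._name, price))
            self._name = None


def store(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Insert the extracted rows into a hypothetical 'prices' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, price REAL)")
    conn.executemany("INSERT INTO prices (name, price) VALUES (?, ?)", rows)
    conn.commit()


page = (
    '<div><span class="name">Laptop A</span><span class="price">$799.00</span></div>'
    '<div><span class="name">Laptop B</span><span class="price">$1,099.00</span></div>'
)
parser = PriceParser()
parser.feed(page)
conn = sqlite3.connect(":memory:")
store(parser.rows, conn)
```

Using parameterized `?` placeholders lets SQLite handle quoting, so scraped text cannot break the INSERT statement.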

Web scraping has many uses today: in advertising and marketing, generally through contact scraping; as an important part of data mining and web mining applications; for price comparison and online price-change monitoring; for weather data monitoring; for research; and for services whose content combines more than one source, known as web mashups, such as the trivago and mybestprice applications.
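Once prices have been scraped, price comparison and online price-change monitoring reduce to simple operations on the collected data. The sketch below assumes prices are already gathered into dictionaries keyed by item or by shop; the function names, item names, and shop domains are hypothetical.

```python
def price_changes(previous: dict[str, float], current: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Price-change monitoring: {item: (old, new)} for items whose price differs between two scrapes."""
    return {
        item: (previous[item], price)
        for item, price in current.items()
        if item in previous and previous[item] != price
    }


def cheapest(offers: dict[str, float]) -> tuple[str, float]:
    """Price comparison: return the (source, price) pair with the lowest price."""
    return min(offers.items(), key=lambda offer: offer[1])


monday = {"Laptop A": 799.0, "Laptop B": 1099.0}
tuesday = {"Laptop A": 749.0, "Laptop B": 1099.0, "Laptop C": 650.0}
print(price_changes(monday, tuesday))
print(cheapest({"shop-one.example": 799.0, "shop-two.example": 749.0}))
```

Items that appear only in the newer scrape (here `Laptop C`) are ignored by `price_changes`, since there is no earlier price to compare against.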

Basically, web scrapers act as APIs that are used to extract data from a web page or website on the Internet. Some large companies, such as Amazon Web Services and Google, also provide web scraping tools free of cost to end users.

Key Terms in this Chapter

News Scraping: The process of scraping news from newspaper websites.

Database Scraping: The process of directly extracting data from a database.

Article Scraping: The process of scraping articles from blogs or websites.

Content Scraping: The process of lifting content displayed on various websites and reusing it elsewhere, for example by displaying it on other websites.

Data Scraping: The process of extracting large amounts of data from websites and storing it on a local computer system or in a structured database.

Price Scraping: The process of extracting or collecting the prices of various items on e-commerce sites available over the Internet without the site's consent.

Web Scraping: The process of extracting data from websites in a systematic manner.

Email Harvesting: The practice of obtaining a large number of email addresses using various methods or techniques.