The Proposed Framework of View-Dependent Data Integration Architecture

Pradeep Kumar, Madhurendra Kumar, Rajeev Kumar
Copyright: © 2024 | Pages: 19
DOI: 10.4018/979-8-3693-2964-1.ch021

Abstract

In this chapter, the authors propose a framework that applies various techniques to overcome the above-mentioned challenges and to achieve accurate identification and extraction of web query interfaces (WQIs) by classifying them according to their domain. The framework is presented as a system-level design: a high-level view of the view-dependent data integration system together with the architectural framework of the system design. Being a multi-database-oriented system, it scales to both structured and unstructured data sources. The wrapper and mediator modules of the system map each web query interface to the global schema of the integrated web query interface. The authors also present a high-level view of the system model from the end users' point of view, along with the operational framework design.
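To illustrate the wrapper and mediator roles described above, the following is a minimal sketch, not the authors' implementation: the LocalWrapper and Mediator classes, the field names, and the example sources are all hypothetical, assumed only for this illustration.

# Minimal sketch of a wrapper/mediator mapping (illustrative only; the
# class and field names below are hypothetical, not the chapter's code).

class LocalWrapper:
    """Wraps one domain-specific web query interface (WQI) and maps its
    local form fields to attributes of the global schema."""

    def __init__(self, source_name, field_map):
        self.source_name = source_name
        self.field_map = field_map  # local field -> global attribute

    def to_global(self, local_record):
        # Translate a record expressed in local field names into the
        # global schema used by the integrated WQI.
        return {self.field_map[k]: v
                for k, v in local_record.items()
                if k in self.field_map}


class Mediator:
    """Collects wrappers and exposes a single global-schema view."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def global_schema(self):
        # Union of all global attributes contributed by the wrappers.
        attrs = set()
        for w in self.wrappers:
            attrs.update(w.field_map.values())
        return sorted(attrs)


# Example: two hypothetical real-estate WQIs mapped to one global schema.
w1 = LocalWrapper("site_a", {"price_usd": "price", "beds": "bedrooms"})
w2 = LocalWrapper("site_b", {"cost": "price", "bedroom_count": "bedrooms"})
mediator = Mediator([w1, w2])
print(mediator.global_schema())                    # ['bedrooms', 'price']
print(w1.to_global({"price_usd": 250000, "beds": 3}))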

Introduction

The deep web provides a huge amount of domain-specific content such as medical data, real-estate listings, e-commerce data, and scientific data. All of this data is accessed through HTML forms known as web query interfaces (WQIs). The information can be fetched from multiple database servers, but only one at a time, which makes the information-access process inefficient. To overcome this, a different approach is used: an integrated web query interface (IWQI) that can query multiple database servers at once and acts as a single, independent entry point. In the proposed architecture, we present an innovative method for integrating web forms into an IWQI based on a view-based data integration system (VDIS) and linked data. We propose a novel, alternative solution for combining various WQIs into a single domain-specific IWQI for advanced registration-based systems with enhanced security and reliability. Our method starts with wrapping the various domain-wise WQIs and ends with the development of a single integrated WQI, as sketched below.
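The sketch below shows, in outline, how a single integrated WQI might fan one global query out to several wrapped sources and merge the results. It is an assumption-laden illustration, not the chapter's implementation: the per-source query stubs, field names, and sample records are invented for the example.

# Illustrative sketch only: fan a single global query out to several
# wrapped sources and merge the results under the global schema.
# The query_source stubs and field names are hypothetical.

def query_source_a(global_query):
    # Placeholder for a wrapper that would fill and submit site A's form.
    return [{"price": 250000, "bedrooms": 3, "source": "site_a"}]

def query_source_b(global_query):
    # Placeholder for a wrapper that would fill and submit site B's form.
    return [{"price": 240000, "bedrooms": 3, "source": "site_b"}]

def integrated_query(global_query, wrapped_sources):
    """Single entry point: send the same global query to every wrapped
    WQI and return the merged result set."""
    results = []
    for source in wrapped_sources:
        results.extend(source(global_query))
    return results

if __name__ == "__main__":
    query = {"bedrooms": 3, "max_price": 300000}
    for row in integrated_query(query, [query_source_a, query_source_b]):
        print(row)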

The deep web, also called the invisible web or hidden web, is the part of the World Wide Web whose content cannot be directly indexed by common search engines such as Yahoo, Google, or Bing. The deep web acts as an umbrella for the parts of the internet that are not fully accessible through these search engines. When crawling the web, every search engine uses search robots to discover new content and add it to its index. The full extent of the deep web is unknown, but experts estimate that only about 1% of all content on the web is crawled and indexed by search engines; the content that is accessible through search engines is known as the surface web. The deep web, by contrast, includes private data such as content on social media sites, emails, chat messages, and bank statements. Search robots cannot index paid content, such as news articles or educational sites that require some kind of subscription, or content behind logins on services such as YouTube and Netflix. The terms deep web and dark web are sometimes used interchangeably, yet they differ: the dark web is only one part of the larger deep web, and its content cannot be reached through standard search engines like Google.

The dark web has a poor reputation because it hosts many illegal activities, such as malware attacks, cyber attacks, and black markets for stolen credit cards. In contrast, the deep web contains legal content such as academic journals and research databases. Although the deep web cannot be indexed by standard search engines, it is safe to access and is used routinely by many users; Gmail and LinkedIn sign-in pages are examples of deep web sites. Because deep web data often contains personal information, access to it is restricted. If information is indeed the most iconic commodity of the current information age, then the value of deep web content is enormous. Against this background, Bright Planet reported various facts and findings in a study based on data collected between 13 and 20 March 2000. The findings include:

  • The deep web holds a huge body of public information; the information available there is 400 to 500 times larger than that of the commonly defined World Wide Web.

  • The deep web holds about 7,500 TB of information, whereas the surface web holds only about 19 TB.

  • The deep web stores about 550 billion individual documents, far more than the surface web.

  • At the time of the study, roughly 200,000 deep web sites existed across the web.

  • On average, deep web sites receive about 50% more traffic than surface web sites.

  • Content on the deep web is highly connected and relevant to every market, information need, and domain.

  • The deep web provides quality content 1,000 to 2,000 times greater than that of the surface web.

Content stored in deep web databases can be accessed directly through URLs or specific IP addresses if the site is openly available, although some pages may still require credentials or other security access. Deep web content typically sits behind HTTP forms and underlies many common services such as online banking, email, and paywalls. Figure 1 shows the step-wise sequence of phases involved in a deep web query interface.
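As a concrete illustration of content that sits behind an HTTP form, the sketch below shows how a wrapper might programmatically fill and submit such a form to retrieve a result page. The URL and form field names are placeholders, not taken from the chapter, and the code assumes the widely used Python requests library.

# Illustrative only: submit a hypothetical deep-web search form with the
# requests library and retrieve the result page. The URL and form field
# names are placeholders, not an actual interface from the chapter.
import requests

def submit_query_form(form_url, form_fields):
    """POST the filled form the way a browser would and return the
    resulting HTML, which a wrapper could then parse into records."""
    response = requests.post(form_url, data=form_fields, timeout=30)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    html = submit_query_form(
        "https://example.org/search",          # placeholder endpoint
        {"domain": "real-estate", "bedrooms": "3", "max_price": "300000"},
    )
    print(len(html), "bytes of result markup retrieved")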
