Entity Resolution on Graph Data Set

Source Title: Innovative Techniques and Applications of Entity Resolution

DOI: 10.4018/978-1-4666-5198-2.ch008

Abstract

In this chapter, the authors study entity resolution on graph data sets. Conducting entity resolution on graph data first requires a definition of the distance between graphs. The authors compute these distances exactly, or approximate them for time efficiency, and then use the distances to obtain the final entity resolution result. Approximate graph matching algorithms may be index-based, like the NH-Index method, or kernel function based, like the G-hash method. Other methods concentrate on providing new definitions of graph similarity that are easier to compute than traditional ones, like the Web-collection method and the Grafil method. To increase the resolution ability of traditional methods, researchers have also proposed methods to recognize similar graphs, like graph-bounded simulation and p-homomorphism. Section 8.1 introduces existing methods for defining the distance between graphs, which has a direct impact on the computation of graph similarity. Section 8.1 introduces pair-wise entity resolution on graph data sets, including index techniques, graph-bounded simulation, and graph p-homomorphism.

Chapter Preview

Introduction

Early research concentrated on exact graph matching, because exact matching finds subgraphs of the data graph that are identical to the query. However, with the development of database theory, more non-relational database systems have appeared, and the graph database is one of the systems best suited to processing big data. In many cases, exact graph matching cannot express the query intention well. Typical user queries include asking whether there exists a web store with a similar product structure, or searching for protein molecules similar to a given protein molecule. As the Chinese saying goes, "water that is too clear has no fish": exact graph matching may return too few matching results, or none at all. Similar graph matching therefore plays an important role in database management.

Approximate graph matching algorithms may be index-based, like the NH-Index method, or kernel function based, like the G-hash method.

NH-Index (Tian, 2008), short for neighborhood index, is an easy-to-implement index structure for similar graph matching. Traditional graph index methods index only subgraphs (paths, trees, or general subgraphs), which can lead to index sizes that are exponential in the database size. The index unit of NH-Index is the neighborhood information of each node in the database, so the index size is linear in the database size. NH-Index is also a disk-based index, which makes it suitable for big data that cannot fit entirely in main memory.
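To make the idea of indexing per-node neighborhood information concrete, here is a minimal in-memory sketch in Python. It is not the NH-Index structure itself, which is disk-based; the function names, the tuple layout of an entry, and the `min_overlap` threshold are all assumptions made for illustration. It stores one entry per node, so the index grows linearly with the number of nodes.

```python
from collections import defaultdict


def build_neighborhood_index(graphs):
    """Build a toy neighborhood index (illustrative, not the real NH-Index).

    graphs maps graph_id -> (labels, edges), where labels maps node -> label
    and edges is a list of undirected (u, v) pairs.
    The index maps a node label to one entry per node carrying that label.
    """
    index = defaultdict(list)
    for gid, (labels, edges) in graphs.items():
        neighbors = defaultdict(set)
        for u, v in edges:
            neighbors[u].add(v)
            neighbors[v].add(u)
        for node, label in labels.items():
            nbr_labels = frozenset(labels[n] for n in neighbors[node])
            index[label].append((gid, node, nbr_labels))
    return index


def candidate_nodes(index, label, nbr_labels, min_overlap=0.5):
    """Return (graph_id, node) pairs with the same label whose neighbor
    labels cover at least min_overlap of the query's neighbor labels."""
    hits = []
    for gid, node, seen in index.get(label, []):
        if nbr_labels:
            overlap = len(nbr_labels & seen) / len(nbr_labels)
        else:
            overlap = 1.0  # a query node with no neighbors matches any node
        if overlap >= min_overlap:
            hits.append((gid, node))
    return hits


# Hypothetical two-graph database.
graphs = {
    "g1": ({1: "A", 2: "B", 3: "C"}, [(1, 2), (1, 3)]),
    "g2": ({1: "A", 2: "B", 3: "D"}, [(1, 2), (2, 3)]),
}
index = build_neighborhood_index(graphs)
print(candidate_nodes(index, "A", frozenset({"B", "C"}), min_overlap=0.5))
```

Lowering `min_overlap` admits nodes whose neighborhoods only partly match the query node, which is the kind of tolerance that similar (rather than exact) matching needs.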

G-hash (Wang, 2009) is a kernel function based method for pair-wise entity resolution on graphs. Its basic idea is to map graph data into node vectors and then obtain graph similarity by computing a similarity function on these vectors. To obtain the node vectors, wavelet functions are first applied to transform the topology of each graph into node vectors. A kernel function computes the inner product between two objects in feature space. Applied to the node vectors, the kernel function yields a similarity value that reflects the similarity between the graphs.
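The following Python sketch illustrates the general pattern of mapping nodes to vectors and comparing graphs through inner products. It is a simplified stand-in, not the G-hash method: in place of wavelet functions it uses a plain count of neighbor labels, and the names `node_vectors`, `graph_kernel`, `similarity`, and the `alphabet` parameter are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def node_vectors(labels, edges, alphabet):
    """Map each node to a vector counting its neighbors per label.

    This neighbor-label count is an assumed substitute for the wavelet
    functions mentioned in the text; it only summarizes 1-hop topology.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    vectors = {}
    for node in labels:
        counts = Counter(labels[n] for n in neighbors[node])
        vectors[node] = [counts[a] for a in alphabet]
    return vectors


def graph_kernel(g1, g2, alphabet):
    """Sum of inner products over all node pairs that share a label."""
    v1 = node_vectors(g1[0], g1[1], alphabet)
    v2 = node_vectors(g2[0], g2[1], alphabet)
    total = 0
    for n1, x in v1.items():
        for n2, y in v2.items():
            if g1[0][n1] == g2[0][n2]:
                total += sum(a * b for a, b in zip(x, y))
    return total


def similarity(g1, g2, alphabet):
    """Kernel value normalized to [0, 1] (cosine-style normalization)."""
    denom = math.sqrt(graph_kernel(g1, g1, alphabet) * graph_kernel(g2, g2, alphabet))
    return graph_kernel(g1, g2, alphabet) / denom if denom else 0.0


# Hypothetical graphs given as (labels, edges).
alphabet = ["A", "B", "C"]
g1 = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (1, 3)])
g2 = ({1: "A", 2: "B"}, [(1, 2)])
print(similarity(g1, g2, alphabet))
```

Normalizing by the self-kernel values keeps the score comparable across graphs of different sizes, which is convenient when a single threshold decides whether two graphs refer to the same entity.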

Other methods concentrate on providing new definitions of graph similarity that are easier to compute than traditional ones, like the Web-collection (Cho, 2000) method and the Grafil (Yan, 2005) method.

Web collection is a practical graph similarity measure defined by the group who developed the Google search engine. Their aim is to find replicated web pages, which is done by modeling web pages as a web graph. The basic processing unit in this method is called a collection.

Grafil is a similarity measure for graphs. It builds a connection between the structure-based measure and the feature-based measure, so that the feature-based measure can be used to screen the database before the expensive pairwise structure-based similarity computation is performed. In subgraph matching, overly strict matching produces a nearly empty result set. Grafil's basic idea is to extract features from the query graph and, when the result set does not contain enough elements, to gradually reduce the number of features that must match so that more similar results are returned. This process is called query relaxation. During the computation, feature filtering improves time efficiency.
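The filter-then-relax loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Grafil's actual algorithm: features are taken to be label pairs of edges, relaxation is modeled as an increasing budget of missed features, and the names `edge_features`, `missed_features`, `relaxed_filter`, `min_results`, and `max_misses` are hypothetical.

```python
from collections import Counter


def edge_features(labels, edges):
    """Count edges by their (sorted) pair of endpoint labels.

    Using labeled edges as features is an assumption for this sketch.
    """
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def missed_features(query_feats, graph_feats):
    """Number of query feature occurrences that the graph lacks."""
    return sum(max(0, c - graph_feats[f]) for f, c in query_feats.items())


def relaxed_filter(query, database, min_results, max_misses):
    """Relax the query until at least min_results candidates survive.

    Returns (allowed_misses, candidate_ids). Candidates would still need
    the expensive structure-based comparison afterwards.
    """
    q = edge_features(query[0], query[1])
    feats = {gid: edge_features(g[0], g[1]) for gid, g in database.items()}
    hits = []
    for allowed in range(max_misses + 1):
        hits = [gid for gid, f in feats.items() if missed_features(q, f) <= allowed]
        if len(hits) >= min_results:
            return allowed, hits
    return max_misses, hits


# Hypothetical query and database, each graph given as (labels, edges).
query = ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
database = {
    "d1": ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    "d2": ({1: "A", 2: "B", 3: "D"}, [(1, 2), (2, 3)]),
    "d3": ({1: "X", 2: "Y"}, [(1, 2)]),
}
print(relaxed_filter(query, database, min_results=2, max_misses=2))
```

Because counting features is much cheaper than comparing graph structure, graphs that miss too many features can be discarded before any pairwise structural computation takes place.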

To increase the resolution ability of traditional methods, researchers have proposed methods to recognize similar graphs, like graph-bounded simulation (Fan, 2010) and p-homomorphism (Fan, 2010).
