Overview of Entity Resolution

Source Title: Innovative Techniques and Applications of Entity Resolution

DOI: 10.4018/978-1-4666-5198-2.ch001

Abstract

Entity resolution is one of many important operations in data quality management, information retrieval, and data management. It has wide applications in Web search, e-commerce search, data cleaning, and information integration. Because of its importance, entity resolution has been studied by researchers in multiple fields, including databases, machine learning, information retrieval, and high-performance computing. The chapters of this book were carefully chosen to cover the broad research issues in entity resolution, along with a number of its important applications. This chapter provides an overview of the concepts, applications, and research topics of entity resolution, and of how these topics are covered in the book.

Chapter Preview

Basic Concepts Of Entity Resolution

Entity resolution is the task of identifying the representations that refer to the same real-world entity in one or more databases, and of recognizing all the distinct real-world entities in those databases.

Entity resolution plays an important role in data management. It is one of the major research problems in data quality management.

Entity resolution can be classified into two types according to the form of its result. One is pair-wise entity resolution, whose result is a set of pairs of data objects that refer to the same real-world entity. The other is group-wise entity resolution, whose result is a family of clusters, each containing the data objects that refer to the same real-world entity.
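The relationship between the two result forms can be illustrated with a minimal sketch. It assumes that "refers to the same entity" is transitive, so that pair-wise matches can be merged into clusters with a union-find structure; the chapter does not prescribe this method, and the record identifiers `r1` to `r5` and the function name `pairs_to_clusters` are hypothetical.

```python
from collections import defaultdict


def pairs_to_clusters(records, matched_pairs):
    """Merge pair-wise matches into group-wise clusters (assumes transitivity)."""
    parent = {r: r for r in records}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        # Union: attach the representative of a to the representative of b.
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for r in records:
        clusters[find(r)].add(r)
    # Sort for a stable, readable result.
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: a pair-wise result over five records.
records = ["r1", "r2", "r3", "r4", "r5"]
matched_pairs = [("r1", "r2"), ("r2", "r3")]

clusters = pairs_to_clusters(records, matched_pairs)
print(clusters)
```

Here the pair-wise result is the list `matched_pairs`, and the group-wise result is the list of clusters, in which unmatched records form singleton clusters.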

Entity resolution has wide applications in many steps in data management and data quality management. We use two examples to explain entity resolution and its applications.

Example 1: In the management information system of an enterprise, different departments, such as marketing, sales, and service, may maintain autonomous databases. These databases may be of different types, such as relational databases, XML documents, and object-oriented (OO) databases, and their data may follow different schemas. The same attribute of the same entity may be described in different ways. For example, a customer named “Wei Wang” may be represented as “Wang Wei”, “W Wang”, the pair (Wei, Wang), or the XML fragment <Customer><FamilyName>Wang</FamilyName><GivenName>Wei</GivenName></Customer> in different databases. Acquisitions and reorganizations of enterprises produce more such instances, since the databases of the enterprises involved may contain many different representations of the same real-world entity. Information integrated from such databases may mislead decisions. For example, when customers are counted, if the same customer from different databases is treated as several different customers, the count exceeds the true number. To support decisions with a management information system, it is necessary to correctly detect the data objects in different databases that refer to the same real-world entity. In addition, the volume of enterprise data is becoming very large. According to a panel at VLDB 2002 (42), the data volume of manufacturing enterprises had reached 100 TB in 2002 and was growing by 20% each year. Therefore, enterprise data management needs entity resolution techniques for massive, frequently updated data in various structures.
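The over-counting problem in Example 1 can be sketched as follows. This is an illustrative sketch only, not a method from the chapter: the matching rule (compare name tokens regardless of order, and let a single-letter initial match a full token) is an assumption, the helper names are hypothetical, and the extra customer “Ming Li” is invented for the demonstration. Such a rule would wrongly merge two different people whose given and family names are swapped.

```python
import xml.etree.ElementTree as ET


def name_tokens(rep):
    """Reduce one representation (string, tuple, or XML fragment) to name tokens."""
    if isinstance(rep, tuple):
        parts = rep
    elif rep.lstrip().startswith("<"):
        # Take the text of each child element, e.g. FamilyName and GivenName.
        parts = [el.text for el in ET.fromstring(rep) if el.text]
    else:
        parts = rep.split()
    return frozenset(p.strip().lower() for p in parts)


def tokens_compatible(x, y):
    """Equal tokens match; a one-letter initial matches a token it begins."""
    if x == y:
        return True
    short, long_ = sorted((x, y), key=len)
    return len(short) == 1 and long_.startswith(short)


def same_customer(a, b):
    """Assumed rule: token sets agree up to order and initials."""
    ta, tb = name_tokens(a), name_tokens(b)
    if len(ta) != len(tb):
        return False
    # Tokens shared by both sides already match; pair up the leftovers.
    rest_a, rest_b = sorted(ta - tb), sorted(tb - ta)
    return all(tokens_compatible(x, y) for x, y in zip(rest_a, rest_b))


def count_customers(reps):
    """Count distinct customers by keeping one representative per entity."""
    representatives = []
    for rep in reps:
        if not any(same_customer(rep, seen) for seen in representatives):
            representatives.append(rep)
    return len(representatives)


reps = [
    "Wei Wang",
    "Wang Wei",
    "W Wang",
    ("Wei", "Wang"),
    "<Customer><FamilyName>Wang</FamilyName><GivenName>Wei</GivenName></Customer>",
    "Ming Li",  # hypothetical second customer, not from the chapter
]

naive_count = len(reps)
resolved_count = count_customers(reps)
print(naive_count, resolved_count)
```

Counting the raw representations gives six customers, while counting after resolution gives two, which is the kind of inflated statistic the example warns about.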
Example 2: Websites on the Internet are autonomous, and the information on Web 2.0 sites is entered by many non-expert users. One real-world entity may therefore have different descriptions on different websites, or even in different parts of the same website, so search results from the Internet may contain various descriptions of the same real-world entity. On the one hand, such duplicated results force users to browse through much similar information and waste their time. On the other hand, inconsistent information and incorrect statistics derived from the retrieval results may lead to wrong decisions. If entity resolution is applied to the retrieval results, clustering them so that the data objects in each cluster refer to the same real-world entity, users receive higher-quality results and information is used more effectively. However, entity resolution on the Internet brings challenges. The first challenge is that the volume of information on the Internet is very large: the number of pages indexed by Google exceeds one trillion (1T). In addition, because many users are involved, the information on the Internet is updated frequently and comes in various types, including XML, relational databases, RDF in graph structure, and HTML. An Internet information collection and retrieval system with quality assurance therefore requires entity resolution on dynamic, massive data of various types.

Other examples of entity resolution include finding special structures in networks and IP alias discovery (Getoor & Machanavajjhala, 2012).

These examples show that entity resolution is important for data quality management and data management. Formally, the two kinds of entity resolution are defined as follows.
