An Algebraic Approach to Data Quality Metrics for Entity Resolution over Large Datasets

John Talburt (University of Arkansas at Little Rock, USA), Richard Wang (Massachusetts Institute of Technology, USA), Kimberly Hess (CASA 20th Judicial District, USA) and Emily Kuo (Massachusetts Institute of Technology, USA)
Copyright: © 2007 | Pages: 22
DOI: 10.4018/978-1-59904-024-0.ch001

Abstract

This chapter introduces abstract algebra as a means of understanding and creating data quality metrics for entity resolution, the process in which records determined to represent the same real-world entity are successively located and merged. Entity resolution is a particular form of data mining that is foundational to a number of applications in both industry and government. Examples include commercial customer recognition systems and information sharing on “persons of interest” across federal intelligence agencies. Despite the importance of these applications, most of the data quality literature focuses on measuring the intrinsic quality of individual records rather than the quality of record grouping or integration. In this chapter, the authors describe current research into the creation and validation of quality metrics for entity resolution, primarily in the context of customer recognition systems. The approach is based on an algebraic view of the system as creating a partition of a set of entity records based on the indicative information for the entities in question. In this view, the relative quality of entity identification between two systems can be measured in terms of the similarity between the partitions they produce. The authors discuss the difficulty of applying statistical cluster analysis to this problem when the datasets are large and propose an alternative index suitable for these situations. They also report some preliminary experimental results and outline areas and approaches for further research in this area.
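To make the partition view concrete, here is a minimal sketch that compares two entity-resolution outcomes with the Rand index, a standard statistical cluster-analysis measure. It is not the alternative index the chapter proposes, which the abstract does not specify. The sketch assumes each system's output is a list of cluster labels, one per record, in the same record order. The function names are hypothetical. The naive version examines all n(n−1)/2 record pairs, one plausible source of the cost on large datasets. The second version reaches the same value from label counts in roughly linear time.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index_pairwise(labels_a, labels_b):
    """Naive Rand index: checks every record pair, so O(n^2) work."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same records")
    if n < 2:
        return 1.0
    agree = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]  # same entity according to system A?
        same_b = labels_b[i] == labels_b[j]  # same entity according to system B?
        agree += same_a == same_b            # count pairs where both systems agree
    return agree / comb(n, 2)


def rand_index_counts(labels_a, labels_b):
    """Same Rand index computed from class and overlap sizes, without pair enumeration."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same records")
    if n < 2:
        return 1.0
    total = comb(n, 2)
    # Pairs placed together by A, by B, and by both (via the nonempty intersections).
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs kept apart by both systems, by inclusion-exclusion.
    apart_both = total - in_a - in_b + both
    return (both + apart_both) / total


# Hypothetical outputs of two systems over the same five records.
system_a = [1, 1, 2, 2, 3]
system_b = [1, 1, 1, 2, 3]
print(rand_index_counts(system_a, system_b))  # → 0.7
```

Cluster labels are only identifiers. Two partitions that differ only by relabeling, such as `[5, 5, 7]` and `[1, 1, 2]`, therefore score 1.0.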

Table of Contents
Foreword
John Talburt
Acknowledgments
Chapter 1
John Talburt, Richard Wang, Kimberly Hess, Emily Kuo
This chapter introduces abstract algebra as a means of understanding and creating data quality metrics for entity resolution, the process in which...
An Algebraic Approach to Data Quality Metrics for Entity Resolution over Large Datasets
Chapter 2
Laure Berti-Equille
For non-collaborative distributed data sources, quality-driven query processing is difficult to achieve because the sources generally do not export...
Quality-Extended Query Processing for Mediation Systems
Chapter 3
M. Mehdi Owrang O.
Current database technology involves processing a large volume of data in order to discover new knowledge. However, knowledge discovery on just the...
Discovering Quality Knowledge from Relational Databases
Chapter 4
Zbigniew J. Gackowski
This chapter presents a qualitative inquiry into the universe of quality attributes of symbolic representation such as data and information values....
Relativity of Information Quality: Ontological vs. Teleological, Internal vs. External View
Chapter 5
Karolyn Kerr, Tony Norris
Data quality requirements are increasing as a wider range of data becomes available and the technology to mine data shows the value of data that is...
The Development of a Health Data Quality Programme
Chapter 6
Ismael Caballero, Mario Piattini
This chapter introduces a way for assessing and improving information quality at organizations. Information is one of the most important assets for...
Assessment and Improvement of Information Quality
Chapter 7
Elizabeth M. Pierce
This paper takes the basic constructs of the IP-Map diagram and demonstrates how they can be combined with the Event-Driven Process Chain...
Integrating IP-Maps with Business Process Modelling
Chapter 8
Latif Al-Hakim
This chapter considers information flow as an important dimension of information quality and proposes a procedure for mapping information flow. The...
Procedure for Mapping Information Flow: A Case of Surgery Management Process
Chapter 9
Zhanming Su, Zhanming Jin
Product Information Quality (PIQ) is critical in manufacturing enterprises. Yet, the field lacks comprehensive methodologies for its evaluation. In...
A Methodology for Information Quality Assessment in the Designing and Manufacturing Processes of Mechanical Products
Chapter 10
Andy Koronos, Shien Lin
This chapter discusses the critical issues of information quality (IQ) associated with engineering assets management. It introduces an asset...
Information Quality in Engineering Asset Management
Chapter 11
Zhenguo Yu, Ying Wang
This chapter presents a survey into quality management practice regarding statistics and financial data in China. As a fast-developing country...
Quality Management Practices Regarding Statistical and Financial Data in China
Chapter 12
Suhaiza Zailani, Premkumar Rajagopal
This paper introduces how information quality plays an important role in a supply chain performance. In order to make smarter use of global...
The Effects of Information Quality on Supply Chain Performance: New Evidence from Malaysia
About the Authors