Measuring and Diffusing Data Quality in a Peer-to-Peer Architecture

Diego Milano (Università degli Studi di Roma, Italy)
DOI: 10.4018/978-1-60566-146-9.ch014


Introduction

Data quality is a complex concept defined by various dimensions such as accuracy, currency, completeness, and consistency (Wang & Strong, 1996). Recent research has highlighted the importance of data quality issues in various contexts. In particular, in environments characterized by extensive data replication, high data quality is a strict requirement. Among such environments, this article focuses on cooperative information systems.

Cooperative information systems (CISs) are distributed and heterogeneous information systems that cooperate by sharing information, constraints, and goals (Mylopoulos & Papazoglou, 1997). Data quality is a necessary requirement for a CIS. Indeed, a system in the CIS will not readily exchange data with another system without knowing the quality of the data that system provides, which reduces cooperation. Moreover, when the quality of exchanged data is poor, the overall data quality in the CIS progressively deteriorates. On the other hand, the high degree of data replication that characterizes a CIS can be exploited to improve data quality, since different copies of the same data can be compared to detect quality problems and possibly resolve them.
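The idea of comparing replicated copies to detect quality problems can be illustrated with a minimal Python sketch. It is not the DaQuinCIS method: the organization names, the record layout, and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter


def find_disagreements(copies):
    """Compare copies of one record field by field.

    `copies` maps a (hypothetical) organization name to its copy of the
    record, a dict of field -> value. For every field on which the copies
    disagree, report the most common value and the organizations whose
    copy deviates from it. Majority voting is an assumed heuristic.
    """
    fields = sorted({f for record in copies.values() for f in record})
    report = {}
    for field in fields:
        values = {org: record.get(field) for org, record in copies.items()}
        counts = Counter(values.values())
        if len(counts) > 1:  # at least two distinct values: a potential problem
            majority, _ = counts.most_common(1)[0]
            report[field] = {
                "majority": majority,
                "deviating": sorted(o for o, v in values.items() if v != majority),
            }
    return report


# Three organizations hold a copy of the same (made-up) record.
copies = {
    "org_a": {"name": "Mario Rossi", "city": "Roma"},
    "org_b": {"name": "Mario Rossi", "city": "Rome"},
    "org_c": {"name": "Mario Rossi", "city": "Roma"},
}
report = find_disagreements(copies)
print(report)
```

Here only the `city` field is flagged, with `org_b` as the deviating copy. A real system would also need record matching and a less naive notion of agreement than exact equality.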

Scannapieco, Virgillito, Marchetti, Mecella, and Baldoni (2004) and Mecella et al. (2003) describe the DaQuinCIS architecture, which manages data quality in cooperative contexts in order to avoid the spread of low-quality data and to exploit data replication to improve the overall quality of cooperative data.

In this article we describe the design of a component of our system named the quality factory. The purpose of the quality factory is to evaluate the quality of the XML data sources of the cooperative system. While the need for such a component had been identified previously, this article is the first to present the design of the quality factory, and it proposes an overall methodology for evaluating the quality of XML data sources.

Quality values measured by the quality factory are used by the data quality broker. The data quality broker has two main functionalities: (1) quality brokering, which allows users to select data in the CIS according to their quality; and (2) quality improvement, which diffuses the best-quality copies of data in the CIS.
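The two broker functionalities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Copy` class, the per-dimension scores in [0, 1], and the use of a plain average as the overall quality are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    org: str        # hypothetical name of the organization holding the copy
    value: str      # the data item itself
    quality: dict   # dimension name -> score in [0, 1], as a quality factory might report


def overall(copy):
    """Assumed aggregation: unweighted mean of the dimension scores."""
    return sum(copy.quality.values()) / len(copy.quality)


def broker(copies, min_quality):
    """Quality brokering: copies meeting the user's threshold, best first."""
    acceptable = [c for c in copies if overall(c) >= min_quality]
    return sorted(acceptable, key=overall, reverse=True)


def improve(copies):
    """Quality improvement: pick the best copy and the organizations
    whose own copy is of lower quality and could receive it."""
    best = max(copies, key=overall)
    targets = [c.org for c in copies if overall(c) < overall(best)]
    return best, targets


copies = [
    Copy("org_a", "Via Salaria 113, Roma", {"accuracy": 0.9, "currency": 0.8}),
    Copy("org_b", "Via Salaria 13, Roma", {"accuracy": 0.6, "currency": 0.7}),
    Copy("org_c", "Via Salaria 113", {"accuracy": 0.9, "currency": 0.5}),
]

selected = broker(copies, min_quality=0.68)
best, targets = improve(copies)
print([c.org for c in selected])
print(best.org, targets)
```

With these made-up scores, brokering returns the copies of `org_a` and `org_c`, and improvement proposes sending the `org_a` copy to `org_b` and `org_c`.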

As a further research contribution, this article focuses on the design and implementation of the data quality broker as a peer-to-peer (P2P) system. More specifically, the data quality broker is implemented as a P2P distributed service: each organization hosts a copy of the data quality broker that interacts with the other copies. The functional specification of the data quality broker is not a contribution of this article and has been presented in Scannapieco et al. (2004) and Mecella et al. (2003); its detailed design and implementation as a P2P system, however, are novel. Moreover, we present results from tests performed to assess the effectiveness and efficiency of our system. The data quality broker adopts a P2P architecture so that introducing quality controls into a cooperative system is as minimally invasive as possible. Indeed, cooperating organizations need to preserve their independence and autonomy. The P2P paradigm meets these requirements well, since it supports cooperation without necessarily requiring substantial re-engineering. The section on Related Work discusses this point in more detail, comparing our choice with a system that does not adopt a P2P architecture.

The rest of this article is organized as follows. The second section describes the main features of the quality factory and of the data quality broker. The third section presents the overall methodology, and the fourth section details the architectural design of the quality factory, focusing on the case of XML data sources. The fifth section describes the detailed design and implementation of the data quality broker as a peer-to-peer system, including each module of its component architecture. The sixth section describes the experiments performed. Finally, related work and conclusions are presented in the seventh and eighth sections, respectively.
