Data Quality Assessment: Problems and Methods

Juliusz L. Kulikowski (Nalecz Institute of Biocybernetics and Biomedical Engineering PAS, Warsaw, Poland)
DOI: 10.4018/ijoci.2014010102

Abstract

The paper presents the state of the art in data quality assessment and maintenance in modern information systems. A short historical overview of the development of this problem is given. Particular attention is paid to the development of the idea of multi-aspect data quality assessment. The problem of extending data quality assessment from single data items to higher-level data structures is considered. Methods of ordering multi-aspect data quality measures for comparison are analyzed, and a solution based on the concept of semi-ordering in a linear vector (Kantorovich) space is proposed. Remarks on organizational and technological tools for data quality maintenance in organizations are given. Expected future trends in the development of data quality assessment and maintenance methods are suggested.
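
The abstract does not spell out the semi-ordering construction, so the following is only a minimal sketch of the general idea under stated assumptions: each multi-aspect DQ characteristic is represented as a vector of numeric aspect scores in R^n, and vectors are compared by the componentwise order induced by the non-negative cone (R^n with this order is a simple example of a Kantorovich space, i.e. a Dedekind-complete vector lattice). One vector dominates another only if it is at least as good in every aspect, so some pairs remain incomparable. The function names and the aspect scores below are hypothetical.

```python
from typing import List, Literal, Sequence

Relation = Literal["equal", "better", "worse", "incomparable"]


def compare_dq(u: Sequence[float], v: Sequence[float]) -> Relation:
    """Compare two multi-aspect DQ vectors by the componentwise (cone) order of R^n.

    Assumption: higher scores mean better quality in every aspect.
    u >= v holds iff u - v lies in the non-negative cone, i.e. no component of u is smaller.
    """
    if len(u) != len(v):
        raise ValueError("DQ vectors must be scored on the same aspects")
    diffs = [a - b for a, b in zip(u, v)]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "better"
    if all(d <= 0 for d in diffs):
        return "worse"
    return "incomparable"  # u is better in some aspects and worse in others


def dq_sup(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Least upper bound of u and v (componentwise maximum)."""
    return [max(a, b) for a, b in zip(u, v)]


def dq_inf(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Greatest lower bound of u and v (componentwise minimum)."""
    return [min(a, b) for a, b in zip(u, v)]


# Usage with hypothetical (accuracy, completeness, timeliness) scores.
a = [0.9, 0.8, 0.7]
b = [0.8, 0.8, 0.6]
c = [0.95, 0.6, 0.9]
print(compare_dq(a, b))  # better
print(compare_dq(a, c))  # incomparable
print(dq_sup(a, c))      # [0.95, 0.8, 0.9]
print(dq_inf(a, c))      # [0.9, 0.6, 0.7]
```

Because this order is only partial, an optimization procedure built on it would look for non-dominated alternatives rather than a single maximum; the componentwise maximum and minimum (the lattice operations of R^n) give the tightest upper and lower bounds of two DQ vectors.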

Introduction

High effectiveness of modern computer-based information processing (IP) systems depends on three basic factors: 1) the technological performance of the system, 2) its human-computer interaction capability, and 3) the quality of the input data processed by the system. This does not mean that low input data quality prevents every IP system from working effectively: some IP systems are designed specifically to enhance low-quality data, and in this particular case the difference between output and input data quality becomes a basis for evaluating the system. In systems that make decisions on the basis of internal or external data or knowledge bases, however, high input data quality directly affects the accuracy of the decisions. Therefore, a precise definition of the data quality concept fitted to a given situation, as well as rules for comparing data quality levels, should be included in the design process of any IP system.

There are numerous definitions of the general notion of data quality in the literature (e.g., Ballou & Tayi, 1999; Kulikowski, 2002, 2009; Shanks & Darke, 1998; Wang, Storey, et al., 1995; Wang & Strong, 1996; Huang, Lee, & Wang, 1998; Luebbers, Grimmer, & Jarke, 2003). In principle, the differences between them are not very substantial: all of them define data quality as a factor independent of the amount of information provided by the data and strongly connected with the data customers’ needs. Below, data quality (DQ) will be understood as a set of data attributes describing their fitness to provide the information expected or required in a given area of interest by some customers for decision making, learning, entertainment, etc. (Kulikowski, 2009). The following aspects of DQ should thus be noted: a) it is not a single property but rather a set of properties (complexity); b) it is relative, since it depends on the individual user’s point of view (relativity); c) it may depend on time and/or on physical, organizational, etc. location (time-dependency, site-dependency); d) its components (the properties) can be parameterized in different ways (formal non-homogeneity); e) it may be defined in different ways on different data-organization levels; however, higher quality of lower-level data implies no lower quality of the higher-level data based on them (structural scalability). These aspects of DQ drive, on the one hand, a sustained interest in their theoretical investigation and, on the other hand, progress in software tools designed for DQ improvement and maintenance, particularly in large and distributed data and knowledge bases.
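
As an illustration of aspects a), b) and e), the following minimal sketch assumes (this is not taken from the paper) that DQ is stored as a mapping from aspect names to scores, that a user's point of view is expressed as required minimum levels for the aspects that user cares about, and that the DQ of a higher-level structure (e.g. a table) is the aspect-wise minimum over its records. The minimum is only one possible choice; it is used here because it is monotone, so improving any lower-level item can never lower the higher-level score, as structural scalability requires. All names and scores are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: aspect name -> score in [0, 1], higher is better.
DQ = Dict[str, float]


def fits_user(dq: DQ, requirements: DQ) -> bool:
    """Relativity (aspect b): fitness is judged against one user's requirements.

    Assumption: data fit a user if every aspect the user cares about reaches
    the required level; an aspect missing from `dq` is treated as score 0.
    """
    return all(dq.get(name, 0.0) >= level for name, level in requirements.items())


def aggregate_min(parts: List[DQ]) -> DQ:
    """Structural scalability (aspect e): DQ of a higher-level structure.

    Assumption: all parts are scored on the same aspects, and the higher-level
    score of each aspect is its minimum over the parts ("weakest link").
    The minimum is monotone: raising any part's score never lowers the result.
    """
    if not parts:
        raise ValueError("at least one lower-level item is required")
    return {name: min(p[name] for p in parts) for name in parts[0]}


# Usage: two records aggregated into a table, judged by two different users.
records = [
    {"accuracy": 0.9, "completeness": 1.0},
    {"accuracy": 0.7, "completeness": 0.8},
]
table = aggregate_min(records)
print(table)                                 # {'accuracy': 0.7, 'completeness': 0.8}
print(fits_user(table, {"accuracy": 0.6}))   # True
print(fits_user(table, {"accuracy": 0.95}))  # False
```

A weighted mean with non-negative weights would also be monotone; which aggregation suits a given data-organization level is an application-specific design decision.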

The aim of this paper is to present the state of the art and future perspectives of DQ assessment as a research area within IP systems theory and design methods. The following problems are presented in the subsequent sections: a short historical view of the development of the concepts of information value and DQ; DQ components and their parameterization; relationships between DQ on different data-organization levels; comparison of multi-aspect DQ characteristics for IP systems optimization; and software tools for maintaining high DQ. Concluding remarks and expected future tendencies in DQ assessment and improvement methods are given in the last section.
