A Multidisciplinary View of Data Quality

Andrew Borchers
DOI: 10.4018/978-1-60566-026-4.ch437

Abstract

This article introduces the concepts of data quality as described in the literature of several disciplines and discusses research results on how individual perceptions of data quality are influenced by different media (in particular, the World Wide Web vs. print). A search of the literature on “data quality” and “media credibility” reveals that researchers in many disciplines are studying the subject separately. These disciplines include accounting, advertising and public relations, information systems, scientific data collection, education, journalism, and others. Although these threads have developed independently, they address similar questions about how people judge the quality of information they receive from different sources.
Chapter Preview

Background

Data quality is an emerging area of research fundamental to the field of information systems. Indeed, the efficacy of systems is driven in large part by the quality of the data they contain. The Internet revolution, however, has brought fundamental changes in how information is collected and shared, and these changes can strongly influence data quality. This challenge is accentuated by the recent move to “user-generated content” as part of the broader evolution to Web 2.0 (Schwartz, 2007). In addition, younger generations immerse themselves in media more than their parents do, which has led to the label “M-generation.” A study by the Kaiser Family Foundation and Stanford University found that young people spend an average of 6.5 hours per day exposed to media. Increasingly, this exposure involves several media at once (Azzam, 2006).

Such access and participation, however, bring a challenge, as stated by Gilster (as cited in Flanigan & Metzger, 2000):

When is a globe spanning information network dangerous? When people make too many assumptions about what they find on it. For while the Internet offers myriad opportunities for learning, an unconsidered view of its contents can be misleading and deceptive.

Further, organizational responses to data quality have been largely ad hoc, with most firms relying on localized approaches to ensuring data quality (Swartz, 2006).

Recent research and seminars underscore the importance of data quality. Interest in the discipline has led to the creation of the International Association for Information and Data Quality, several annual conferences (e.g., www.iqconference.org), and the ACM Journal of Data and Information Quality. Total Data Quality Management (TDQM) has also evolved as a field of study that extends the concepts of Total Quality Management (Radziwill, 2006). In short, data quality has emerged as a significant research area.

Information systems and journalism practitioners have stressed the importance of data quality for many years. Redman (1998) summarizes the practical implications of poor data quality, pointing to its consequences in areas such as decision making, organizational trust, strategic planning and implementation, and customer satisfaction. His detailed studies found that poor data quality increased costs by 8-12%, and that service organizations can see expense increases of 40-60% (Redman, 1998). Strong, Lee, and Wang (1997) underscore the seriousness of the issue in their study of 42 data quality projects in three organizations. Early research by other authors notes data quality issues in a number of settings, including accounting (Xu, 2000; Kaplan, Krishnan, Padman, & Peters, 1998), airlines, healthcare (Strong et al., 1997), criminal justice (Laudon, 1986), and data warehousing (Ballou, 1999).

For a formal definition of data quality, Umar, Karabatis, Ness, Horowitz, and Elmagarmid (1999) quote Redman (1992):

Key Terms in this Chapter

Data Quality: A multifaceted concept in information systems research that focuses on the fitness of data for use by consumers. Data quality can be grouped into four categories: intrinsic (accuracy, objectivity, believability, and reputation), contextual (relevancy, timeliness, and appropriate amount of data), representational (format of the data), and accessibility (ease of access).

Intrinsic Data Quality: A concept that “data have quality in their own right” (Wang & Strong, 1996), including the accuracy, objectivity, believability, and reputation dimensions.

Representational Data Quality: A concept that data quality is related to the “format of the data (concise and consistent representation) and meaning of data (interpretability and ease of understanding)” (Wang & Strong, 1996).

User-Generated Content: A feature of Web applications that allows participants to add and modify information as contrasted to traditional sources such as publishers or broadcasters.
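The four-category view of data quality in the definitions above can be made concrete as a simple lookup table. The Python sketch below is an illustration only, not a method from this chapter: it maps each dimension named in the Key Terms to its category and averages hypothetical per-dimension ratings (for example, from a perception survey comparing Web and print sources) into per-category scores. The `Category` enum, the dimension labels, and the `category_scores` helper are our own assumptions.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class Category(Enum):
    INTRINSIC = "intrinsic"
    CONTEXTUAL = "contextual"
    REPRESENTATIONAL = "representational"
    ACCESSIBILITY = "accessibility"


# Dimension -> category, following the Key Terms definitions
# (Wang & Strong, 1996). The string labels are hypothetical names.
DIMENSIONS = {
    "accuracy": Category.INTRINSIC,
    "objectivity": Category.INTRINSIC,
    "believability": Category.INTRINSIC,
    "reputation": Category.INTRINSIC,
    "relevancy": Category.CONTEXTUAL,
    "timeliness": Category.CONTEXTUAL,
    "appropriate amount of data": Category.CONTEXTUAL,
    "concise representation": Category.REPRESENTATIONAL,
    "consistent representation": Category.REPRESENTATIONAL,
    "interpretability": Category.REPRESENTATIONAL,
    "ease of understanding": Category.REPRESENTATIONAL,
    "ease of access": Category.ACCESSIBILITY,
}


def category_scores(ratings):
    """Average per-dimension ratings into per-category scores.

    `ratings` maps dimension labels to numeric ratings (hypothetical
    survey data). An unknown label raises KeyError; categories with no
    rated dimensions are left out of the result.
    """
    grouped = defaultdict(list)
    for dimension, rating in ratings.items():
        grouped[DIMENSIONS[dimension]].append(rating)
    return {category: mean(values) for category, values in grouped.items()}


if __name__ == "__main__":
    example = {"accuracy": 6, "believability": 4, "timeliness": 5}
    for category, score in category_scores(example).items():
        print(category.value, score)
```

Keeping the dimension-to-category mapping in one table makes it easy to extend with further dimensions while leaving the aggregation logic unchanged.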
