Data Warehouses and Big Data: How to Cope With Data Quality

Hamid Naceur Benkhaled, Djamel Berrabah, Faouzi Boufares

Source Title: International Journal of Organizational and Collective Intelligence (IJOCI) 10(3)

DOI: 10.4018/IJOCI.2020070101

Abstract

Before the arrival of the Big Data (BD) era, data warehouse (DW) systems were considered the best decision support systems (DSS). DW systems have always helped organizations around the world to analyze their stored data and use it in making key decisions. However, analyzing and mining data of poor quality can lead to wrong conclusions. Several data quality (DQ) problems can appear during a data warehouse project, such as missing values, duplicate values, integrity constraint issues, and more. As a result, organizations around the world are more aware of the importance of data quality and invest a lot of money in managing data quality in DW systems. On the other hand, with the arrival of BD, new challenges have to be considered, such as the need to collect the most recent data and the ability to make real-time decisions. This article provides a survey of the existing techniques for controlling the quality of the data stored in DW systems and of the new solutions proposed in the literature to meet the new Big Data requirements.

1. Introduction

To best explore the mountains of data that exist within organizations and across the web, data quality is becoming increasingly important. Indeed, data quality is a major issue in an organization and has a significant impact on the quality of its services and profitability. Decision-making using data of poor quality has a negative influence on the activities of organizations. Anomalies are only detected at the level of data restitution (such as analyses or visualizations), which is too late!

It is recommended that decision-makers integrate data from various sources, including databases, data warehouses, data marts, data lakes, and master data, in order to derive new data. In an era of data deluge, data quality is more important than ever (Figure 1). There are multiple data sources: social networks; the web; open data; and dark data (dormant data not yet used, much of it unstructured text). Indeed, nowadays, any type of organization needs to integrate data from various distributed sources which are heterogeneous and of varying quality. In most cases, data descriptions in the sources are poor or nonexistent. As a result, the assembled data may be meaningless and the result obtained may contain many anomalies. The problems that lead to poor quality of the manipulated data could be the following: (i) heterogeneity of the data when integrated; (ii) different levels of data description (little or no description at all); and (iii) lack of semantics (Zaidi et al., 2015).

As mentioned above, data warehouse (DW) systems are among the technologies used to integrate data. Before the arrival of the Big Data (BD) era, data warehouse systems were considered the most powerful decision support systems. DW systems have always helped organizations around the world to exploit their stored data and use it to gain an advantage over their competitors in the market.

Although DW systems have proven their worth over the years, they can sometimes fail to meet the stakeholders' expectations or to support the right decisions. Indeed, many DW projects have been cancelled due to data quality (DQ) problems. DQ problems can appear in different forms, such as missing values, duplicate records (Benkhaled et al., 2019; Ouhab et al., 2017), or referential integrity problems. Poor quality data causes losses estimated at about $600 million annually in the USA alone (information reported by the Data Warehousing Institute). This Institute also mentioned that 15% to 20% of the data stored in most enterprises is of poor quality (Geiger, 2004). Consequently, company leaders can lose their trust in DW systems and look for other solutions, since DQ problems can increase the cost of data warehouse projects.
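To make the three DQ problems named above concrete, the following is a minimal sketch in Python, not a technique taken from the surveyed literature. The tables, column names, and helper functions are all hypothetical, and duplicates are detected by exact match only, whereas real duplicate detection usually relies on approximate matching.

```python
from collections import Counter

# Hypothetical toy tables; every name and value here is illustrative only.
customers = [
    {"customer_id": 1, "name": "Alice", "city": "Paris"},
    {"customer_id": 2, "name": "Bob", "city": None},       # missing value
    {"customer_id": 3, "name": "Alice", "city": "Paris"},  # same (name, city) as customer 1
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},  # refers to a customer that does not exist
]


def missing_values(rows):
    """Return (row index, column) pairs whose value is None or an empty string."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col, val in row.items()
        if val is None or val == ""
    ]


def duplicate_records(rows, key_columns):
    """Return the key tuples that occur more than once (exact match only)."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


print(missing_values(customers))
print(duplicate_records(customers, ["name", "city"]))
print(orphan_rows(orders, "customer_id", customers, "customer_id"))
```

Each helper reports one class of anomaly, so such checks could run before loading rather than at the level of data restitution, where the introduction notes that detection comes too late.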

However, with the arrival of the Big Data era, adapting traditional DW systems to the new Big Data challenges has become one of the main active research fields. Most Big Data applications, such as those of the Internet of Things, need near-real-time analysis, which traditional DW systems did not provide (Meehan et al., 2017). This is especially true of the ETL (extraction, transformation, and loading) process, which is considered the most time-consuming step in the DW life cycle. Previously, DW systems were not impacted by the latency of ETL since near-real-time decisions were not a necessity (Berkani et al., 2013).

Even with the new requirements of Big Data, some researchers in the DW community still defend DW systems over BD. A DW lets users execute many queries on the same stored data, which is not possible with BD because the data is not stored. If a user wants to execute another query, a Data Lake, which stores the most important unstructured data, has to be implemented (Feugey, 2016).
