1. Introduction
Nowadays, the availability of smart devices and communication systems used by billions of end users has led to the emergence of the Big Data notion, characterized by the 4Vs presented by Gartner: volume, velocity, variety, and veracity. From this perspective, statistics reported by IBM analytics indicate that 6 of the 7 billion people in the world own at least one smartphone. This has greatly increased the number of people connected to the internet, multiplying the volume of data 300 times between 2005 and 2020, to about 40 zettabytes. Hence, Big Data is a term applied to gigantic, unstructured, and heterogeneous datasets whose size and type exceed the ability of traditional relational databases to capture, manage, and process them. This explosion of data and technologies therefore poses a major challenge for multiple domains, particularly for Decision Support Systems (DSS) (Kimball & Caserta, 2011) and especially for data mining and knowledge discovery activities (Storey & Song, 2017). Within this frame of reference, many researchers have focused on this evolution by handling and analyzing sensor data (Trauth & Browning, 2018), logistics data (AlShaer et al., 2019), social media data (Gupta & Aluvalu, 2019), etc. The significance of analyzing this evolution makes it essential to consider the basic element of a DSS, namely the Extract-Transform-Load (ETL) process. According to Vassiliadis et al. (2002), ETL implementation may take up to 80% of a data warehouse (DW) project. Typically, ETL is composed of several operations, such as selection, conversion, filtering, and join, executed sequentially in order to capture, integrate, and filter data before loading it into the DW. In fact, these classical operations cannot cope with the big evolution of data.
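To make the classical operations concrete, the following is a minimal illustrative sketch (not the paper's implementation) of selection, conversion, filtering, and join applied sequentially to small in-memory datasets; all field names and values are invented for illustration.

```python
# Toy source data standing in for extracted records.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": "250.0"},
    {"order_id": 2, "customer_id": 11, "amount": "80.5"},
    {"order_id": 3, "customer_id": 10, "amount": "15.0"},
]
customers = [
    {"customer_id": 10, "country": "FR"},
    {"customer_id": 11, "country": "TN"},
]

# Selection: keep only the attributes needed downstream.
selected = [{"order_id": o["order_id"],
             "customer_id": o["customer_id"],
             "amount": o["amount"]} for o in orders]

# Conversion: cast textual amounts to numeric values.
converted = [{**o, "amount": float(o["amount"])} for o in selected]

# Filtering: discard rows below a threshold.
filtered = [o for o in converted if o["amount"] >= 50.0]

# Join: enrich each order with its customer before loading into the DW.
by_id = {c["customer_id"]: c for c in customers}
joined = [{**o, **by_id[o["customer_id"]]} for o in filtered]

print(joined)
```

Each step consumes the full output of the previous one, which is precisely the sequential behavior that becomes a bottleneck at Big Data scale.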
Moreover, Relational Database Management Systems (RDBMS) are not suitable for distributed databases, as argued in (Boussahoua et al., 2017). For this reason, the ETL process requires particular attention so that it can be adapted to cope with the explosion of data and still deliver processed data into the DW.
Several technologies have appeared with the emergence of Big Data, such as Google's MapReduce (Dean & Ghemawat, 2008) for processing large amounts of data. In addition, NoSQL (Not only SQL) databases have appeared to store unstructured data, whether column-oriented such as HBase (George, 2011), document-oriented such as MongoDB (Chodorow & Dirolf, 2010), or graph-oriented (Pokornỳ, 2015).
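The MapReduce paradigm itself can be sketched in plain Python without Hadoop: a map function emits key/value pairs, a shuffle step groups them by key, and a reduce function aggregates each group. The word-count example below is the canonical illustration, not part of BigDimETL.

```python
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in one line of text.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Group all emitted values by key, as the framework would
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate one key's values into a single result.
    return key, sum(values)

lines = ["big data etl", "big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'etl': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel across the cluster's nodes, which is what minimizes processing time on large datasets.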
Indeed, this paper presents the BigDimETL approach, which applies Big Data technologies that support scalability and performance in order to adapt the Extraction and Transformation phases of ETL. Within this context, the adaptation of data processing is based on adding parallelism through the Hadoop ecosystem (White, 2012), an open-source framework for handling unstructured data by means of a parallel processing technique called the MapReduce paradigm (Dean & Ghemawat, 2008), thereby minimizing time consumption. Besides, BigDimETL adopts the HBase database as a column-oriented data store, in place of a classical relational database, in order to support complex data. Moreover, the goal of the proposed approach is to reformulate the ETL processes while retaining the specificities of the multidimensional structure of the DW. The latter is considered a high-level DW/ETL-specific construct (Liu et al., 2013), dedicated to online analytical processing and business intelligence applications. The central focus of this research work is on modeling ETL operations at the formal level of the extraction and transformation phases. Accordingly, in the extraction phase, conversion and vertical partitioning methods are employed to minimize the overload on the transformation and loading phases, while the transformation phase supports the most commonly used operations for treating and filtering data.
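As a hypothetical illustration of vertical partitioning in the spirit of a column-oriented store such as HBase, the sketch below splits each flat record by column family so that later phases read only the families they need. The family names, attributes, and row-key format are assumptions made for this example, not the paper's schema.

```python
# Assumed column-family layout (illustrative only).
COLUMN_FAMILIES = {
    "identity": ["name", "email"],
    "activity": ["last_login", "purchases"],
}

def vertical_partition(row_key, record):
    """Split one flat record into per-family cells keyed by row."""
    partitions = {}
    for family, columns in COLUMN_FAMILIES.items():
        partitions[family] = {
            (row_key, col): record[col]
            for col in columns if col in record
        }
    return partitions

record = {"name": "Alice", "email": "a@x.org",
          "last_login": "2020-01-05", "purchases": 12}
parts = vertical_partition("user#1", record)
print(parts["identity"])
```

A transformation that only aggregates purchase activity would then scan the `activity` family alone, which is the overload reduction the extraction phase aims for.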