A New Approach for Conceptual Extraction-Transformation-Loading Process Modeling

Neepa Biswas (Department of Information Technology, Jadavpur University, Kolkata, India), Samiran Chattapadhyay (Department of Information Technology, Jadavpur University, Kolkata, India), Gautam Mahapatra (DRDO, Ministry of Defence, Govt of India, Research Centre Imarat, Kurmalguda, India), Santanu Chatterjee (DRDO, Ministry of Defence, Govt of India, Research Centre Imarat, Kurmalguda, India) and Kartick Chandra Mondal (Department of Information Technology, Jadavpur University, Kolkata, India)
Copyright: © 2019 | Pages: 16
DOI: 10.4018/IJACI.2019010102

Abstract

Erroneous or incomplete data generated by various sources can have a direct impact on business analysis. Data extracted from the sources must be loaded into the data warehouse after the required transformations in order to reduce errors and minimize data loss. This process is known as Extraction-Transformation-Loading (ETL). Conceptual modeling of the ETL process gives a high-level view of the system activities. It offers advantages such as early identification of system errors, cost minimization, and scope and risk assessment. A new modeling approach is proposed for conceptualizing the ETL process using the standard Systems Modeling Language (SysML). To handle the increasing complexity of a system model, it is preferable to carry out verification and validation at an early stage of system development. In this article, the authors' previous work is extended by presenting a model-based systems engineering (MBSE) approach that automates validation of the SysML model using the No Magic simulator. The main objective is to bridge the gap between modeling and simulation and to examine the performance of the proposed SysML model. The usefulness of the authors' approach is demonstrated through a use case scenario.

Introduction

A data warehouse (Franconi & Kamblet, 2004) is a repository of historical data consolidated in a multidimensional format. Data in the warehouse is stored in a standard structure obtained by integrating data from the different operational sources of an organization. Business analysts (Ayhan et al., 2013; Snezana & Violeta, 2010) can access that data, analyze it, apply business intelligence tools, make predictions, and take strategic decisions. In maintaining a data warehouse, the main concern is to manage the large amount of data generated by different types of systems (SAP, ERP, Oracle, mainframe, etc.) and to store it in a uniform structure. ETL plays a very important role in keeping the data uniform. It is widely used in business organizations: it identifies and extracts data from various sources, filters and customizes that data according to the required format, and finally integrates and updates it into the data warehouse (Vassiliadis, 2009). Configuring the ETL process is one of the key factors with a direct impact on the cost, time, and effort of establishing a successful data warehouse. Data modeling (Cagiltay, Topalli, Aykac, & Tokdemir, 2013) gives an abstract view of how data will be arranged and managed in an organization. Data modeling techniques make the relationships between different data items visible, which helps manage organizational data in a structured way. An efficient model and design of the whole workflow is therefore highly recommended at the initial phase. Because warehouse implementation is expensive, good modeling and documentation should be maintained. According to Eckerson and White (2003), designing a well-established ETL workflow consumes almost one-third of the cost and effort of a data warehouse (DW) implementation. A well-designed ETL process is thus an important part of building an effective DW.
Each vendor-provided tool has its own methodology for designing the ETL process (Barateiro & Galhardas, 2005; Kherdekar & Metkewar, 2016). Using it requires an understanding of the functionality, language, and standards of that particular tool. Moreover, a design built in one tool is not suitable for execution on other platforms.
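
As a purely illustrative aid, the three ETL stages described above (extract from a source, filter and customize, then integrate into the warehouse) can be sketched in a few lines of tool-independent Python. This is not the authors' SysML model or any vendor's method; the sample data, the `sales` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database merely stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be an operational system.
SOURCE_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3,Carol,7.25
"""


def extract(text):
    """Extraction: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: drop incomplete rows, trim whitespace, convert types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record: skip it instead of loading bad data
        clean.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return clean


def load(rows, conn):
    """Loading: write the cleaned rows into the (assumed) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source=SOURCE_CSV):
    """Chain the three stages in order."""
    load(transform(extract(source)), conn)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads only the two complete records. Keeping each stage as a separate function mirrors the conceptual separation of activities that a high-level ETL model is meant to capture, independent of any particular tool.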

Conceptual modeling of ETL processing reflects a high-level view of the entities and the relationships among them. It provides an abstract view of the workflow rather than implementation details. Various research efforts have addressed conceptual modeling of ETL; UML, BPMN, and Semantic Web technologies are the conceptual modeling techniques most commonly used so far. In our previous work (Biswas et al., 2017), a new way of modeling an ETL process using the Systems Modeling Language (SysML) was proposed.

Although many contributions have been made to abstract ETL modeling, SysML is a new direction for conceptualizing and validating ETL workflows. SysML leaves considerable scope for research on the practical implementation of ETL models, their validation and simulation, and the generation of executable code, for the benefit of both technical and non-technical users.
