Introduction
A data warehouse (DW) (Franconi & Kamblet, 2004) is a repository of historical data consolidated in a multidimensional format. In a warehouse, data is stored in a standard structure obtained by integrating data from the different operational sources of an organization. Business analysts (Ayhan et al., 2013; Snezana & Violeta, 2010) can access that data, perform analysis, apply business intelligence tools, make predictions and take strategic decisions. In maintaining a data warehouse, the main challenge is to manage the large amount of data generated by different types of systems (SAP, ERP, Oracle, Mainframe etc.) and to store that data in a uniform structure. Extract-Transform-Load (ETL) plays a very important role in keeping the data uniform. ETL is a widely used process in business organizations. It identifies and extracts data from various sources, filters and customizes that data according to the required format, and finally integrates and loads it into the data warehouse (Vassiliadis, 2009). Configuring the ETL process is one of the key factors with a direct impact on the cost, time and effort of establishing a successful data warehouse. Data modeling (Cagiltay, Topalli, Aykac, & Tokdemir, 2013) gives an abstract view of how data will be arranged and managed in an organization. By applying data modeling techniques, the relationships between different data items can be visualized. Modeling thus helps an organization manage its data in a structured way. At the starting phase, it is highly recommended to model and design the total workflow efficiently. Because warehouse implementation is expensive, good modeling as well as documentation should be maintained. According to Eckerson and White (2003), designing a well-established ETL workflow consumes almost one-third of the cost and effort of a DW implementation. A well-designed ETL process is therefore one of the important aspects of accomplishing an effective DW.
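The three ETL stages described above (extracting from heterogeneous sources, filtering and customizing to a required format, and loading into the warehouse) can be illustrated with a minimal sketch. It is not tied to any vendor tool or to the SysML approach discussed later; the source names, the field names (`customer`, `cust_name`, `amount`, `amt`) and the `SalesRecord` structure are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesRecord:
    """Hypothetical uniform structure used inside the warehouse."""
    customer: str
    amount: float
    source: str


def extract(sources):
    """Extract: yield (source_name, raw_row) pairs from every source."""
    for name, rows in sources.items():
        for row in rows:
            yield name, row


def transform(pairs):
    """Transform: drop unusable rows and map the rest to one format."""
    for name, row in pairs:
        # Each source is assumed to use its own (hypothetical) field names.
        customer = (row.get("customer") or row.get("cust_name") or "").strip()
        raw_amount = row.get("amount", row.get("amt"))
        if not customer or raw_amount is None:
            continue  # filter out incomplete rows
        yield SalesRecord(customer.title(), float(raw_amount), name)


def load(records, warehouse):
    """Load: append the uniform records to the warehouse store."""
    warehouse.extend(records)
    return warehouse


def run_etl(sources):
    """Chain the three stages; a plain list stands in for the warehouse."""
    return load(transform(extract(sources)), [])


# Two illustrative sources with different layouts for the same information.
sources = {
    "erp": [
        {"customer": "alice", "amount": "10.5"},
        {"customer": "", "amount": "3"},  # incomplete, will be filtered out
    ],
    "mainframe": [{"cust_name": "BOB ", "amt": 7}],
}
warehouse = run_etl(sources)
```

Keeping each stage as a separate function mirrors the conceptual separation of the workflow: the transformation rules can change without touching how data is extracted or stored.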
Each vendor-provided tool has its own specific methodology for designing the ETL process (Barateiro & Galhardas, 2005; Kherdekar & Metkewar, 2016). Using such a tool requires an understanding of its functionality, language, standards etc. Moreover, a design built within one tool is not suitable for execution on other platforms.
In ETL processing, conceptual modeling reflects a high-level view of the entities and the relationships among them. It provides only an abstract view of the workflow rather than the implementation details. Various research works have addressed the conceptual modeling of ETL. UML, BPMN and the Semantic Web are the techniques most commonly used for it so far. In our previous work (Biswas et al., 2017), we proposed a new way of modeling an ETL process using the Systems Modeling Language (SysML).
Although there have been many contributions to the abstract modeling of ETL, SysML is a new direction for conceptualizing and validating ETL workflows. There is considerable research scope for using SysML to practically implement ETL modeling, validation, simulation and executable code production, for the benefit of both technical and non-technical users.