Ontology Development for ETL Process Design

Azman Ta’a (Universiti Utara Malaysia, Malaysia) and Mohd Syazwan Abdullah (Universiti Utara Malaysia, Malaysia)
Extract, Transform, Load (ETL) process design is difficult because user requirements are ambiguous and data integration and transformation are complex. Current studies have explored ontology-based approaches to overcome these difficulties by reconciling the semantics of user requirements within the ETL process design, which eases generation of the ETL process specification. The ontology for ETL process activities has been developed using the Requirement Analysis Method for ETL Processes (RAMEPs), which gathers requirements from the perspectives of the organization, the decision-maker, and the developer. The ontology is then used to generate the ETL process specification for a student affairs Data Warehouse (DW) system. The correctness of the ontology model was validated using an appropriate reasoner. The process of ontology development for the case study is presented, showing how the ontology-based approach succeeded in implementing the design and generating the ETL process specification.
Representing ETL processes through an ontological approach is suitable for generating the ETL process specification programmatically. The ontology formally provides meaning for data integration and transformation activities. However, current works have not properly explained the ontology development process for ETL process activities. In particular, the ontology is concerned with the definition of ETL process operations and the mapping between user requirements and the relevant data sources. Our approach uses an application ontology for the DW domain, a kind of ontology that has received little attention from researchers and practitioners. Moreover, the heterogeneous scenario of the DW system is structured according to the ontology in order to highlight the data source conflicts that need to be resolved.

In the ontology development process, the construction of the ontology is semantically described by the user requirement glossaries. The semantics of the user requirements are described at a high level, so that the requirements can be mapped to the data sources to accomplish the transformation and integration processes. Strong linkages between the requirement glossaries and the appropriate data sources through the ontology structure allow the ETL process specification to be produced programmatically, by applying an appropriate algorithm and reasoning to the ontology. Furthermore, the ontology is based on Description Logic (DL), which constitutes the most commonly used knowledge representation formalism (Sirin, Parsia, Grau, Kalyanpur, & Katz, 2007).
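
The linkage between requirement glossary terms and data sources can be illustrated with a minimal Python sketch. It is not the RAMEPs method and performs no DL reasoning; it only shows, under simplifying assumptions, how a set of term-to-source mappings might be walked to emit specification lines. Every name in it (the Mapping class, the glossary terms, the table and column names, the operation labels, and generate_spec) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One hypothetical link from a glossary term to a data source."""
    term: str       # requirement glossary term (assumed name)
    source: str     # source attribute as "table.column" (assumed name)
    operation: str  # ETL operation label (assumed name)
    target: str     # DW target attribute (assumed name)


# Illustrative mappings for a student affairs DW; all values are invented.
MAPPINGS = [
    Mapping("StudentName", "registry.stud_name", "COPY", "dim_student.name"),
    Mapping("StudentName", "hostel.full_name", "MERGE", "dim_student.name"),
    Mapping("ActivityDate", "activity.act_dt", "CONVERT_DATE", "dim_date.date"),
]


def generate_spec(terms, mappings):
    """Return (specification lines, terms that have no mapped source)."""
    spec, unmapped = [], []
    for term in terms:
        matches = [m for m in mappings if m.term == term]
        if not matches:
            # A requirement with no linked source cannot be specified.
            unmapped.append(term)
            continue
        for m in matches:
            spec.append(f"{m.operation}({m.source}) -> {m.target}")
    return spec, unmapped


if __name__ == "__main__":
    lines, missing = generate_spec(
        ["StudentName", "ActivityDate", "MeritPoint"], MAPPINGS
    )
    for line in lines:
        print(line)
    print("Unmapped terms:", missing)
```

In this sketch a term linked to more than one source, such as StudentName, yields several specification lines, which is where data source conflicts would surface, and a term with no link is reported rather than silently dropped. An actual ontology-based design would express these links as classes and properties and rely on a reasoner rather than a list lookup.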

