1. Introduction
A huge amount of data collected from multiple sources is loaded into an enterprise data warehouse through a process known as ETL (Extraction, Transformation and Loading) (Thareja, 2009). Analytical processing is performed on this warehouse data using OLAP queries for strategic decision making. Because the results are generated only after traversing an enormous amount of data, the result retrieval time of OLAP queries is an important aspect of performance. To generate the results of frequent OLAP queries (Thareja, 2009; Gupta, 2014; Han et al., 2011; Rusu et al., 2004; Rusu et al., 2005; Tjioe and Taniar, 2005), the data warehouse is invoked repeatedly, which is time-consuming and imposes a performance overhead on the system. OLAP queries may also generate results from data marts (Thareja, 2009), which are subsets of the data warehouse. Data marts store data based on the needs of the users and contain less data than a data warehouse. Depending on the source of data, data marts are classified as dependent or independent. The present study uses dependent data marts, whose source of data is the data warehouse, as depicted in Figure 1.
Figure 1. Data warehouse built with data from multiple sources and feeding dependent data marts
The two major existing approaches used for retrieving query results from a data warehouse are multidimensional data cubes and materialized views. Multidimensional data cubes (Gupta, 2014; Han et al., 2011) store the results of aggregate queries, while materialized views (Gupta et al., 1993; Gupta and Mumick, 1995) store query results along with the view definition. The major issues faced while using data cubes and materialized views are explained in detail in Section 2.
In the present study, the executed OLAP queries are stored along with their results and some necessary metadata in a relational database, referred to here as MQDB (Materialized Query Database) (Chakraborty and Doshi, 2018a). The metadata of a query includes the timestamp, frequency, threshold, number of records in the output, path of the result table and path of the data mart (for processing incremental data). When an OLAP query is fired, it is first determined whether a synonymous query exists in MQDB. If the tables, fields, functions and criteria of the input query and a stored query are the same, then the two queries generate the same results and are therefore considered synonymous. For a synonymous query, it is then determined whether an incremental update is required. If no incremental update is required, the stored results are fetched from MQDB (Chakraborty and Doshi, 2018a). For synonymous queries requiring an incremental update, incremental results are generated using data marts. Generating the incremental results of a query from a data mart is faster because a data mart holds fewer records than a data warehouse (Chakraborty and Doshi, 2018b). Thereafter, the final results are derived by combining the stored results with the incremental results using arithmetic operations.
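The lookup and combination steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names QuerySignature, StoredQuery, make_signature, combine_additive and answer are hypothetical, MQDB is modelled as an in-memory dictionary rather than a relational database, only part of the metadata is kept, and the test for whether an incremental update is needed (comparing the stored timestamp with the data mart's load time) is an assumption, since the text does not specify it. Combination is shown only for additive aggregates such as SUM and COUNT.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Result = Dict[str, float]  # group key -> aggregate value (SUM or COUNT only)


@dataclass(frozen=True)
class QuerySignature:
    """The four parts compared to decide whether two queries are synonymous."""
    tables: FrozenSet[str]
    fields: FrozenSet[str]
    functions: FrozenSet[str]
    criteria: FrozenSet[str]


def make_signature(tables: Iterable[str], fields: Iterable[str],
                   functions: Iterable[str],
                   criteria: Iterable[str]) -> QuerySignature:
    # Assumption: identifiers compare case-insensitively and order is
    # irrelevant; criteria are only stripped, so literal values keep their case.
    def ident(items: Iterable[str]) -> FrozenSet[str]:
        return frozenset(s.strip().lower() for s in items)
    return QuerySignature(ident(tables), ident(fields), ident(functions),
                          frozenset(c.strip() for c in criteria))


@dataclass
class StoredQuery:
    """A stored result plus a subset of the metadata listed in the text."""
    result: Result
    timestamp: int      # point up to which the stored result is current
    frequency: int = 1  # how often this query has been fired


def combine_additive(stored: Result, incremental: Result) -> Result:
    # Valid for SUM and COUNT; an average would need its sum and count
    # stored separately and recomputed after combining.
    merged = dict(stored)
    for key, value in incremental.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def answer(mqdb: Dict[QuerySignature, StoredQuery], sig: QuerySignature,
           mart_loaded_at: int,
           run_on_mart: Callable[[int], Result]) -> Optional[Result]:
    entry = mqdb.get(sig)
    if entry is None:
        return None  # no synonymous query: the caller must run it in full
    entry.frequency += 1
    if mart_loaded_at > entry.timestamp:  # assumed freshness test
        # Compute results only for data newer than the stored timestamp.
        incremental = run_on_mart(entry.timestamp)
        entry.result = combine_additive(entry.result, incremental)
        entry.timestamp = mart_loaded_at
    return entry.result


# Usage with made-up values: the second query differs only in case,
# ordering and whitespace, so it is treated as synonymous.
mqdb: Dict[QuerySignature, StoredQuery] = {}
sig = make_signature(["Sales"], ["region", "amount"], ["SUM"], ["year = 2017"])
mqdb[sig] = StoredQuery(result={"East": 100.0, "West": 50.0}, timestamp=10)
same = make_signature(["sales"], ["amount", "region"], ["sum"], [" year = 2017 "])
fresh = answer(mqdb, same, 12, lambda since: {"East": 5.0, "North": 7.0})
```

Representing each part of the signature as a set is one possible design choice; it makes the comparison insensitive to the order in which tables or fields are listed, at the cost of ignoring any meaning that the order might carry in a real query.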
2. Major Existing Approaches And Drawbacks
The two major existing approaches used for retrieving query results from a data warehouse are multidimensional data cubes and materialized views. Each approach and its issues are discussed briefly in Sections 2.1 and 2.2.