SQL Scorecard for Improved Stability and Performance of Data Warehouses

SQL Scorecard for Improved Stability and Performance of Data Warehouses

Nayem Rahman (Intel Corporation, Hillsboro, OR, USA)
Copyright: © 2016 |Pages: 16
DOI: 10.4018/IJSI.2016070102
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Scorecard-based measurement techniques are used by organizations to measure the performance of their business operations. A scorecard approach could be applied to a database system to measure performance of SQL (Structured Query Language) being executed and the extent of resources being used by SQL. In a large data warehouse, thousands of jobs run daily via batch cycles to refresh different subject areas. Simultaneously, thousands of queries by business intelligence tools and ad-hoc queries are being executed twenty-four by seven. There needs to be a controlling mechanism to make sure these batch jobs and queries are efficient and do not consume database systems resources more than optimal. The authors propose measurement of SQL query performance via a scorecard tool. The motivation behind using a scorecard tool is to make sure that the resource consumption of SQL queries is predictable and the database system environment is stable. The experimental results show that queries that pass scorecard evaluation criteria tend to utilize optimal level of database systems computing resources. These queries also show improved parallel efficiency (PE) in using computing resources (CPU, I/O and spool space) that demonstrate the usefulness of SQL scorecard.
Article Preview

1. Introduction

The data warehouse is used to hold historical and cross-functional data. Organizations use data warehouses as their integrated enterprise repository of data coming from disparate operational sources. As the business environment has become more global and competitive the data warehouse has proved to be a very critical technology for an organization to better manage and enhance its information. This in turn helps an organization become more competitive, better understand its customers, and more rapidly meet market demands (Roberts & Grover, 2012; Akhter & Rahman, 2015; Wixom & Watson, 2001).

In order to provide faster business information to different groups of users including the business intelligence (BI) community, a data warehouse should be stable and its computing resources available to users (Rahman, 2010). To achieve that goal, the queries that get executed on a data warehouse need to be efficient. Efficiency of queries can be ensured only by putting some consistent and repeatable measures (Chandra et al., 2015; Mahmood et al., 2014) on the quality of queries which will provide necessary indicators about the on-going health of a data warehouse environment. We need to make sure the analytical tools and SQL queries we run on the data warehouse, use optimal resources, and have parallel efficiency (Rahman, 2013b). In addition to queries, data warehouse design, extract-transform-load (ETL) development, and load strategy need to be efficient. We need load process strategies that save database management system (DBMS) resources (Rahman, 2010; Narasayya et al., 2015) and ensure DBMS resources availability of analytical tools and query processing while the load is running.

Kaplan and Norton (1992) proposed a balance scorecard approach and compared it with dials and indicators of an airline cockpit that provides detailed information about complex tasks used by pilots when navigating and flying an airplane. In order to ensure the quality of SQL that run on the data warehouse and get indicators on SQL performance we propose a balanced scorecard approach (Rahman, 2013a) for the SQL used to load tables or retrieve information from the data warehouse.

In our case, a balanced scorecard allows the data warehouse team and other stakeholders to look at the data warehouse performance from four important perspectives and provides answers to the following questions. How do users, including analytical and BI community, see us running the data warehouse (customer perspective)? What must we excel at in maintaining a data warehouse (internal perspective)? How can we continue to improve and create value by overcoming technical challenges such as innovation (Ramamurthy et al., 2008) (learning perspective)? How do we enable management initiatives (financial perspective)? A balanced scorecard minimizes information overload by limiting the number of measures used (Kaplan & Norton; 1992, 2007). We need to make sure that the database system resources such as CPU time and I/O utilization are as optimal as possible. This can be achieved by introducing SQL scorecard tools.

A data warehouse environment performance and stability depends on several factors including the design of SQL queries, data warehouse logical and physical data modeling (Bowen et al., 2009; Schlesinger & Rahman, 2015), and capacity planning. Writing efficient SQL queries is the most important piece among the stated factors. Gartner reports that analysts are running more ad hoc, but equally, complex queries, each competing for CPU, memory and disk access (Feinberg & Beyer, 2010). This research is conducted to tackle the issues of SQL queries that are run to load tables by executing stored procedures and macros, executing queries by various reporting and business intelligence (BI) tools. In a data warehouse hundreds of queries are run at any point in time either to load tables via batch cycles (Rahman, 2010) or retrieve information by analytical community using reporting and BI tools. It is very important that SQL queries are written efficiently. In a parallel processing data warehousing architecture, the SQL queries that are written must be compliant with PE architecture (Deepak et al., 2012; Rahman, 2013b).

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 6: 4 Issues (2018): 1 Released, 3 Forthcoming
Volume 5: 4 Issues (2017)
Volume 4: 4 Issues (2016)
Volume 3: 4 Issues (2015)
Volume 2: 4 Issues (2014)
Volume 1: 4 Issues (2013)
View Complete Journal Contents Listing