Multidimensional Business Benchmarking Analysis on Data Warehouses

Akiko Campbell (LiveLabs Medical Laboratories, Burnaby, Canada), Xiangbo Mao (Simon Fraser University, Burnaby, Canada), Jian Pei (Simon Fraser University, Burnaby, Canada) and Abdullah Al-Barakati (King Abdulaziz University, Jeddah, Saudi Arabia)
Copyright: © 2017 | Pages: 25
DOI: 10.4018/IJDWM.2017010103


Benchmarking analysis has been used extensively in industry for business analytics. Surprisingly, how to conduct benchmarking analysis efficiently over large data sets remains an untouched technical problem. In this paper, the authors formulate benchmark queries in the context of data warehousing and business intelligence, and develop a series of algorithms to answer benchmark queries efficiently. Their methods combine several new ideas with state-of-the-art data cube computation techniques to reduce the number of aggregate cells that need to be computed and indexed. An empirical study using the TPC-H data sets and the Weather data set demonstrates the efficiency and scalability of their methods.
1. Introduction

In business analysis, more often than not, one wants to compare an aggregate group with its peers in a multidimensional manner. For example, suppose that an analyst in a company is investigating the performance of the senior sales representatives in Asia in terms of average sales amount per representative. The analyst is interested in several factors, including the product line, the customer's industry, and the transaction month. The analyst collects the transaction data into a relational table where each record corresponds to a transaction and contains the sales representative id, the sales representative’s level, the region, the product sold, the customer id, the customer’s industry, the transaction date rounded to month, and the amount of the transaction. The analyst may then want to find other sales groups that perform dramatically better in some aspects than the group of senior sales representatives in Asia under analysis. For example, an interesting answer may look like “the group in North America is the best senior sales group in terms of selling laptop computers to business customers”. This kind of question is known as benchmarking analysis in business analytics, since it tries to find a benchmark for a query group.
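The example above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' algorithm: the transaction rows, the tuple layout, and the function names `avg_sales_per_rep` and `benchmark` are all hypothetical, and the search is a brute-force scan over sibling regions in one fixed subspace (product and customer industry).

```python
from collections import defaultdict

# Hypothetical transactions: (rep_id, level, region, product, industry, amount).
TRANSACTIONS = [
    ("r1", "senior", "Asia", "laptop", "business", 100.0),
    ("r2", "senior", "Asia", "laptop", "business", 140.0),
    ("r3", "senior", "North America", "laptop", "business", 300.0),
    ("r3", "senior", "North America", "laptop", "business", 200.0),
    ("r4", "senior", "Europe", "laptop", "business", 90.0),
    ("r5", "junior", "Asia", "laptop", "business", 50.0),
    ("r1", "senior", "Asia", "desktop", "education", 400.0),
    ("r3", "senior", "North America", "desktop", "education", 150.0),
]


def avg_sales_per_rep(rows):
    """Total sales amount divided by the number of distinct representatives."""
    reps = {row[0] for row in rows}
    return sum(row[5] for row in rows) / len(reps) if reps else 0.0


def benchmark(level, query_region, product, industry):
    """Return the query group's score and the regions that beat it.

    Only groups with the same level are compared, inside the subspace
    fixed by (product, industry). Better groups are sorted best first.
    """
    by_region = defaultdict(list)
    for row in TRANSACTIONS:
        if row[1] == level and row[3] == product and row[4] == industry:
            by_region[row[2]].append(row)
    scores = {region: avg_sales_per_rep(rows) for region, rows in by_region.items()}
    query_score = scores.get(query_region, 0.0)
    better = [(r, s) for r, s in scores.items() if r != query_region and s > query_score]
    return query_score, sorted(better, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(benchmark("senior", "Asia", "laptop", "business"))
    print(benchmark("senior", "Asia", "desktop", "education"))
```

With this toy data, the first call reports North America as a benchmark for senior representatives in Asia selling laptops to business customers, while the second call finds no better group in the desktop/education subspace.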

Business benchmarking analysis is among the most popular practices in business analysis. The history of benchmarking analysis goes back to the early 1980s, when Xerox employed benchmarking as part of “Leadership through Quality”, a program to find ways to reduce manufacturing costs. In 1982, Xerox determined that the average manufacturing cost of copiers in Japanese companies was 40-50% of Xerox's own, and that Japanese companies were able to undercut Xerox's prices effortlessly. As part of “Leadership through Quality”, Xerox established the benchmarking program, which played a major role in pulling Xerox out of trouble in the years to come. Xerox has since become one of the best examples of the successful implementation of benchmarking in business (Zairi, 1996).

Benchmark queries can be very sophisticated. For example, one may add various constraints to refine the search space. Instead of comparing a query group with any group, one may be interested in only those groups that are super-groups, sub-groups, or sibling groups of the query group. For instance, the super-groups of senior sales representatives in Asia are all sales representatives in Asia, the senior sales representatives in the world, and all sales representatives in the world, while the sibling groups of senior sales representatives in Asia are the groups of senior sales representatives in other regions, such as North America, South America, and Europe, as well as the group of junior sales representatives in Asia.
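The super-group and sibling-group relationships in this example can be sketched as follows. This is an illustrative reading of the example, not the paper's formal definition: a group is assumed to be a tuple over the dimensions (level, region), `"*"` is assumed to mean "any value", and the dimension domains listed in `DOMAINS` are hypothetical.

```python
from itertools import combinations

ALL = "*"  # assumed marker for a generalized ("any value") dimension

# Hypothetical dimension domains, in the order used by group tuples.
DOMAINS = {
    "level": ["junior", "senior"],
    "region": ["Asia", "Europe", "North America", "South America"],
}
DIMS = list(DOMAINS)


def super_groups(group):
    """Generalize one or more of the fixed dimensions of `group` to ALL."""
    fixed = [i for i, value in enumerate(group) if value != ALL]
    result = []
    for k in range(1, len(fixed) + 1):
        for dropped in combinations(fixed, k):
            result.append(tuple(ALL if i in dropped else v for i, v in enumerate(group)))
    return result


def sibling_groups(group):
    """Change exactly one fixed dimension of `group` to another value of its domain."""
    result = []
    for i, value in enumerate(group):
        if value == ALL:
            continue
        for other in DOMAINS[DIMS[i]]:
            if other != value:
                result.append(group[:i] + (other,) + group[i + 1:])
    return result


if __name__ == "__main__":
    query = ("senior", "Asia")
    print(super_groups(query))    # three super-groups, ending with ('*', '*')
    print(sibling_groups(query))  # junior in Asia, then senior in the other regions
```

Sub-groups would be obtained in the opposite direction, by fixing a value on a dimension that is currently `"*"`; with only two dimensions, both already fixed, the query group in this sketch has none.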

Interestingly, business benchmarking analysis is related to egocentric analysis. Essentially, given a query group, egocentric analysis tries to identify the aspects in which the query group is better than its peers. For example, taking the group of senior sales representatives in Asia as the query group, egocentric analysis tries to identify the factors on which this group performs best against the other groups. An answer may look like “compared to the senior sales representatives in other regions, the query group has the best performance in selling desktop computers to education customers.” Intriguingly, if no benchmark can be found for a query group in a subspace, that subspace is an answer to the egocentric analysis on the same query group.

Data warehouses are essential information infrastructure in modern enterprises. However, to the best of our knowledge, how to conduct benchmarking analysis effectively and efficiently on data warehouses remains largely untouched from the technical point of view of data management and analytics. Benchmark queries cannot be answered online using the existing data cube and data warehouse techniques. Even if we compute the whole data cube using all the attributes, given a query group, we still need to search the cube for the answers. It is well recognized that the size of a data cube is exponential with respect to the number of tuples and the dimensionality of the base table.
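To give a feel for why searching a fully materialized cube is expensive, the following minimal sketch enumerates the non-empty aggregate cells of a tiny base table. It is an assumption-laden illustration rather than a cube computation technique from the paper: `"*"` is assumed to denote a generalized dimension, the rows are hypothetical, and the function name `cube_cells` is invented for this example. Each base tuple over d dimensions contributes to 2^d cells, because every dimension is either kept or generalized.

```python
from itertools import product

ALL = "*"  # assumed marker for a generalized dimension


def cube_cells(base_table):
    """Return the set of non-empty aggregate cells of the full data cube."""
    cells = set()
    for row in base_table:
        # Each dimension is either kept (False) or generalized to ALL (True).
        for mask in product([False, True], repeat=len(row)):
            cells.add(tuple(ALL if generalize else value
                            for value, generalize in zip(row, mask)))
    return cells


if __name__ == "__main__":
    # Hypothetical base table over (level, region, product).
    table = [
        ("senior", "Asia", "laptop"),
        ("junior", "Europe", "desktop"),
    ]
    print(len(cube_cells(table[:1])))  # 8 cells from one 3-dimensional tuple
    print(len(cube_cells(table)))      # 15: the two tuples share only ('*', '*', '*')
```

Adding a fourth dimension to a single tuple doubles its cell count to 16, which is the kind of growth that makes reducing the number of computed and indexed cells worthwhile.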
