Ranking Gradients in Multi-Dimensional Spaces

Ranking Gradients in Multi-Dimensional Spaces

Ronnie Alves (University of Nice Sophia-Antipolis, France), Joel Ribeiro (University of Minho, Portugal), Orlando Belo (University of Minho, Portugal) and Jiawei Han (University of Illinois at Urbana-Champaign, USA)
DOI: 10.4018/978-1-60566-748-5.ch011


Business organizations must pay attention to interesting changes in customer behavior in order to anticipate their needs and act accordingly with appropriated business actions. Tracking customer’s commercial paths through the products they are interested in is an essential technique to improve business and increase customer satisfaction. Data warehousing (DW) allows us to do so, giving the basic means to record every customer transaction based on the different business strategies established. Although managing such huge amounts of records may imply business advantage, its exploration, especially in a multi-dimensional space (MDS), is a nontrivial task. The more dimensions we want to explore, the more are the computational costs involved in multi-dimensional data analysis (MDA). To make MDA practical in real world business problems, DW researchers have been working on combining data cubing and mining techniques to detect interesting changes in MDS. Such changes can also be detected through gradient queries. While those studies have provided the basis for future research in MDA, just few of them points to preference query selection in MDS. Thus, not only the exploration of changes in MDS is an essential task, but also even more important is ranking most interesting gradients. In this chapter, the authors investigate how to mine and rank the most interesting changes in a MDS applying a TOP-K gradient strategy. Additionally, the authors also propose a gradient-based cubing method to evaluate interesting gradient regions in MDS. So, the challenge is to find maximum gradient regions (MGRs) that maximize the task of raking gradients in a MDS. The authors’ evaluation study demonstrates that the proposed method presents a promising strategy for ranking gradients in MDS.
Chapter Preview

1 Introduction

The amount of data available in business data warehouses is increasing every day. To get interesting insights from those multi-dimensional databases is not a simple task. Since 1990s, several efforts were made in OLAP research area, making Multi-Dimensional data analysis (MDA) effective and real on several business applications. However, being effective does not imply being practical. In that time, DW researchers were concerned about how to deal with data cubes. Data cube is a multi-dimensional data structure from which one can dig interesting trends from DW. So, questions like how to build, how to explore, how to index, and how to maintain were in the agenda. Once the cube was available, business analysts could explore it, testing several what-if scenarios, and figuring out interesting business opportunities. Given that such inspection was usually carried out manually, it would be reasonable to mine interesting trends automatically, e.g., by data cubing (Sarawagi, 1998; Sarawagi, 2000; Sathe, 2001).

Taking the “wave” of data mining methods in the last decade, a bunch of papers were written about combining cubing and mining, also called OLAPing, for getting better MDA over DWs (Imielinski, 2002; Dong, 2004; Sarawagi, 1998; Sarawagi, 2000; Sathe, 2001; Wang, 2006). Using those hybrid strategies, one can evaluate cube’s dimensions and measures, not only before of after cubing, but even more sophisticated, while cubing. Those approaches are also called as change-based, exception-based or outlier-based methods.

Since data cube is usually big, independently of applying any of the current OLAPing methods, the amount of interesting patterns that could be brought out from them is still big too. Therefore, it is necessary to provide preference selection among those patterns, i.e., ranking most interesting patterns for further analysis. Ranking queries (Chang, 2000; Hristidis, 2001; Bruno, 2002) have been studied substantially by the information retrieval and database communities, and recently attracted attention of OLAP researchers (Xin, 2006a; Wu, 2008).

In this paper, we present a new OLAPing method which combines efforts from both active research areas: OLAPing and Ranking. Our main goal is to mining the most interesting (TOP-K) changes in an MDS applying a gradient-based cubing strategy. The challenge is to find Maximum Gradient Regions (MGRs) that maximize the task of mining Top-K gradient cells. We also introduce several constraints related to mine TOP-K gradients that help us to mine the MDS efficiently. These constraints include support threshold (iceberg), closedness (that indicates the closed cells in the cube), spreadness (which measure the variability of gradient regions), gradient threshold (which focuses the search to most interesting changes in the MDS) and TOP-K (the number of interesting gradient cells to locate). Different from the previous studies, we call this strategy Raking Gradient-based Aggregation Mining. Our solution to this problem consists of: 1) an efficient partitioning method based on gradient-regions, 2) an effective TOP-K gradient-cubing method which prunes non-gradient cells and also guides TOP-K search efficiently.

Complete Chapter List

Search this Book: