Data Warehousing for Association Mining

Yuefeng Li (Queensland University of Technology, Australia)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-60566-010-3.ch093

Abstract

With the phenomenal growth of electronic data and information, there is strong demand for efficient and effective systems (tools) that perform data mining tasks on data warehouses or multidimensional databases. Association rules describe associations between itemsets (i.e., sets of data items), or granules. Association mining (also called association rule mining) finds interesting or useful association rules in databases and is a crucial technique in the development of data mining. It can be used in many application areas, for example, to discover associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, pattern mining, is the discovery of frequent patterns. The second phase, rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns, and these patterns also include much noise (Pei and Han, 2002). The second phase is also time consuming (Han and Kamber, 2000) and can generate many redundant rules (Zaki, 2004; Xu and Li, 2007). To reduce the search space, user constraint-based techniques attempt to find knowledge that meets certain constraints. Two interesting concepts have been used in user constraint-based techniques: meta-rules (Han and Kamber, 2000) and granule mining (Li et al., 2006). The aim of this chapter is to present the latest research results on data warehousing techniques that can be used to improve the performance of association mining. The chapter introduces two important approaches based on user constraint-based techniques. The first approach asks users to input meta-rules that describe their desires for certain data dimensions. It then creates data cubes based on these meta-rules and provides interesting association rules. The second approach first asks users to provide condition and decision attributes, which are used to describe the antecedent and consequent of rules, respectively. It then finds all possible data granules based on the condition attributes and decision attributes. It also creates a multi-tier structure to store the associations between granules, and association mappings to provide interesting rules.
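
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's method: the toy transactions, the thresholds, and the function names (support, frequent_patterns, generate_rules) are all assumptions made for illustration, and the brute-force enumeration stands in for any real pattern mining algorithm. It does show why the first phase is expensive: the number of candidate itemsets grows exponentially with the number of items.

```python
from itertools import combinations

# Toy transactions; the items and thresholds are illustrative assumptions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_patterns(transactions, min_support):
    """Phase 1 (pattern mining): keep every itemset with enough support."""
    items = sorted(set().union(*transactions))
    patterns = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                patterns[frozenset(combo)] = s
    return patterns

def generate_rules(patterns, min_confidence):
    """Phase 2 (rule generation): split each pattern into antecedent => consequent."""
    rules = []
    for itemset, s in patterns.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # Every subset of a frequent pattern is itself frequent,
                # so its support is already stored in patterns.
                confidence = s / patterns[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

patterns = frequent_patterns(transactions, min_support=0.5)
rules = generate_rules(patterns, min_confidence=0.8)
```

With these toy values, five itemsets are frequent and only one rule (butter implies bread) reaches the confidence threshold. Lowering either threshold quickly multiplies the output, which is the redundancy problem the abstract mentions.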

Background

A data warehouse mainly aims to make data easily accessible, to present data consistently, and to be adaptive and resilient to change (Kimball and Ross, 2002). A data warehouse is an application that contains a collection of data that is subject-oriented, integrated, non-volatile and time-variant, and that supports management’s decisions (Inmon, 2005). Data warehousing focuses on constructing and using data warehouses. Construction includes data cleaning, data integration and data consolidation. After these steps, a collection of data in a specific form can be stored in a data warehouse.

Data warehouses can also provide clean, integrated and complete data to improve the process of data mining (Han and Kamber, 2000). Han and Kamber also defined different levels of integration between data mining and the data warehouse. At the loosest level, the data warehouse acts only as an ordinary data source for data mining, while at the tightest level the data warehouse and data mining are sub-components that cooperate with each other. In a data mining oriented data warehouse, the data warehouse not only cleans and integrates data but also tailors data to meet user constraints for knowledge discovery in databases. Data mining can thus return what users want, which improves the quality of the discovered knowledge.

Reviewing the two steps in association mining reveals a common difficulty: both take a long time, and both produce uncertain information that makes it hard to determine which knowledge is useful. Data mining oriented data warehousing is a promising direction for solving this problem. It refers to constructing systems in which data mining and the data warehouse are sub-components that cooperate with each other. In these systems, the data warehouse not only cleans and integrates data but also tailors data to fit the requirements of data mining. Data mining thus becomes more efficient and accurate. In this chapter we discuss how data warehousing techniques are useful for association mining.
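
As a rough illustration of how a warehouse might tailor data to user constraints, the Python sketch below groups rows by user-chosen condition and decision attributes and reads rules off the resulting counts. It is only one possible reading of the granule-based approach summarized in the abstract, not the chapter's actual algorithm: the rows, the attribute names, the confidence threshold, and the helper functions (project, build_granule_table, rules_from_granules) are all hypothetical, and the nested mapping is a simplified two-tier stand-in for the multi-tier structure the abstract mentions.

```python
from collections import Counter, defaultdict

# Hypothetical warehouse rows; attribute names and values are assumptions.
rows = [
    {"location": "city", "age": "young", "product": "laptop"},
    {"location": "city", "age": "young", "product": "laptop"},
    {"location": "city", "age": "old", "product": "phone"},
    {"location": "rural", "age": "old", "product": "phone"},
    {"location": "rural", "age": "old", "product": "laptop"},
]

def project(row, attributes):
    """Restrict a row to the chosen attributes, as a hashable granule."""
    return tuple((a, row[a]) for a in attributes)

def build_granule_table(rows, condition_attrs, decision_attrs):
    """Count how often each condition granule co-occurs with each decision granule."""
    table = defaultdict(Counter)
    for row in rows:
        table[project(row, condition_attrs)][project(row, decision_attrs)] += 1
    return table

def rules_from_granules(table, min_confidence):
    """Read rules 'condition granule => decision granule' off the table."""
    rules = []
    for condition, decisions in table.items():
        total = sum(decisions.values())
        for decision, count in decisions.items():
            confidence = count / total
            if confidence >= min_confidence:
                rules.append((condition, decision, confidence))
    return rules

# The user constraint: which attributes may appear on each side of a rule.
table = build_granule_table(rows, ["location", "age"], ["product"])
rules = rules_from_granules(table, min_confidence=0.6)
```

Because the user fixes the antecedent and consequent attributes in advance, the table is built in a single pass over the rows, and only rules of the requested shape are ever considered. In this toy data, two rules pass the threshold; the rural, older group is split evenly between products and yields none.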
