Mining Association Rules from Fuzzy DataCubes

Nicolás Marín (University of Granada, Spain), Carlos Molina (University of Jaen, Spain), Daniel Sánchez (University of Granada, Spain) and M. Amparo Vila (University of Granada, Spain)
DOI: 10.4018/978-1-60566-858-1.ch004

Abstract

The use of online analytical processing (OLAP) systems as data sources for data mining techniques has been widely studied and has resulted in what is known as online analytical mining (OLAM). As a result of both the use of OLAP technology in new fields of knowledge and the merging of data from different sources, it has become necessary for models to support imprecision. We therefore need OLAM methods that are able to deal with this imprecision. Association rules are one of the most widely used data mining techniques. There are several proposals that enable the extraction of association rules from DataCubes, but few of them deal with imprecision in the process, and they produce complex rule sets as a result. In this chapter the authors present a method that manages the imprecision and reduces the complexity. They study the influence of the use of fuzzy logic on problems of different sizes and compare the results with a crisp approach.
Chapter Preview

Introduction

As defined by the OLAP Council (2007), “On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user”. According to Han (1997), the use of OLAP systems in data mining is interesting for the following three main reasons:

  • Data mining techniques need integrated, consistent and clean data to work with (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy, 1996). The data processing performed when building a data warehouse guarantees these qualities in data and converts data warehouses into good data sources for data mining.

  • Users frequently need to explore the stored data, selecting only a portion of them, and might want to analyze data at different abstraction levels (different levels of granularity). OLAP systems are designed to ease these operations in a flexible way. The integration of data mining techniques with OLAP provides the user with even more flexibility.

  • It is difficult to predict what knowledge is required a priori. The integrated use of OLAP and suitable data mining methods allows the user to obtain this knowledge using different approaches and representations.

Information in decision support systems usually has an ill-defined nature. The use of data from human interaction may enrich the analysis (Gorry & Morton, 1971) and, nowadays, it is common for companies to require external data for strategic decisions. These external data are not always compatible with the format of internal information and, even when they are, they are not as reliable as internal data. Moreover, information may also be obtained from semi-structured or non-structured sources.

In addition, OLAP systems are now being used in new fields of knowledge (e.g. medical data) that present complex domains which are difficult to represent using crisp structures (Lee & Kim, 1997). In all these cases, flexible models and query languages are needed to manage this information.

These reasons, among many others, justify the search for multidimensional models which are able to represent and manage imprecision. Some significant proposals in this direction can be found in the literature (Laurent, 2002; Jensen, Kligys, Pedersen, & Timko, 2004; Alhajj & Kaya, 2003; Molina, Sánchez, Vila, & Rodríguez-Ariza, 2006). These proposals support imprecision from different perspectives. In Molina, Sánchez, Vila, and Rodríguez-Ariza (2006), we propose a fuzzy multidimensional model that manages imprecision both in facts and in the definition of hierarchical relationships. These proposals organize imprecise data using DataCubes (imprecise DataCubes), and it is therefore necessary to develop data mining techniques that can work over these imprecise DataCube models.

Our aim in this chapter is to study the influence of fuzzy logic on the scalability of COGARE, a method for extracting association rules from a fuzzy multidimensional model that can represent and manage imprecision in different aspects. As we have already mentioned, previous proposals in the literature are directed towards obtaining as many associations as possible. However, they produce complex results (e.g. a high number of rules, rules that represent the same knowledge at different detail levels, etc.). In contrast, this proposal has two main goals:

  • Firstly, to manage data imprecision throughout the entire process.

  • Secondly, to reduce the complexity of the final result using both the fuzzy concepts and the hierarchical relation between elements, without reducing the quality of the rule set.
