Article Preview
TopIntroduction And Motivations
The extraction of constrained multidimensional patterns in the area of multidimensional database mining (OLAP database mining or data mining from categorical database relations) is achieved for solving various problems such as discovering multidimensional association rules (Lu, Feng, & Han, 2000), Roll-Up dependencies (Calders, Ng, & Wijsen, 2002), multidimensional constrained gradients (Dong et al., 2004), closed constrained gradients (Wang, Han, & Pei, 2006), classification rules (Liu, Hsu, & Ma, 1998; Li, Han, & Pei, 2001), correlation rules (Brin, Motwani, & Silverstein, 1997; Grahne, Lakshmanan, & Wang, 2000), datacube (Gray et al., 1997; Beyer & Ramakrishnan, 1999; Han, Pei, Dong, & Wang, 2001; Xin, Han, Li, & Wah, 2003), iceberg cube (Beyer & Ramakrishnan, 1999; Han et al., 2001), emerging cube (Nedjar, Casali, Cicchetti, & Lakhal, 2007; Casali, Nedjar, Cicchetti, & Lakhal, 2009b), quotient cube (Lakshmanan, Pei, & Han, 2002), and closed cube (Casali, Cicchetti, & Lakhal, 2003b; Li & Wang, 2005; Xin, Shao, Han, & Liu, 2006; Casali, Nedjar, Cicchetti, & Lakhal, 2009a). We believe that a precise semantics is required for characterizing the search space and solving these multidimensional data mining problems. Such semantics can be captured through an algebraic structure, the cube lattice, provided with a similar expression power than the power set lattice, which is used for binary database mining (transaction database mining (Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996)).
Adapting to this new multidimensional context, approaches and algorithms successfully used when mining binary databases (and thus using the power set lattice as a search space) is possible but not relevant. However, such adaptations have been frequently proposed for the extraction of quantitative association rules (Srikant & Agrawal, 1996) and for classification (Liu et al., 1998; Li et al., 2001). Moreover, (Beyer & Ramakrishnan, 1999; Han et al., 2001) have extended Apriori (Agrawal et al., 1996) for computing iceberg cubes and observed that such extensions “perform terribly”. Reasons behind these failures are the following. Firstly each multidimensional attribute must be replaced by a set of binary attributes, each of which representing a single value of the multidimensional attribute (Srikant & Agrawal, 1996). If the original attribute domains are large (which is the case in data warehouses (Beyer & Ramakrishnan, 1999)), the attribute substitution results in dealing with a great number of binary attributes (called items). On the other hand, the search space, considered when mining binary databases, is the lattice representing the power set of items. Nevertheless, this large search space encompasses a great number of solutions, which are known to be semantically erroneous in a multidimensional context. Let us suppose that in a relation the attribute A has k distinct values. It is replaced by k binary attributes a1,..., ak. In the powerset lattice, all the couples (ai,aj) with 1 ≤ i, j ≤ k and i ≠ j are considered and evaluating a constraint requires a costly memory space and a pass over the binary relation. But we know that the original data set does not contain any pattern (ai,aj), simply because the initial attribute A is atomic and thus its values ai and aj are exclusive.