Association Bundle Identification
Wenxue Huang (Generation5 Mathematical Technologies, Inc., Canada), Milorad Krneta (Generation5 Mathematical Technologies, Inc., Canada), Limin Lin (Generation5 Mathematical Technologies, Inc., Canada) and Jianhong Wu (Mathematics and Statistics Department, York University, Toronto, Canada)
Copyright: © 2009
An association pattern describes how a group of items (for example, retail products) are statistically associated with one another, and a meaningful association pattern identifies ‘interesting’ knowledge from data. A well-established association pattern is the association rule (Agrawal, Imielinski & Swami, 1993), which describes how two sets of items are associated with each other. For example, an association rule A → B states that ‘if customers buy the product set A, they also buy the product set B with probability greater than or equal to c’, where c is a given confidence threshold. Association rules have been widely accepted for the simplicity and comprehensibility of their problem statement, and subsequent modifications have been made in order to produce more interesting knowledge; see (Brin, Motwani, Ullman and Tsur, 1997; Aggarwal and Yu, 1998; Liu, Hsu and Ma, 1999; Bruzzese and Davino, 2001; Barber and Hamilton, 2003; Scheffer, 2005; Li, 2006). A related concept is rule interest, of which excellent discussions can be found in (Shapiro, 1991; Tan, Kumar and Srivastava, 2004). Huang et al. recently developed association bundles as a new pattern for association analysis (Huang, Krneta, Lin and Wu, 2006). Rather than replacing the association rule, the association bundle provides a distinctive pattern that can present meaningful knowledge not explored by association rules or any of their modifications.
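To make the statement ‘B is bought with probability greater than or equal to c’ concrete, the following is a minimal sketch of how the support and confidence of a rule A → B might be estimated from transaction data. The toy baskets and the function names support and confidence are illustrative assumptions, not part of the original work.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate P(consequent | antecedent) as a ratio of transaction counts."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    if n_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a


# Hypothetical toy data: five shopping baskets.
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
]

# Rule {bread, butter} -> {milk}: 3 baskets contain the antecedent,
# 2 of those also contain milk.
rule_conf = confidence({"bread", "butter"}, {"milk"}, baskets)
rule_supp = support({"bread", "butter", "milk"}, baskets)
```

Under these assumptions the rule {bread, butter} → {milk} would be accepted for any threshold c up to 2/3, since two of the three baskets containing both antecedent items also contain milk.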
Association bundles are important to the field of Association Discovery. The following comparison between association bundles and association rules supports this argument. The comparison focuses on the association structure.
An association structure describes the structural features of an association pattern. It tells how many association relationships are presented by the pattern, and whether these relationships are asymmetric or symmetric, between-set or between-item. For example, an association rule contains one association relationship; this relationship exists between two sets of items, and it is asymmetric, running from the rule antecedent to the rule consequent. However, the asymmetric between-set association structure limits the application of association rules in two ways. Firstly, when reasoning based on an association rule, the items in the rule antecedent (or consequent) must be treated as a whole, that is, as a single combined item and not as individual items. One cannot conclude from an association rule that an individual antecedent item, as one of the many items in the rule antecedent, is associated with any or all of the consequent items. Secondly, one must keep in mind that the association between the rule antecedent and the rule consequent is asymmetric. If the occurrence of the entire set of antecedent items is not given with certainty, for example, if the only information available is that a customer has chosen the consequent items and not the antecedent items, it is highly probable that she or he does not choose any of the antecedent items. Therefore, association rules cannot be applied where between-item symmetric associations are required, for example, when cross-selling a group of items by discounting one of them.
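The two limitations above can be illustrated numerically. The sketch below uses hypothetical transaction data and an assumed helper named confidence; it shows a rule that holds strongly from antecedent to consequent, yet is weak in the reverse direction and weak for an individual antecedent item.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate P(consequent | antecedent) as a ratio of transaction counts."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    if n_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a


# Hypothetical data: 16 baskets in total.
baskets = (
    [frozenset({"printer", "paper", "ink"})] * 3  # all three items together
    + [frozenset({"ink"})] * 5                    # ink bought on its own
    + [frozenset({"printer"})] * 4                # printer bought on its own
    + [frozenset({"paper"})] * 4                  # paper bought on its own
)

# Between-set rule {printer, paper} -> {ink}: holds in every matching basket.
forward = confidence({"printer", "paper"}, {"ink"}, baskets)

# Asymmetry: knowing the consequent says much less about the antecedent set.
reverse = confidence({"ink"}, {"printer", "paper"}, baskets)

# Between-set only: one antecedent item alone predicts the consequent poorly.
single = confidence({"printer"}, {"ink"}, baskets)
```

With this data the rule {printer, paper} → {ink} has confidence 1.0, while the reverse direction reaches only 3/8 and the single item printer predicts ink with confidence 3/7. A discount on ink or on printers alone would therefore not be justified by the rule itself.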