Association rule mining typically produces large numbers of rules, thereby creating a second-order data mining problem: which of the generated rules are the most interesting, and should interestingness be measured objectively or subjectively? To cope with the number of rules created during the mining step, the authors propose a combination of two novel ideas. The first is rule change mining, an extension of standard association rule mining that generates potentially interesting time-dependent features for an association rule. It requires no changes to existing rule mining algorithms and can therefore be applied during post-mining of association rules. Second, the authors take the existing textual description of a rule together with the newly derived objective features and combine them with a novel approach to subjective interestingness that uses relevance feedback methods from information retrieval. The combination of these two approaches yields a powerful, intuitive way of exploring the typically vast set of association rules. It combines objective and subjective measures of interestingness and incorporates user feedback. Hence, it increases the probability of finding the most interesting rules in a large set of association rules.
Nowadays, the discovery of association rules is a relatively mature and well-researched topic. Many algorithms have been proposed to discover and maintain association rules ever faster. However, one of the biggest problems of association rules remains unresolved: the number of discovered associations is usually immense, easily in the thousands or even tens of thousands. Such large numbers make the rules difficult for a user to examine. Moreover, many of the discovered rules will be obvious, already known, or not relevant.
For this reason, a considerable number of methods have been proposed to assist a user in detecting the most interesting or relevant rules. Studies of interestingness measures can roughly be divided into two classes: objective and subjective measures. Objective (data-driven) measures are usually derived from statistics, information theory or machine learning and assess numerical or structural properties of a rule and the data to produce a ranking. In contrast, subjective (user-driven) measures incorporate a user’s background knowledge and mostly rank rules based on some notion of actionability and unexpectedness.
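To make the idea of an objective (data-driven) measure concrete, the following minimal sketch computes three widely used measures for a rule: support, confidence and lift. The text above does not name specific measures, so this choice, the function names and the toy transactions are assumptions made purely for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    s_antecedent = support(antecedent, transactions)
    if s_antecedent == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / s_antecedent


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[FrozenSet[str]]) -> float:
    """Confidence relative to the baseline frequency of the consequent."""
    s_consequent = support(consequent, transactions)
    if s_consequent == 0:
        return 0.0
    return confidence(antecedent, consequent, transactions) / s_consequent


# Hypothetical toy data, not taken from the text.
transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

rule = ({"bread"}, {"butter"})
print(support(rule[0] | rule[1], transactions))
print(confidence(*rule, transactions))
print(lift(*rule, transactions))
```

Sorting rules by any of these values gives the kind of ranking an objective measure produces. None of them uses knowledge about the user, which is the gap subjective measures aim to fill.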
In spite of a multitude of available publications, interestingness assessment is still regarded as one of the unsolved problems in data mining and is still experiencing slow progress (Piatetsky-Shapiro, 2000). The search for a general solution is one of the big challenges of today’s data mining research (Fayyad et al., 2003). Existing approaches for interestingness assessment have several shortcomings which render them inadequate for many real-world applications.
Nonetheless, objective and subjective measures both have their place in the process of interestingness assessment. Objective measures help a user to get a first impression of what has been discovered and to obtain a starting point for further exploration of the rule set. This exploration step can then be accomplished by methods for subjective interestingness assessment. Ideally, the interestingness assessment of association rules should therefore be seen as a two-step process. For this process to be optimal, the calculus used for the objective rating and the calculus used for the subjective rating must be based on the same notion of interestingness. Nevertheless, most approaches for objective and subjective ratings have been developed independently of each other, with no interaction in mind, so that the information utilized for the objective rating is neglected in the subjective rating. In fact, the approaches rarely fit together.
In this article we discuss a framework which combines objective and subjective interestingness measures into a powerful tool for interestingness assessment and addresses the problems mentioned above. Our framework incorporates two concepts which have only recently been introduced to the area of interestingness assessment: rule change mining and user dynamics. In particular, we show how to analyse association rules for changes and how information about change can be used to derive meaningful and interpretable objective interestingness measures. Based on the notion of change, we discuss a novel relevance feedback approach for association rules. We relate the problem of subjective interestingness to the field of information retrieval, where relevance estimation is a rather mature and well-researched topic. By using a vector-based representation of rules and by utilizing concepts from information retrieval, we provide the necessary tool set to incorporate the knowledge about change into the relevance feedback process.
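As a rough illustration of how a change-based feature and relevance feedback could fit together, the sketch below represents each rule as a vector of item indicators plus one time-dependent feature, then re-ranks rules after user feedback. It is not the framework described in this article: the least-squares trend of a support history as the change feature, the Rocchio update with its usual default weights, and all names and data are assumptions chosen for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set


def trend_slope(history: Sequence[float]) -> float:
    """Least-squares slope of a measure over equally spaced periods."""
    n = len(history)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def rule_vector(items: Set[str], support_history: Sequence[float],
                vocabulary: Sequence[str]) -> List[float]:
    """Binary item indicators followed by one change feature (assumed layout)."""
    vec = [1.0 if item in items else 0.0 for item in vocabulary]
    vec.append(trend_slope(support_history))
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio(query: Sequence[float], relevant: List[List[float]],
            non_relevant: List[List[float]], alpha: float = 1.0,
            beta: float = 0.75, gamma: float = 0.15) -> List[float]:
    """Rocchio update: move the query towards relevant, away from non-relevant."""
    def centroid(vectors: List[List[float]]) -> List[float]:
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel, non = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


def rank(query: Sequence[float], rules: Dict[str, List[float]]) -> List[str]:
    """Rule names ordered by decreasing similarity to the query."""
    return sorted(rules, key=lambda name: cosine(query, rules[name]), reverse=True)


# Hypothetical rules with support histories over four periods.
vocabulary = ["bread", "butter", "milk"]
rules = {
    "bread->butter": rule_vector({"bread", "butter"}, [0.10, 0.12, 0.15, 0.19], vocabulary),
    "milk->bread": rule_vector({"milk", "bread"}, [0.20, 0.20, 0.19, 0.20], vocabulary),
    "milk->butter": rule_vector({"milk", "butter"}, [0.30, 0.25, 0.21, 0.15], vocabulary),
}

# The user marks one rule as relevant and one as non-relevant.
query = rocchio([0.0] * (len(vocabulary) + 1),
                relevant=[rules["bread->butter"]],
                non_relevant=[rules["milk->butter"]])
ranking = rank(query, rules)
print(ranking)
```

In this toy setup the slope values are much smaller than the binary item indicators, so the items dominate the similarity. A practical system would have to scale or weight the change features before combining them with the textual part of the vector.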