Decision rules mining is an important technique in machine learning and data mining, it has been studied intensively during the past few years. However, most existing algorithms are based on flat data tables, from which sets of decision rules mined may be very large for massive data sets. Such sets of rules are not easily understandable and really useful for users. Moreover, too many rules may lead to over-fitting. Thus, a method of decision rules mining from different abstract levels was provided in this chapter, which aims to improve the efficiency of decision rules mining by combining the hierarchical structure of multidimensional model and the techniques of rough set theory. Our algorithm for decision rules mining follows the so called separate-and-conquer strategy. Namely, certain rules were mined beginning from the most abstract level, and supporting sets of those certain rules were removed from the universe, then drill down to the next level to recursively mine other certain rules which supporting sets are included in the remaining objects until no objects remain in the universe or getting to the primitive level. So this algorithm can output some generalized rules with different degree of generalization.
Decision rules mining is an important technique in data mining and machine learning. It has been widely used in business, medicine and finance, etc. A multitude of promising algorithms of decision rules mining (Xiaohua Hu & Nick Cercone, 1997, 2001; Michiyuki Hirokane, et al. 2007; María C. Fernández, et al. 2001) have been developed during the past few years.
The aim of decision rules mining is to find a good rule set, which has a minimum number of rules and each rule should be as short as possible. We all know, each row of a decision table specifies a decision rule that determine decisions in terms of conditions. Thus, a set of decision rules mined from a decision table may be very large for a massive data set. Such a set of rules are not easily understandable and really useful for users. Moreover, too many rules may lead to over-fitting. Most existing methods to this problem follow the strategy of post-processing for those mined decision rules.
Marek Sikora and Marcin Michalak (2006) summed up that post-processing of decision rules mining can be implemented in two ways: rules generalization (rules shortening and rules joining) and rules filtering (rules that are not needed in view of certain criterion are removed from the final rules set). The technique of rules generalization is to obtain rules from detailed level first, and then use the techniques of generalization for these rules, combination or joining to reduce the number of decision rules. The technique of rules filtering is accomplished by template, interestingness or constraint to further process those rules mined from detailed level.
Most existing algorithms of decision rules mining only manipulate data at individual level, namely, they have not consider the case of multiple abstract levels for the given data set. However, in many applications, data contain structured information which is multidimensional and multilevel in nature, such as e-commerce, stocks, scientific data, etc. That is, there exists partial order relation among some attribute values in the given dataset, such as day, month, quarter and year for Time attribute. The partial order relation among attribute values for each attribute can be represented by a tree or a lattice, which constitutes a concept hierarchy. Obviously, there exists the relation of generalization or specification among concepts at different levels.
The technique of rules generalization is to obtain rules from detailed level first, and then use the techniques of generalization for these rules, combination or joining to reduce the number of decision rules. While in this chapter, we first generalize the given data set according to concept hierarchies built in advance to form a multidimensional data cube, then mine decision rules at various levels of abstraction. This not only can obtain generalized rules, but also can mine decision rules from different levels of abstraction. Of course, the algorithm in this chapter can reduce the number of mined rules greatly.
In this chapter, an approach to mine decision rules from various abstract levels is provided, which aims to improve the efficiency of decision rules mining by combining the multidimensional model (Antoaneta Ivanova & Boris Rachev, 2004) and rough set theory (Z. Pawlak, 1991; Wang Guoyin, 2001; Zhang Wenxiu et al. 2003).