Association rules are one of the most versatile techniques for the analysis of binary data, with applications in areas as diverse as retail, bioinformatics, and sociology. In this chapter, the origin of association rules is discussed along with the functions by which association rules are traditionally characterised. Following the formal definition of an association rule, these functions (support, confidence and lift) are defined and various methods of rule generation are presented, spanning 15 years of development. There is some discussion about negations and negative association rules, and an analogy between association rules and 2×2 tables is outlined. Pruning methods are discussed, followed by an overview of measures of interestingness. Finally, the post-mining stage of the association rule paradigm is put in the context of the preceding stages of the mining process.
In general, association rules provide an efficient method of analysing very large binary, or discretized, data sets. One common application is to discover relationships between binary variables in transaction databases; this type of analysis is called a ‘market basket analysis’. While association rules have been used to analyse non-binary data, such analyses typically involve coding the data as binary before proceeding. The versatility of the method is reflected in the breadth of recent applications, which range from the detection of bio-terrorist attacks (Fienberg and Shmueli, 2005) and the analysis of gene expression data (Carmona-Saez et al., 2006), to the analysis of Irish third-level education applications (McNicholas, 2007).
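The coding of non-binary data as binary, mentioned above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records, the attribute names `colour` and `age`, the cut points, and the helpers `discretize`, `binarize` and `to_binary_matrix` are all hypothetical, and any other discretization scheme could replace the fixed cut points used here. Each categorical value becomes an item such as `colour=red`, each numeric value is mapped to an interval item, and the resulting transactions are laid out as a 0/1 matrix with one column per item.

```python
from bisect import bisect_right

# Hypothetical non-binary records: one categorical and one numeric attribute.
records = [
    {"colour": "red", "age": 23},
    {"colour": "blue", "age": 41},
    {"colour": "red", "age": 67},
]

# Assumed cut points: age is discretized into [-inf,30), [30,60), [60,inf).
cut_points = {"age": [30, 60]}


def discretize(value, cuts):
    """Label the half-open interval of the sorted cut points containing value."""
    i = bisect_right(cuts, value)
    lo = "-inf" if i == 0 else str(cuts[i - 1])
    hi = "inf" if i == len(cuts) else str(cuts[i])
    return f"[{lo},{hi})"


def binarize(records, cut_points):
    """Code each record as a set of items of the form 'attribute=value'."""
    transactions = []
    for rec in records:
        items = set()
        for attr, value in rec.items():
            if attr in cut_points:
                # Numeric attribute: replace the raw value by its interval label.
                value = discretize(value, cut_points[attr])
            items.add(f"{attr}={value}")
        transactions.append(items)
    return transactions


def to_binary_matrix(transactions):
    """Return the sorted item list and a 0/1 row for each transaction."""
    items = sorted(set().union(*transactions))
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


items, matrix = to_binary_matrix(binarize(records, cut_points))
print(items)
for row in matrix:
    print(row)
```

Once coded this way, each row can be treated as a transaction in a market basket analysis, with every column playing the role of a binary variable.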
A typical association rule analysis involves two or three steps, as summarised in Box 1.

Box 1. Steps involved in a typical association rule analysis

1. Coding of data as binary (if data is not binary)
2. Rule generation
3. Post-mining
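To make these steps concrete, the sketch below runs them on a tiny, made-up market basket data set. The data are already binary (each transaction is a set of items), so the coding step is trivial. It uses the standard definitions of support (the proportion of transactions containing an itemset), confidence, supp(X ∪ Y)/supp(X), and lift, conf(X → Y)/supp(Y), which are treated formally later in the chapter. Rule generation here is a naive enumeration of every candidate itemset, chosen for readability; it is not one of the rule generation methods discussed in this chapter and would not scale to large data sets. The post-mining step is likewise only illustrative: it keeps rules with lift above 1 and ranks them by lift. The function names, thresholds and data are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    """An association rule X -> Y with its three classical measures."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float
    lift: float


# Step 1 (coding): hypothetical transactions, already binary as item sets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def generate_rules(transactions, min_support, min_confidence, max_size=3):
    """Step 2 (rule generation) by naive enumeration, with no candidate pruning."""
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s_xy = support(itemset, transactions)
            # Skip infrequent (and empty) itemsets; this also guarantees that
            # the supports used as denominators below are positive.
            if s_xy == 0 or s_xy < min_support:
                continue
            # Split the itemset into every non-empty antecedent X and consequent Y.
            for r in range(1, k):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    y = itemset - x
                    conf = s_xy / support(x, transactions)
                    if conf >= min_confidence:
                        lift = conf / support(y, transactions)
                        rules.append(Rule(x, y, s_xy, conf, lift))
    return rules


def post_mine(rules, min_lift=1.0):
    """Step 3 (post-mining): keep rules with lift above min_lift, best first."""
    kept = [rule for rule in rules if rule.lift > min_lift]
    return sorted(kept, key=lambda rule: rule.lift, reverse=True)


rules = generate_rules(transactions, min_support=0.4, min_confidence=0.6)
for rule in post_mine(rules):
    print(sorted(rule.antecedent), "->", sorted(rule.consequent),
          f"supp={rule.support:.2f} conf={rule.confidence:.2f} lift={rule.lift:.2f}")
```

With these thresholds, rule generation yields three rules; post-mining keeps only the two rules between bread and butter (lift 1.25 each) and discards milk → bread, whose lift is below 1 even though it passes the support and confidence thresholds.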
This book focuses on the third step, post-mining, and this chapter sets the scene for that focus. It begins with a look back at the foundations of thought on the association of attributes; the idea of an association rule is then introduced, followed by a discussion of rule generation. Finally, there is a broad review of pruning and interestingness.
Although association rules were formally introduced towards the end of the twentieth century (Agrawal et al. 1993), many of the ideas behind them can be seen in the literature over a century earlier. Yule (1903) wrote about associations between attributes and, in doing so, built upon the earlier writings of De Morgan (1847), Boole (1847, 1854) and Jevons (1890). Although his premise was the analysis of non-binary data converted into binary form, Yule (1903) raised many of the issues that are now central to association rule analysis. Furthermore, Yule (1903) built the notion of not possessing an attribute into his framework from the outset, treating it as an important concept. Yet it was several years after the introduction of association rules before such issues were seriously considered, and the absence of items from rules is still often ignored in analyses. We will revisit this issue later in this chapter, when discussing negative association rules and negations.