The problem differs with the number of variables. In what follows, problems with two, three, or more variables are discussed.
The link between two variables (A and B) depends on how they are coded. Statistical methods work best when the data are quantitative. A common model is linear regression. For instance, the salary (S) of a worker can be expressed by the following equation:
S = 100 Y + 20000 + ε    (1)
where Y is the number of years in the company, and ε is a random term. This model means that the expected salary of a newcomer in the company is $20,000 and that it increases by $100 per year.
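As an illustration, the following sketch simulates data from Equation (1) and recovers the two coefficients by ordinary least squares. The sample size, the range of Y, and the normal distribution chosen for ε are assumptions made for the example; the text only states that ε is random. The function names are hypothetical.

```python
import numpy as np


def simulate_salaries(n_workers=500, noise_sd=200.0, seed=0):
    """Draw (years, salary) pairs from S = 100*Y + 20000 + eps.

    Assumptions not in the text: Y is uniform on 0..30 and
    eps is normal with standard deviation `noise_sd`.
    """
    rng = np.random.default_rng(seed)
    years = rng.integers(0, 31, size=n_workers)
    eps = rng.normal(0.0, noise_sd, size=n_workers)
    salary = 100 * years + 20000 + eps
    return years, salary


def fit_line(years, salary):
    """Least-squares fit of salary = slope * years + intercept."""
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(years, salary, deg=1)
    return slope, intercept


years, salary = simulate_salaries()
slope, intercept = fit_line(years, salary)
print(f"estimated raise per year: {slope:.1f}")
print(f"estimated starting salary: {intercept:.1f}")
```

With enough observations, the estimates should land close to 100 and 20000, the values used to generate the data.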
The association rule for this model is: Y→S. This means that there are few senior workers with a small paycheck. To obtain this rule, the variables are translated into binary variables. Y is no longer the number of years but the property “has seniority,” which is not quantitative but of type Yes/No. The same transformation is applied to the salary S, which becomes the property “has a big salary.”
Thus, both methods provide the link between the two variables, and each has its own instruments for measuring the quality of that link. For statistics, there are the tests of the regression model (Baillargeon, 1996); for association rules, there are measures such as support and confidence (Kodratoff, 2001). Depending on the type of data, however, one model is more appropriate than the other (Figure 1).
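To make the association-rule side concrete, the sketch below binarizes a small data set and computes the support and confidence of the rule Y→S, using the usual definitions (support: share of workers having both properties; confidence: share of senior workers who also have a big salary). The eight workers, the 10-year seniority threshold, and the $21,000 salary threshold are invented for the example and are not taken from the text.

```python
def support_confidence(antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    Both arguments are equal-length sequences of booleans,
    one entry per worker.
    """
    n = len(antecedent)
    n_antecedent = sum(1 for a in antecedent if a)
    n_both = sum(1 for a, c in zip(antecedent, consequent) if a and c)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else float("nan")
    return support, confidence


# Hypothetical data: years in the company and salary for eight workers.
years = [1, 3, 12, 15, 20, 2, 18, 25]
salary = [20100, 20300, 21200, 21500, 22000, 20200, 20400, 22500]

# Binary coding with assumed thresholds.
has_seniority = [y >= 10 for y in years]
has_big_salary = [s >= 21000 for s in salary]

support, confidence = support_confidence(has_seniority, has_big_salary)
print(support, confidence)  # 0.5 0.8
```

Here five workers have seniority and four of them also have a big salary, so the confidence is 0.8; the single senior worker with a small paycheck is the kind of exception the rule tolerates.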