Several classes of computational and statistical methods for data mining are available. Each class can be parameterised, so that the models within a class differ in the values of these parameters (see, for instance, Giudici, 2003, Hastie et al., 2001, Han and Kamber, 200, Hand et al., 2001 and Witten and Frank, 1999). Examples include the class of linear regression models, which differ in the number of explanatory variables; the class of Bayesian networks, which differ in the number of conditional dependencies (links in the graph); the class of tree models, which differ in the number of leaves; and the class of multi-layer perceptrons, which differ in the number of hidden layers and nodes. Once a class of models has been established, the problem is to choose the “best” model from it.
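To make the selection problem concrete, the following minimal sketch considers the class of linear regression models, whose members differ in the number of explanatory variables, and picks one member with a scoring function. The choice of the Akaike information criterion (AIC) as the score, the synthetic data, and the helper names `solve`, `fit_rss` and `aic` are all assumptions made for illustration; they are not taken from the chapter.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rss(rows, y, k):
    """Least-squares fit with an intercept and the first k variables; returns the RSS."""
    design = [[1.0] + row[:k] for row in rows]
    p = k + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, y))


def aic(rss, n, n_params):
    """AIC of a Gaussian linear model, up to an additive constant shared by all models."""
    return n * math.log(rss / n) + 2 * n_params


# Synthetic data (assumed for illustration): only the first variable affects y.
random.seed(0)
n = 200
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 + 3.0 * r[0] + random.gauss(0, 1) for r in rows]

# Candidate models within the class: intercept plus the first k variables, k = 0..3.
rss_by_k = {k: fit_rss(rows, y, k) for k in range(4)}
aic_by_k = {k: aic(rss, n, k + 1) for k, rss in rss_by_k.items()}
best_k = min(aic_by_k, key=aic_by_k.get)

for k in range(4):
    print(f"k={k}  RSS={rss_by_k[k]:.2f}  AIC={aic_by_k[k]:.2f}")
print("selected number of explanatory variables:", best_k)
```

The residual sum of squares can only decrease as variables are added, so it cannot be used alone to choose among nested models; the penalty term in the score is what allows a smaller model to be preferred. Other scores or comparison criteria could be substituted for `aic` without changing the rest of the sketch.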
Main Thrust Of The Chapter
Comparison criteria for data mining models can be classified schematically into criteria based on statistical tests, criteria based on scoring functions, computational criteria, Bayesian criteria and business criteria.