Model Free Data Mining
Can Yang (Zhejiang University, Hangzhou, P. R. China), Jun Meng (Zhejiang University, Hangzhou, P. R. China), Shanan Zhu (Zhejiang University, Hangzhou, P. R. China) and Mingwei Dai (Xi’an Jiao Tong University, Xi’an, P. R. China)
Copyright: © 2008
Input selection is a crucial step in nonlinear regression modeling: it helps to build an interpretable model with less computation. Most of the available methods are model-based, and few are model-free. Model-based methods often use prediction error or sensitivity analysis for input selection, whereas model-free methods exploit consistency. In this paper, we show the underlying relationship between sensitivity analysis and consistency analysis for input selection, and then derive an efficient model-free method from common sense. Because this common-sense reasoning is formulated with fuzzy logic, the method is called Fuzzy Consistency Analysis (FCA). In contrast to available methods, FCA has the following desirable properties: 1) it is model-free, so it is not biased toward a specific model and exploits “what the data say” rather than “what the model says”, which is the essential point of data mining; 2) it can be implemented as efficiently as classical model-free methods, but is more flexible than they are; 3) it can be applied directly to a data set with mixed continuous and discrete inputs without doing rotation. A study of four benchmark problems indicates that the proposed method works effectively for nonlinear problems. Through the input selection procedure, the underlying factors that affect the prediction are worked out, which helps to gain insight into a specific problem and serves the purpose of data mining very well.
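As a rough illustration of the consistency idea behind model-free input selection (samples that are close in the selected inputs should also be close in the output), the sketch below scores a candidate input subset without fitting any model. It is not the FCA formulation, which the abstract does not specify. The function name `inconsistency`, the Gaussian similarity functions, the widths `sigma_x` and `sigma_y`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def inconsistency(xs, ys, sigma_x=0.05, sigma_y=0.5):
    """Similarity-weighted share of sample pairs that are close in the
    inputs but far apart in the output (hypothetical measure, in [0, 1])."""
    num = 0.0
    den = 0.0
    n = len(ys)
    for i in range(n):
        for j in range(i + 1, n):
            # Squared distances between the two samples in input and output space.
            dx2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            dy2 = (ys[i] - ys[j]) ** 2
            # Soft "is similar" degrees (assumed Gaussian membership functions).
            sim_x = math.exp(-dx2 / (2 * sigma_x ** 2))
            sim_y = math.exp(-dy2 / (2 * sigma_y ** 2))
            # A pair is inconsistent when inputs are similar but outputs are not.
            num += sim_x * (1.0 - sim_y)
            den += sim_x
    return num / den


# Synthetic data (assumption): y depends on x1 only; x2 is irrelevant.
rng = random.Random(0)
x1 = [rng.random() for _ in range(200)]
x2 = [rng.random() for _ in range(200)]
y = [math.sin(2 * math.pi * v) for v in x1]

score_relevant = inconsistency([[v] for v in x1], y)
score_irrelevant = inconsistency([[v] for v in x2], y)
print(score_relevant, score_irrelevant)
```

Under these assumptions, the subset containing the relevant input should receive the lower inconsistency score, so candidate inputs can be ranked from the data alone. The quadratic loop over pairs keeps the sketch simple; it is not tuned for large data sets.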