Multiple Criteria Optimization in Data Mining
Gang Kou (University of Electronic Science and Technology of China, China), Yi Peng (University of Electronic Science and Technology of China, China) and Yong Shi (CAS Research Center on Fictitious Economy and Data Sciences, China & U)
Copyright: © 2009
Multiple criteria optimization seeks to simultaneously optimize two or more objective functions under a set of constraints. It has a wide variety of applications, ranging from financial management, energy planning, and sustainable development to aircraft design. Data mining aims at extracting hidden and useful knowledge from large databases. Major contributors to data mining include machine learning, statistics, pattern recognition, algorithms, and database technology (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). In recent years, the multiple criteria optimization research community has become actively involved in the field of data mining (see, for example, Yu, 1985; Bhattacharyya, 2000; Francisci & Collard, 2003; Kou, Liu, Peng, Shi, Wise, & Xu, 2003; Freitas, 2004; Shi, Peng, Kou, & Chen, 2005; Kou, Peng, Shi, Wise, & Xu, 2005; Kou, Peng, Shi, & Chen, 2006; Shi, Peng, Kou, & Chen, 2007). Many data mining tasks, such as classification, prediction, clustering, and model selection, can be formulated as multi-criteria optimization problems. Depending on the nature of the problem and the characteristics of the dataset, different multi-criteria models can be built. By drawing on methodologies and approaches from mathematical programming, multiple criteria optimization can provide effective solutions to large-scale data mining problems. An additional advantage of multi-criteria programming is that it assumes no deterministic relationships between variables (Hand & Henley, 1997).
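For readers who want the opening definition in symbols, one common generic way to write a multiple criteria optimization problem is sketched below. It is a textbook-style formulation, not the specific model used by the authors: the symbols f_i (objectives), g_j (constraints), X (feasible set), k and m are notation introduced here for illustration. Objectives that should be maximized can be handled by minimizing their negatives.

```latex
\begin{aligned}
\min_{x} \quad & F(x) = \bigl(f_1(x),\, f_2(x),\, \ldots,\, f_k(x)\bigr), \qquad k \ge 2 \\
\text{s.t.} \quad & x \in X = \{\, x : g_j(x) \le 0,\ j = 1, \ldots, m \,\}
\end{aligned}

% A feasible x^* is Pareto optimal if there is no x \in X with
% f_i(x) \le f_i(x^*) for all i = 1, \ldots, k and f_l(x) < f_l(x^*) for at least one l.
```

Because the objectives usually conflict, such a problem generally has a set of Pareto optimal solutions rather than a single best one, and user preferences are needed to choose among them.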
Currently, the main areas of focus of multiple criteria optimization in data mining are model construction, algorithm design, and the interpretation and application of results.
Model construction refers to the process of establishing mathematical models for the multi-criteria problems that arise in many data mining tasks. For example, in network intrusion detection, the goal is to build classifiers that achieve not only high classification accuracy but also a low false alarm rate. Although the objectives can be modeled separately, the resulting models normally cannot provide an optimal solution to the overall problem (Fonseca & Fleming, 1995). Furthermore, a model may perform well on one objective but poorly on others. In such scenarios, multiple criteria optimization can be used to build models that optimize two or more objectives simultaneously and find solutions that satisfy users' preferences.
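The intrusion detection example can be made concrete with a minimal Python sketch. It assumes several candidate classifiers have already been trained and evaluated, and that each has an accuracy (to maximize) and a false alarm rate (to minimize). The classifier names and scores are hypothetical and do not come from the article. The sketch keeps only the Pareto optimal (non-dominated) candidates. It then uses a weighted-sum scalarization, one simple technique among several and not the authors' method, to pick a single candidate according to a user-supplied preference weight.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float          # objective to maximize
    false_alarm_rate: float  # objective to minimize


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.false_alarm_rate <= b.false_alarm_rate
    strictly_better = a.accuracy > b.accuracy or a.false_alarm_rate < b.false_alarm_rate
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order is preserved)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


def pick_by_preference(front: List[Candidate], weight_accuracy: float) -> Candidate:
    """Weighted-sum choice: reward accuracy, penalize false alarms.

    weight_accuracy (hypothetical user preference) lies in [0, 1]; the remaining
    weight (1 - weight_accuracy) is placed on the false alarm rate.
    """
    return max(
        front,
        key=lambda c: weight_accuracy * c.accuracy
        - (1.0 - weight_accuracy) * c.false_alarm_rate,
    )


# Hypothetical evaluation results for five classifiers.
candidates = [
    Candidate("A", 0.95, 0.08),
    Candidate("B", 0.93, 0.02),
    Candidate("C", 0.90, 0.05),  # dominated by B
    Candidate("D", 0.97, 0.15),
    Candidate("E", 0.94, 0.10),  # dominated by A
]

front = pareto_front(candidates)
print([c.name for c in front])                   # ['A', 'B', 'D']
print(pick_by_preference(front, 0.5).name)       # B: balanced preference
print(pick_by_preference(front, 0.9).name)       # D: strong preference for accuracy
```

No single candidate is best on both objectives here: D has the highest accuracy but the most false alarms, while B has the fewest false alarms but lower accuracy. Changing the preference weight moves the choice along the Pareto front, which illustrates why a single-objective model cannot capture the whole trade-off.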