Article Preview
Top1. Introduction
Most firms hold many data in their databases about the products and the customers buying those products. The problem with the data preserved into the databases is that they are hardly utilized to improve the business prospects. To improve their business prospects, firms have to make or sell the products with greater efficiency which means they have to know which products are preferred by the customers and which are not. However, it is not feasible to directly contact the customers to get some information about their needs and preferences. To understand their customers’ needs, most of the firms have to analyze the databases that hold the customers’ past purchase information.
In order to analyze these data sets and get valid, novel and potentially useful information, the data mining community has developed various analysis and data mining tools. These have been used to operate on a large amount of data to discover useful information, patterns and relationships to help make a decision. There are many kinds of data mining techniques depending on the dataset and the purpose of the study. In this paper, we will implement classification by supervised learning.
Classification is one of the most useful data mining techniques. It is a predictive data mining technique, making prediction about unknown data using known results found from different data (Bhardwaj, & Pal, 2011). The task of any supervised learner is typically to build a model that both fits a number of training examples (training data) and predicts the class into which new or unseen data falls, where the class of the instance may not be known (test data). Some of the well-known supervised learning algorithms include decision trees (Murthy, 1998; Gama, & Brazdil, 1999), probabilistic and statistical learning techniques, such as Naive Bayes classifiers (Bouckaert, 2004), Bayesian networks (Jensen, 1996; Cheng, & Greiner, 2001), support vector machines (Cristianini, & Shawe-Taylor, 2000), etc. A more detailed review of various classification techniques can be found in Kotsiantis, 2007).
The state-of-the-art technology to implement supervised learning is Artificial Neural Networks (ANN) (Carpenter, & Grossberg, 1992; Yamamoto, & Nikiforuk, 2000; Tang, Wang, Tamura, & Ishii, 2003; Khoshgoftaar, Van Hulse, & Napolitano, 2010). ANN consists of interconnected layers of neurons. Adjusting the connection weights so as to produce the correct output for a given input is called “learning”. The conventional algorithm is the Back Propagation Algorithm (Du, Hou, & Li, 1992; Hsin, Li, Sun, & Sclabassi, 1995; Lee, Yang, & Ho, 2006) which systematically varies the weights of the connections by propagating the magnitude of error in the backward direction from the output to the input in the form of a feedback. However, this algorithm is found to be slow in convergence and is sensitive to the initial weights that are usually chosen randomly. Recently, Evolutionary Algorithms (EA) have been widely used in optimizing diverse engineering and business domain problems. The EA optimization being a meta-heuristic strategy can, in principle, be applied to any optimization problem since it does not exploit the problem domain heuristics. Moreover, being a population-based parallel search, it is robust and rapid in convergence. Recently, they have been used in training ANN (Abass, 2001; Xiao, Wu,Yang, 2001; Siu, Yang, Lee, & Ho, 2007).