Article Preview
Top1. Introduction
Classification is a relatively old problem that has been widely studied in areas such as machine learning, pattern recognition, data mining, and artificial intelligence. Classification problems can be defined as follows: given a dataset, a given dataset is called a training dataset (Jiawei & Kamber 2001; Zhongzhi, 2002). The training dataset consists of a set of database tuples (often referred to as training samples, instances, or objects), each training sample is a feature vector consisting of attribute values or eigenvalues, and each training sample also has a class label attribute. A specific sample form can be expressed as: ; where represents the attribute value and represents the class label. A given training dataset is used to build a classification function (often referred to as a classification model or classifier (Huajun & Yinkui, 2003), and the established classifier is used to predict the classes of tuples of data with unknown class numbers in the database.
As an age-old problem, taxonomy has been extensively studied in many fields. So far, the classical classification methods that have been studied mainly include: decision tree method, which is a Supervised learning technique that can be used for both classification and Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome (classical decision tree algorithm mainly includes: ID3 algorithm, C4.5 algorithm and CART algorithm, etc.), neural network method (BP algorithm), genetic algorithm (GABIL system), Bayesian classification, K-nearest neighbor algorithm, case-based inference and support vector machine.