A Hybrid Classification Algorithm and Its Application on Four Real-World Data Sets

Lamiaa M. El bakrawy, Abeer S. Desuky
DOI: 10.4018/978-1-6684-5656-9.ch006

Abstract

The aim of this chapter is to propose a hybrid classification algorithm based on particle swarm optimization (PSO) to enhance the generalization performance of the adaptive boosting (AdaBoost) algorithm. AdaBoost improves the performance of a given machine learning algorithm by producing a set of weak classifiers, which requires more time and memory and may not give the best classification accuracy. For this purpose, PSO is proposed as a post-optimization procedure that is applied to the resulting weak classifiers and removes the redundant ones. The experiments were conducted on the Ionosphere data set, the Thoracic Surgery data set, the Blood Transfusion Service Center (BTSC) data set, and the Statlog (Australian Credit Approval) data set. The experimental results show that a given boosted classifier with PSO-based post-optimization improves the classification accuracy on all four data sets. The experiments also show that the proposed algorithm outperforms other techniques, achieving the best generalization.

Introduction

Nowadays a tremendous amount of data is being collected and stored in databases around the world. It is now easy to find databases holding terabytes of data in enterprises and research fields (one terabyte, in the binary sense, is 2^40 = 1,099,511,627,776 bytes). A great deal of invaluable information and knowledge is buried in such databases, and without facile methods for extracting it, mining these databases is practically impossible. Many algorithms have been created over the decades for extracting what are called nuggets of knowledge from large sets of data. There are several distinct methodologies for approaching this problem: classification, clustering, association rules, etc. This chapter focuses on classification (Sanakal & Jayakumari, 2014; Witten & Frank, 2005).

Classification is one of the problems most frequently studied by data mining and machine learning researchers (Sanakal & Jayakumari, 2014). Classification consists of predicting a certain outcome based on a given input. A classifier is a function or an algorithm that maps every possible input (from a legal set of inputs) to a finite set of classes or categories (Dhande & Dandekar, 2011). Adaptive boosting (AdaBoost) (Freund & Schapire, 1996) is a widespread and successful technique used to boost the classification performance of a weak learner. Hu et al. (2014) proposed two algorithms based on the AdaBoost classifier for online intrusion detection. The first algorithm uses traditional AdaBoost with decision stumps as weak classifiers. In the second algorithm, online Gaussian mixture models (GMMs) are used as weak classifiers to improve the online AdaBoost process. In the experiments, the second algorithm performed better than the traditional AdaBoost process that uses decision stumps. Another improved AdaBoost algorithm, named ISABoost, was proposed by Qian et al. (2013) and applied to scene categorization. In ISABoost, the inner structure of each trained weak classifier is adjusted before the traditional weight determination process. After this inner structure adjustment, ISABoost selects an optimal weak classifier and determines its weight in each iteration of AdaBoost learning. ISABoost and traditional AdaBoost were compared on three scene data sets, with back-propagation networks and SVMs serving as weak classifiers, and the comparison verified the effectiveness of ISABoost.
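To make the traditional AdaBoost procedure with decision stumps mentioned above more concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this chapter or in any of the cited works: the function names (stump_predict, train_stump, adaboost, predict), the labels encoded as -1/+1, and the exhaustive threshold search are all assumptions made for illustration.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Decision stump: predict `polarity` if row[feature] <= threshold, else -polarity."""
    return polarity if row[feature] <= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error.

    Returns (error, feature, threshold, polarity).
    """
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds=10):
    """Train `rounds` weighted stumps; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []                          # (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # Toy one-dimensional data that no single stump can separate.
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

Each round adds one more weak classifier to the ensemble, which is why the number of weak classifiers, and with it the time and memory cost, grows with the number of boosting rounds.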

Choi et al. (2012) presented a novel multiple classifier system, termed a “classifier ensemble”, based on AdaBoost for tackling the false-positive (FP) reduction problem in computer-aided detection (CADe) systems, especially for mass abnormalities on mammograms. Different feature representations were combined with data resampling based on AdaBoost learning to create the classifier ensemble. Adjusting the size of the resampled set is the mechanism the classifier ensemble uses to regulate the degree of weakness of the weak classifiers of a conventional AdaBoost ensemble. Support vector machines (SVMs) and neural networks (NNs) trained with the back-propagation algorithm were used as base classifiers and applied to the Digital Database for Screening Mammography (DDSM). The area under the receiver operating characteristic (ROC) curve was the criterion used to evaluate the classification performance, and the comparative results showed the potential clinical effectiveness of the proposed ensemble.

Because the AdaBoost approach produces a large number of weak classifiers, particle swarm optimization (PSO) (Kennedy & Eberhart, 1995) has the potential to automatically select a good set of weak classifiers for AdaBoost and improve the algorithm's performance. Our goal is to optimize the performance of the AdaBoost algorithm using the particle swarm optimization technique.
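As a rough illustration of how PSO could select a subset of weak classifiers from a boosted ensemble, the sketch below uses a binary PSO in plain Python. This preview does not describe the chapter's actual particle encoding, fitness function, or parameter values, so everything here is an assumption: the bit-mask encoding (one bit per weak classifier), the fitness (vote accuracy minus a small size penalty), the sigmoid transfer rule, the parameter defaults, and all function names are hypothetical.

```python
import math
import random

def vote_accuracy(mask, preds, alphas, y):
    """Accuracy of the weighted vote using only classifiers whose mask bit is 1.

    preds[t][i] is the -1/+1 prediction of weak classifier t on sample i.
    """
    if not any(mask):
        return 0.0
    correct = 0
    for i, yi in enumerate(y):
        score = sum(a * p[i] for m, a, p in zip(mask, alphas, preds) if m)
        if (1 if score >= 0 else -1) == yi:
            correct += 1
    return correct / len(y)

def fitness(mask, preds, alphas, y, penalty=0.01):
    """Assumed fitness: reward accuracy, lightly penalize the number of classifiers kept."""
    return vote_accuracy(mask, preds, alphas, y) - penalty * sum(mask) / len(mask)

def binary_pso(preds, alphas, y, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a bit mask over the weak classifiers with high fitness."""
    rng = random.Random(seed)
    T = len(preds)
    pos = [[rng.randint(0, 1) for _ in range(T)] for _ in range(n_particles)]
    pos[0] = [1] * T   # seed one particle with the full, unpruned ensemble
    vel = [[0.0] * T for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, preds, alphas, y) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(T):
                # Standard velocity update pulled toward personal and global bests.
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-6.0, min(6.0, vel[k][d]))
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[k], preds, alphas, y)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return gbest, gbest_fit
```

Seeding one particle with the all-ones mask is a design choice of this sketch: it guarantees that the returned mask is never less fit than the unpruned ensemble, so pruning can only help under the assumed fitness. In practice the fitness would be measured on held-out data rather than on the training samples.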

Key Terms in this Chapter

Dataset: A set (collection) of related (discrete) items that may be managed independently or in groups or accessed as a whole entity.

UCI Machine Learning Repository: A set (collection) of domain theories, databases and data generators which are utilized for experimental analysis of machine learning algorithms by machine learning researchers.

Data Mining: The computer-assisted process of analyzing large volumes of data and extracting interesting and useful information.

Accuracy: Refers to the degree of closeness between a measurement and its accepted or true value.

Machine Learning: The scientific study of building and improving computer systems that are able to learn and adapt without explicit instructions.

Fitness Function: A function that takes a candidate solution to the problem as input and yields as output a measure of how fit that solution is.

Optimization Algorithms: Search techniques that are used to find an optimal solution to an optimization problem, possibly subject to a set of constraints.
