A Combined GA-Fuzzy Classification System for Mining Gene Expression Databases

Gerald Schaefer, Tomoharu Nakashima
DOI: 10.4018/978-1-60566-814-7.ch006

Abstract

Microarray studies and gene expression analysis have received significant attention over the last few years and provide many promising avenues towards the understanding of fundamental questions in biology and medicine. In this chapter, the authors show that a combined GA-fuzzy classification system can be employed for effective mining of gene expression data. The applied classifier consists of a set of fuzzy if-then rules that allow for accurate non-linear classification of input patterns. A small number of these fuzzy if-then rules are selected by means of a genetic algorithm, yielding a compact classifier for gene expression analysis. Experimental results on various well-known gene expression datasets confirm the good classification performance of this approach.

Introduction

Microarray expression studies use a hybridisation process to measure the levels of genes expressed in biological samples. Knowledge gained from these studies is deemed increasingly important as it will contribute to the understanding of fundamental questions in biology and clinical medicine. Microarray experiments can either monitor each gene several times under varying conditions, or can be used to analyse genes in the same environment but in different types of tissue. In this chapter, we focus on the latter, that is, on the classification of the recorded samples. This classification can be used either to categorise different types of cancerous tissue, as in Golub et al. (1999), where different types of leukemia are identified, or to discriminate cancerous tissue from normal tissue, as in Alon et al. (1999), where tumor and normal colon tissues are analysed.

One of the main challenges in classifying gene expression data is that the number of genes is typically much higher than the number of analysed samples. Furthermore, it is difficult to determine which of the genes are important and which can be omitted without reducing the classification performance. Many pattern classification techniques have been employed to analyse microarray data. For example, Golub et al. (1999) used a weighted voting scheme, Fort and Lambert-Lacroix (2005) employed partial least squares and logistic regression techniques, and Furey et al. (2000) applied support vector machines (SVMs). Dudoit, Fridlyand, and Speed (2002) investigated nearest neighbour classifiers, discriminant analysis, classification trees and boosting, while Statnikov et al. (2005) explored several support vector machine techniques, nearest neighbour classifiers, neural networks and probabilistic neural networks. Several of these studies found that no single classification algorithm performs best on all datasets (although SVMs seem to perform best on several of them), and hence that exploring several classifiers is useful. Similarly, several studies (Liu, Li, & Wong, 2002; Statnikov et al., 2005) have shown that no universally ideal gene selection method has yet been found.

In this chapter we apply a hybrid GA-fuzzy classification scheme to analyse microarray expression data. Our classifier consists of a set of fuzzy if-then rules that allow for accurate non-linear classification of input patterns. A small number of these fuzzy if-then rules are then selected by means of a genetic algorithm to arrive at a compact yet effective rule base. Experimental results on several gene expression datasets show that this approach affords classification performance comparable with that of a fuzzy classifier that uses a much larger rule base (Schaefer et al., 2007).

Fuzzy Rule-Based Classification

Pattern classification is a process in which a classifier is derived from a set of training samples with known classifications and is then used to assign unseen data to classes automatically. Let us assume that our pattern classification problem is an n-dimensional problem with C classes (in microarray analysis C is often 2) and m given training patterns $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$. Without loss of generality, we assume each attribute of the given training patterns to be normalised into the unit interval [0,1]; the pattern space is hence an n-dimensional unit hypercube $[0,1]^n$. In this study we use fuzzy if-then rules of the following type as the basis of our classification system:

$$\text{Rule } R_j: \text{ If } x_1 \text{ is } A_{j1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{jn} \text{ then Class } C_j \text{ with } CF_j \qquad (1)$$

where $R_j$ is the label of the j-th fuzzy if-then rule, $A_{j1}, \ldots, A_{jn}$ are antecedent fuzzy sets on the unit interval [0,1], $C_j$ is the consequent class (i.e., one of the C given classes), and $CF_j$ is the grade of certainty of the fuzzy if-then rule $R_j$. As antecedent fuzzy sets we use triangular fuzzy sets as in Figure 1, which shows the partitioning of the unit interval into a number of fuzzy sets.
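The ingredients described above (attributes normalised into [0,1], triangular antecedent fuzzy sets, and rules carrying a consequent class and a grade of certainty) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes min-max scaling as the normalisation and a partition into K evenly spaced, symmetric triangular sets whose neighbours cross at membership 0.5; the exact partition of Figure 1 is not reproduced in this preview. The names `min_max_normalise`, `triangular_membership` and `FuzzyRule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


def min_max_normalise(X):
    """Scale every attribute (column) of X into the unit interval [0, 1].

    Assumption: plain min-max scaling; the chapter only states that
    attributes are normalised into [0, 1].
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attributes are mapped to 0
    return (X - lo) / span


def triangular_membership(x, k, K):
    """Membership of x in the k-th (1-based) of K triangular fuzzy sets.

    Assumption: the K sets are evenly spaced on [0, 1], with peaks at
    (k - 1) / (K - 1) and half-width 1 / (K - 1).
    """
    if K < 2 or not 1 <= k <= K:
        raise ValueError("need K >= 2 and 1 <= k <= K")
    centre = (k - 1) / (K - 1)
    half_width = 1.0 / (K - 1)
    return max(0.0, 1.0 - abs(x - centre) / half_width)


@dataclass(frozen=True)
class FuzzyRule:
    """One rule of the form (1); field names are illustrative only."""
    antecedents: Tuple[int, ...]  # index k of the fuzzy set A_jk per attribute
    consequent: int               # class label C_j
    certainty: float              # grade of certainty CF_j


if __name__ == "__main__":
    X = min_max_normalise([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
    print(X)
    # Memberships of x = 0.25 in a partition with K = 3 fuzzy sets
    print([triangular_membership(0.25, k, 3) for k in (1, 2, 3)])
    print(FuzzyRule(antecedents=(1, 3), consequent=0, certainty=0.8))
```

With this evenly spaced partition, the memberships of any value in [0,1] sum to one, so each attribute value is shared between at most two neighbouring fuzzy sets.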
