One of the main challenges in classifying gene expression data is that the number of genes is typically much higher than the number of analysed samples. Also, it is not clear which genes are important and which can be omitted without reducing classification performance. Many pattern classification techniques have been employed to analyse microarray data. For example, Golub et al. (1999) used a weighted voting scheme, Fort and Lambert-Lacroix (2005) employed partial least squares and logistic regression techniques, whereas Furey et al. (2000) applied support vector machines. Dudoit et al. (2002) investigated nearest neighbour classifiers, discriminant analysis, classification trees and boosting, while Statnikov et al. (2005) explored several support vector machine techniques, nearest neighbour classifiers, neural networks and probabilistic neural networks. Several of these studies found that no single classification algorithm performs best on all datasets (although SVMs appear to perform best on several of them), and hence that exploring several classifiers is worthwhile. Similarly, as several studies have shown (Liu, Li, & Wong, 2002; Statnikov et al., 2005), no universally ideal gene selection method has yet been found.
Several authors have previously used fuzzy logic to analyse gene expression data. Woolf and Wang (2000) used fuzzy rules to explore the relationships between several genes of a profile, while Vinterbo, Kim, and Ohno-Machado (2005) used fuzzy rule bases to classify gene expression data. However, Vinterbo et al.'s method has the disadvantages that it allows only linear discrimination and that it describes each gene by only two fuzzy partitions ('up' and 'down'). In Schaefer et al. (2007) and Schaefer and Nakashima (2010), we presented a fuzzy rule-based classification system for analysing microarray expression data. Gene expression data was described by fuzzy sets, and rules over combinations of these sets were employed to arrive at a classification. In Schaefer and Nakashima (2010), we derived a more compact rule base using a genetic algorithm (GA) that assesses the fitness of individual rules and selects a rule ensemble that maximises classification performance.
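To make the idea of classifying with fuzzy rules more concrete, the following Python sketch shows one generic way such a classifier can operate. It is not the system of Schaefer et al. (2007) or Schaefer and Nakashima (2010): the evenly spaced triangular membership functions, the product-based rule compatibility, the per-rule certainty grade and the single-winner inference are all assumptions chosen for illustration, and the names `triangular_partitions`, `FuzzyRule`, `compatibility` and `classify` are hypothetical. The sketch further assumes that expression values have already been normalised to [0, 1], and it leaves out the GA-based rule selection entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Membership = Callable[[float], float]


def triangular_partitions(k: int) -> List[Membership]:
    """Return k triangular fuzzy sets evenly spaced on [0, 1].

    Assumption: expression values are min-max normalised to [0, 1].
    Set i peaks at i / (k - 1) and drops to zero at its neighbours' peaks.
    """
    if k < 2:
        raise ValueError("need at least two fuzzy partitions")
    width = 1.0 / (k - 1)

    def make(centre: float) -> Membership:
        return lambda x: max(0.0, 1.0 - abs(x - centre) / width)

    return [make(i * width) for i in range(k)]


@dataclass
class FuzzyRule:
    # antecedent[g] is the index of the fuzzy set used for gene g,
    # or None for "don't care" (gene g is ignored by this rule).
    antecedent: List[Optional[int]]
    label: str
    certainty: float = 1.0  # hypothetical rule weight in [0, 1]


def compatibility(rule: FuzzyRule, sample: Sequence[float],
                  sets: List[Membership]) -> float:
    """Product of the sample's membership degrees in the rule's fuzzy sets."""
    degree = 1.0
    for value, idx in zip(sample, rule.antecedent):
        if idx is not None:
            degree *= sets[idx](value)
    return degree


def classify(rules: List[FuzzyRule], sample: Sequence[float],
             sets: List[Membership]) -> Optional[str]:
    """Single-winner inference: the rule with the largest
    compatibility * certainty decides the class; None if no rule fires."""
    best_score, best_label = 0.0, None
    for rule in rules:
        score = compatibility(rule, sample, sets) * rule.certainty
        if score > best_score:
            best_score, best_label = score, rule.label
    return best_label


if __name__ == "__main__":
    sets = triangular_partitions(3)  # read as "low", "medium", "high"
    rules = [
        FuzzyRule([2, None], "A"),  # gene 0 high -> class A
        FuzzyRule([0, 2], "B"),     # gene 0 low and gene 1 high -> class B
    ]
    print(classify(rules, [0.9, 0.4], sets))  # → A
```

With `k=2` each gene is described by just two sets, similar in spirit to the 'up'/'down' description mentioned above; a larger `k` gives each gene a finer linguistic description, at the cost of a much larger space of candidate rules, which is one reason a selection step such as a GA becomes useful.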