One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of classifiers. The main finding is that an ensemble classifier often performs much better than the individual classifiers that make it up. Recent research (Dettling, 2004; Tan & Gilbert, 2003) has confirmed the utility of ensemble machine learning algorithms for gene expression analysis. The motivation of this work is to identify a suitable machine learning algorithm for classification and prediction on gene expression data. The research starts by analyzing the behavior and weaknesses of three popular ensemble machine learning methods (Bagging, Boosting, and Arcing) and then presents a new ensemble machine learning algorithm. The proposed method is evaluated against the existing ensemble machine learning algorithms on 12 gene expression datasets (Alon et al., 1999; Armstrong et al., 2002; Ash et al., 2000; Catherine et al., 2003; Dinesh et al., 2002; Gavin et al., 2002; Golub et al., 1999; Scott et al., 2002; van ’t Veer et al., 2002; Yeoh et al., 2002; Zembutsu et al., 2002). The experimental results show that the proposed algorithm greatly outperforms the existing methods, achieving high classification accuracy. The chapter is organized as follows. The Background section introduces the ensemble machine learning approach and three popular ensembles (i.e., Bagging, Boosting, and Arcing). The Method section presents the analysis of the existing ensembles, the details of the proposed algorithm, and the experimental results. The chapter closes with a discussion of future trends and a conclusion.
Key Terms in this Chapter
Attribute: A property of an instance that may be used to determine its classification.
Classifier: A set of patterns and rules to assign a class to new examples.
Base Model: The model generated by the base learner; also called a base classifier.
Supervised Machine Learning: Machine learning algorithms that require training datasets whose class labels are known in advance.
Base Learner: The algorithm used for building base classifiers (e.g., a decision tree algorithm).
Gene Expression Data: A typical gene expression array is made up of x columns, one per tissue sample, with each sample containing expression values for y genes. Every sample has its own class label, which represents the condition it belongs to.
Machine Learning (Mitchell, 1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
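
To make the key terms concrete, the following minimal Python sketch shows how they fit together in a Bagging-style ensemble. It is not the algorithm proposed in this chapter. The toy expression values, the class labels, and every name in it (`SAMPLES`, `train_stump`, `train_bagging`, `predict_ensemble`) are invented for illustration, and a one-level decision tree (a decision stump) stands in as the base learner. Each sample is stored as a row of gene values for convenience, which is the transpose of the column layout described under Gene Expression Data.

```python
import random
from collections import Counter

# Toy data, invented for illustration: each sample is a list of gene
# expression values (the attributes) paired with its class label.
SAMPLES = [
    ([2.1, 0.3, 5.0], "tumor"),
    ([1.9, 0.4, 4.7], "tumor"),
    ([2.4, 0.2, 5.2], "tumor"),
    ([0.5, 0.3, 1.1], "normal"),
    ([0.7, 0.5, 0.9], "normal"),
    ([0.4, 0.4, 1.3], "normal"),
]


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(data):
    """Base learner: fit a decision stump (one gene, one threshold)."""
    overall = majority([label for _, label in data])
    best_model, best_errors = None, None
    for gene in range(len(data[0][0])):
        for values, _ in data:
            threshold = values[gene]
            low = [lab for v, lab in data if v[gene] <= threshold]
            high = [lab for v, lab in data if v[gene] > threshold]
            low_label = majority(low)  # never empty: threshold is a data value
            high_label = majority(high) if high else overall
            errors = sum(
                (low_label if v[gene] <= threshold else high_label) != lab
                for v, lab in data
            )
            if best_errors is None or errors < best_errors:
                best_model = (gene, threshold, low_label, high_label)
                best_errors = errors
    return best_model  # the base model (base classifier)


def predict_stump(model, values):
    """Classify one sample with a single base model."""
    gene, threshold, low_label, high_label = model
    return low_label if values[gene] <= threshold else high_label


def train_bagging(data, n_models=11, seed=0):
    """Train each base model on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(train_stump(bootstrap))
    return models


def predict_ensemble(models, values):
    """Ensemble classifier: majority vote over the base models."""
    return majority([predict_stump(m, values) for m in models])


ensemble = train_bagging(SAMPLES)
print(predict_ensemble(ensemble, [2.0, 0.3, 4.9]))
```

Here `train_stump` plays the role of the base learner, each tuple it returns is a base model, and `predict_ensemble` is the ensemble classifier. Training is supervised because every sample in `SAMPLES` carries a known class label.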