Methods for Gene Selection and Classification of Microarray Dataset

Mekour Norreddine (Dr. Tahar Moulay University of Saida, Algeria)
DOI: 10.4018/978-1-5225-3004-6.ch004


Feature selection is one of the central problems in the analysis of gene expression data: it is the process of choosing which features are important for prediction. There are two general approaches to feature selection: the filter approach and the wrapper approach. In this chapter, the authors combine a filter approach based on information gain ranking with a wrapper approach that uses a genetic algorithm as its search method. The authors evaluate their approach on two gene expression data sets: Leukemia and Central Nervous System. The C4.5 decision tree classifier is used to improve classification performance.
Chapter Preview

Microarray Data Format

A gene expression data set from a microarray experiment can be represented by a real-valued expression matrix $E = \{ G(i,j) \mid 1 \le i \le n,\ 1 \le j \le m \}$, where the columns $G = \{g_1, g_2, \ldots, g_m\}$ form the expression patterns of genes and the rows $S = \{s_1, s_2, \ldots, s_n\}$ correspond to the samples. An example of a gene expression microarray dataset for Leukemia is shown in Table 1. The table organizes the data into m columns (genes) and n rows (samples), where m typically ranges from thousands to hundreds of thousands, depending on the accuracy of the microarray image processing technique, while n is below 200 samples in the previously collected datasets (Zeeshan et al., 2014a). The Category column gives the actual class of each sample. In the example shown, AML stands for acute myeloid leukemia and ALL for acute lymphoblastic leukemia.
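The layout described above can be illustrated with a minimal Python sketch. The gene names, sample identifiers, expression values, and helper functions below are hypothetical and chosen only for illustration; they are not taken from Table 1 or from the Leukemia dataset.

```python
# Hypothetical toy data: real microarray sets have thousands of genes.
gene_names = ["g1", "g2", "g3", "g4"]   # m = 4 genes (columns)
sample_ids = ["s1", "s2", "s3"]         # n = 3 samples (rows)

# Expression matrix: G[i][j] is the expression level of gene j in sample i
# (0-based indices here, 1-based in the text).
G = [
    [0.52, -1.30, 2.10, 0.05],
    [1.75, 0.40, -0.80, 1.20],
    [-0.25, 0.90, 0.60, -1.10],
]

# Category column: the class label of each sample.
category = ["ALL", "AML", "ALL"]

n, m = len(G), len(G[0])


def gene_pattern(j):
    """Column j: expression pattern of one gene across all samples."""
    return [row[j] for row in G]


def sample_profile(i):
    """Row i: expression values of one sample across all genes."""
    return list(G[i])


def samples_of_class(label):
    """Identifiers of the samples whose Category equals the given label."""
    return [sid for sid, c in zip(sample_ids, category) if c == label]


if __name__ == "__main__":
    print("samples (n):", n, "genes (m):", m)
    print("pattern of", gene_names[1], ":", gene_pattern(1))
    print("ALL samples:", samples_of_class("ALL"))
```

In this sketch the matrix is kept separate from the Category labels, so that a feature selection step can operate on gene columns alone while a classifier uses the labels as its target.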
