Wolf-Swarm Colony for Signature Gene Selection Using Weighted Objective Method

Prativa Agarwalla (Heritage Institute of Technology, Kolkata, India) and Sumitra Mukhopadhyay (Institute of Radio Physics & Electronics, India)
Copyright: © 2019 | Pages: 26
DOI: 10.4018/978-1-5225-5852-1.ch007


Microarray studies play a major role in the detection and classification of cancer because they analyze changes in the expression levels of genes that are strongly associated with the disease. In this chapter, a new weighted objective wolf-swarm colony optimization (WOWSC) technique is proposed for selecting significant and informative genes from cancer datasets. To extract the relevant genes, WOWSC combines four different objective functions in a weighted manner. Experimental analysis shows that the proposed methodology is efficient in obtaining differentially expressed and biologically relevant genes that are effective for disease classification. The technique generates a good subset of genes that offers more useful insight into gene-disease associations.
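As a rough illustration of combining objectives "in a weighted manner", the sketch below scores a candidate gene subset with a weighted sum. It is a minimal sketch under stated assumptions, not the chapter's method: this preview does not name the four WOWSC objectives or their weights, so the two objectives (`separation`, `compactness`), the weights, and the toy data are all hypothetical stand-ins.

```python
from statistics import mean
from typing import Callable, Sequence

Mask = Sequence[int]                  # 1 = gene selected, 0 = gene left out
Objective = Callable[[Mask], float]   # each objective scores a mask; higher is better

# Toy expression matrix (rows = samples, columns = genes) and class labels.
# Values are invented purely for illustration.
SAMPLES = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 3.1],
    [1.0, 1.1, 3.0],
    [2.0, 0.9, 2.9],
]
LABELS = [1, 1, 0, 0]


def separation(mask: Mask) -> float:
    """Hypothetical objective: mean absolute gap between class means of selected genes."""
    selected = [g for g, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    gaps = []
    for g in selected:
        pos = mean(row[g] for row, y in zip(SAMPLES, LABELS) if y == 1)
        neg = mean(row[g] for row, y in zip(SAMPLES, LABELS) if y == 0)
        gaps.append(abs(pos - neg))
    return mean(gaps)


def compactness(mask: Mask) -> float:
    """Hypothetical objective: rewards smaller subsets (1 minus fraction of genes kept)."""
    return 1.0 - sum(mask) / len(mask)


def weighted_fitness(mask: Mask,
                     objectives: Sequence[Objective],
                     weights: Sequence[float]) -> float:
    """Weighted-sum scalarization: sum of weight_i * objective_i(mask)."""
    if len(objectives) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * f(mask) for w, f in zip(weights, objectives))


# Assumed weights; the chapter's actual weighting scheme is not given in this preview.
OBJECTIVES = [separation, compactness]
WEIGHTS = [0.7, 0.3]

fit_single = weighted_fitness([1, 0, 0], OBJECTIVES, WEIGHTS)  # only the first gene
fit_all = weighted_fitness([1, 1, 1], OBJECTIVES, WEIGHTS)     # every gene
```

In a swarm-based search, each candidate solution would be a mask like the ones above, and a scalar fitness of this kind is what the optimizer would try to maximize. With these toy numbers, the mask that keeps only the strongly differential first gene scores higher than the mask that keeps all three.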
Chapter Preview


Cancer is a heterogeneous disease with different stages, classes, and subtypes. Early prediction of subtypes and detection of the rate of disease progression can reduce mortality and are also vital for planning the course of treatment. In biological terms, cancer can be defined as the uncontrolled growth of certain cells caused by changes in gene expression at the molecular level. To understand the disease properly and categorize it into different classes, it is necessary to investigate these changes in gene expression levels. Selecting the relevant genes involved in tumor progression is essential for proper medical diagnosis as well as for drug target prediction.
Gene expression data (Zhang, Kuljis & Liu, 2008) has a major impact on the study of cancer classification and identification. It comprises the expression levels of thousands of genes, collected from various samples. The expression of a gene in a cancerous cell is compared with its expression in a normal cell, and the microarray gene expression dataset is formed from this analysis. The dataset must be analyzed carefully because it contains information about the abnormal behavior of disease genes. However, the high dimensionality of microarray datasets makes it challenging to examine them and to extract important feature genes. In addition, the number of genes is far larger than the number of samples, which can cause overfitting when samples are classified. Noise and the heterogeneous nature of the datasets further complicate the extraction of informative features. These difficulties motivate researchers to apply various statistical and learning-based techniques to uncover the useful information in the dataset. The importance of classifying cancer and of correctly diagnosing the progression of the disease using those feature genes has spurred research in many fields, from biomedicine to the application of machine learning (ML) methods.
The ability of machine learning approaches to detect key features in large, complex datasets makes them important for feature selection and for analysis within big data frameworks. Cancer progression can therefore be modelled, and the disease classified, by applying learning-based approaches to large microarray datasets.

Key Terms in this Chapter

Cancer: It is a collection of diseases caused by the abnormal proliferation of cells.

Swarm Algorithm: A set of meta-heuristic, population-based optimization techniques that use nature-inspired processes.

Classification: It is the process of assigning objects to categories so that they can be distinguished from one another.

DNA Microarray: It is a technology that measures the expression levels of thousands of genes, collected from different samples, on a single microscopic chip.

Feature Selection: It is a machine learning technique used for selecting a relevant, non-redundant subset of features or attributes from a huge dataset.
