Gene Expression Dataset Classification Using Artificial Neural Network and Clustering-Based Feature Selection

Audu Musa Mabu, Rajesh Prasad, Raghav Yadav
Copyright: © 2020 |Pages: 22
DOI: 10.4018/IJSIR.2020010104

Abstract

As bioinformatics has advanced, the use of gene expression (GE) profiles for cancer diagnosis and classification has become an active research topic. GE data contain a large number of genes but only a few samples, which makes them difficult to examine and process. This paper proposes a novel strategy for classifying GE datasets based on clustering-centered feature selection. The proposed technique first preprocesses the dataset using normalization; feature selection is then performed with a feature clustering support vector machine (FCSVM), which has two phases: gene clustering and gene representation. To make the chosen top-ranked features suitable for classification, feature reduction is performed using the SVM-recursive feature elimination (SVM-RFE) algorithm. Finally, the feature-reduced dataset is classified using an artificial neural network (ANN) classifier. Compared with some recent swarm intelligence feature reduction approaches, FCSVM-ANN performed well.

1. Introduction

The generation of huge quantities of data has driven the development of numerous complex strategies and tools for visualizing and analyzing information. Such large volumes of data, especially for biological examination and interpretation, are made accessible by microarray technology (Kohbalan et al., 2013). The advent of microarray technology has enabled researchers to conduct extensive experiments on thousands of genes by examining differences in the interactions among genes (Muhammad, 2017). In practice, only a few genes are strongly related to a given sample class. These genes are referred to as informative genes, and they carry the classification information of the samples (Jiang, Xie, et al., 2013). Numerous studies have established that large-scale monitoring of GE through microarrays is among the most promising strategies for improving medical diagnostics and functional genomics studies (Muhammad, 2017). By virtue of gene microarray analysis, precise classification of tumor subtypes may become a reality, allowing for targeted treatment that maximizes efficacy and limits toxicity (Liu et al., 2007).

Microarray technologies have recently opened up many opportunities to study cancer using gene expression. The essential task of microarray data analysis is to derive a computational model from given microarray data that predicts the class of unseen samples. Accuracy, quality, and robustness are important components of microarray analysis (Hala et al., 2014). Tumor diagnosis and the classification of GE data have both attracted interest recently. However, GE data contain thousands of genes with few samples, which makes them difficult to examine and process. In addition, the data are linearly inseparable, noisy, and imbalanced (Huijuan et al., 2017). Over the past decade, some effort has been devoted to improving classification techniques for high-dimensional GE data generated by microarray experiments (Carlotta and Carlo, 2013).

K-means is among the most popular clustering algorithms, but it is only guaranteed to reach a locally optimal solution. Swarm optimization clustering algorithms have an advantage in that they perform a global search over the entire search space. A PSO+K-means algorithm can search globally and thereby converge faster than the conventional K-means algorithm alone. A promising direction is a multi-objective PSO-based K-means clustering algorithm that can cluster both genes and samples simultaneously for GE data (Cui and Potok, 2005).

The classification of different tumor types in GE data is of great importance in cancer diagnosis and drug discovery. Nevertheless, it is difficult owing to the enormous size of the data. Many techniques are available for assessing gene expression profiles. A common trait of these methods is selecting a subset of genes that is highly informative for the classification process and that reduces the dimensionality of the profiles (Udhaya et al., 2014).
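The remark that K-means reaches only a locally optimal solution can be illustrated with a small sketch. The code below is not taken from the paper: it is a plain Lloyd-style K-means on invented one-dimensional toy values, and the function name `kmeans_1d` and both sets of starting centers are assumptions chosen for illustration. It runs the same algorithm from two initializations and compares the resulting sum of squared errors (SSE).

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd-style k-means on 1-D data, started from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # no center moved, so the algorithm has converged
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Invented toy data with three obvious groups (hypothetical values).
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

good_centers, good_sse = kmeans_1d(points, [0.0, 10.0, 20.0])
poor_centers, poor_sse = kmeans_1d(points, [0.0, 1.0, 15.0])

print(good_centers, good_sse)  # [0.5, 10.5, 20.5] 1.5
print(poor_centers, poor_sse)  # [0.0, 1.0, 15.5] 101.0
```

Both runs converge, but the second stops at a much worse partition because two of its starting centers fall inside the same group. This sensitivity to initialization is the motivation the text gives for pairing K-means with a global search such as PSO.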
Dimensionality reduction is particularly relevant in bioinformatics research, especially for microarray data, which are characterized by relatively few samples in a high-dimensional gene (feature) space. Irrelevant genes (features) lead to poor classification accuracy and make it harder to discover potentially valuable information (Amit et al., 2014).
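As a rough illustration of clustering-based gene selection, the sketch below normalizes each gene, groups highly correlated genes, and keeps one representative per group. It is not the FCSVM method described in the abstract. The greedy correlation grouping, the 0.9 threshold, the class-mean gap used to pick representatives (a simple stand-in for an SVM-based ranking), and all gene names and values are assumptions made for this example.

```python
import math


def min_max(values):
    """Scale one gene's expression values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def class_gap(gene, labels):
    """Absolute difference of class means; assumes both classes 0 and 1 occur."""
    pos = [v for v, label in zip(gene, labels) if label == 1]
    neg = [v for v, label in zip(gene, labels) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_representatives(genes, labels, threshold=0.9):
    """Greedily cluster correlated genes, then keep one gene per cluster."""
    normalized = {name: min_max(values) for name, values in genes.items()}
    clusters = []  # each cluster is a list of gene names; the first is its seed
    for name, values in normalized.items():
        for cluster in clusters:
            seed = normalized[cluster[0]]
            if abs(pearson(values, seed)) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar seed found: start a new cluster
    # Representative: the gene whose class means differ the most.
    representatives = [
        max(cluster, key=lambda g: class_gap(normalized[g], labels))
        for cluster in clusters
    ]
    return clusters, representatives


# Invented toy data: four genes measured on six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1, 2, 1, 8, 9, 8],
    "g2": [0, 0, 0, 5, 5, 5],
    "g3": [5, 1, 4, 2, 5, 3],
    "g4": [9, 8, 9, 2, 1, 1],
}

clusters, representatives = select_representatives(genes, labels)
print(clusters)         # [['g1', 'g2', 'g4'], ['g3']]
print(representatives)  # ['g2', 'g3']
```

In this sketch four genes collapse to two representatives, since three of them carry nearly the same class signal. A further reduction step such as SVM-RFE and a classifier such as an ANN would then operate on the reduced gene set.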
