Efficient Retrieval Technique for Microarray Gene Expression

J. Jacinth Salome
Copyright: © 2012 |Pages: 9
DOI: 10.4018/ijirr.2012040104

DNA microarray gene data record the expression levels of thousands of genes for a small number of samples. Extracting the required knowledge from such data remains an open challenge, and although a growing body of research addresses it, acquiring that knowledge is still difficult. Gene classification is vital for retrieving the required information; however, the task is complex because of the characteristics of the data: high dimensionality and small sample size. As a first step, dimensionality reduction is carried out with the Locality Preserving Projection (LPP) and Principal Component Analysis (PCA) techniques to shrink the microarray data without losing information, and the reduced data are then used for information retrieval. In this paper, we propose an effective gene retrieval technique based on LPP and PCA, called LPCA. LPP and PCA are chosen for dimensionality reduction to enable efficient retrieval of microarray gene data. As an application, the microarray gene data are classified with a Support Vector Machine (SVM), which is trained on the dimensionality-reduced gene data for effective classification. A comparative study of these dimensionality reduction techniques is also made.

1. Introduction

Data mining plays a vital role in the twenty-first century, the information age (Ramamohanarao, 1989). It contributes to various fields of research, such as 1) information sharing and collaboration, 2) security association mining, 3) classification and clustering, 4) intelligence text mining, 5) spatial and temporal crime pattern mining, and 6) criminal/terrorist network analysis, among others (Chen, 2008). Recent advances in microarray technology have enabled the simultaneous measurement of the expression of thousands of genes under multiple experimental conditions. DNA microarray technology is another field that exploits data mining techniques. Molecular biologists face the challenge of discovering essential knowledge from this enormous volume of data (Slavkov, 2005). In such knowledge-seeking applications, information retrieval is one of the most crucial technologies (Lee, 2007) for mining the required information from the enormous amount of data. Although microarray techniques have been successfully used to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice remains very challenging. In information retrieval for DNA microarray technology, gene classification is a difficult task because of the characteristics of the data, which have high dimensionality and a small sample size (Leung, 2009; Alireza, 2009). DNA microarray technology nevertheless considerably expedites the discovery of the utility of genes, for example in cancer classification. In mining gene expression from multi-condition microarray experiments, gene clustering is another interesting task (Wai-Ho, 2005). Such tools may be used to identify new tumor classes from gene expression profiles.

Microarray experiments normally produce large datasets with expression values for thousands of genes but no more than a few dozen samples; accurate classification of tissue samples in such high-dimensional problems is therefore a difficult task (Zhang, 2007). To retrieve information from microarray gene expression data, we propose an effective retrieval technique based on LPP and PCA. Within this context, efficient retrieval emerges as a suitable paradigm, specially intended for the development of biomedical informatics applications and decision support systems. As the first step in the proposed gene retrieval, the high dimensionality of the microarray gene data is reduced using a dimensionality reduction technique (Changjing, 2005; Jian, 2006). LPP is chosen for dimensionality reduction because it preserves the local neighborhood relationships among the data. The SVM is trained on the dimensionality-reduced gene data for effective classification; it is selected for the proposed technique because of its ability to learn from very few samples. Hence, blending the dimensionality reduction technique with the SVM results in effective and powerful classification of gene expression data. Moreover, a comparative study is made of the LPP-based and PCA-based gene retrieval techniques (Gunanidhi, 2009).
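
The dimensionality reduction step can be illustrated with a minimal NumPy sketch. The preview does not state how LPP and PCA are combined in LPCA, so the arrangement below (PCA first, then LPP on the PCA scores) is only an assumption; it is a common way to keep the LPP eigenproblem well conditioned when genes far outnumber samples. The function names `pca_reduce` and `lpp_reduce`, the neighbor count, the heat-kernel width, the regularization term, and the synthetic data are all hypothetical and not taken from the paper. The reduced features would then be passed to a classifier such as an SVM.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X (samples x genes) onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def lpp_reduce(X, n_components, n_neighbors=5, reg=1e-6):
    """Locality Preserving Projection of the rows of X (samples x features)."""
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # k-nearest-neighbor graph (column 0 of the sort is the sample itself).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    nbr_d2 = d2[rows, cols]
    # Heat-kernel weights; the width is set to the mean neighbor distance (assumption).
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-nbr_d2 / nbr_d2.mean())
    W = np.maximum(W, W.T)            # make the graph symmetric
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X
    B = B + reg * (np.trace(B) / d) * np.eye(d)   # small ridge for stability
    # Solve A a = lambda B a by whitening with the Cholesky factor of B.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)     # eigenvalues in ascending order
    # The smallest eigenvalues give the directions that best preserve locality.
    return X @ (C_inv.T @ vecs[:, :n_components])

# Synthetic stand-in for microarray data: 40 samples, 500 genes, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
X[:20, :50] += 2.0                    # first class is shifted on 50 genes
Z = pca_reduce(X, n_components=10)    # 500 genes -> 10 PCA scores
Y = lpp_reduce(Z, n_components=2)     # 10 PCA scores -> 2 LPP features
```

Running PCA before LPP keeps the matrix on the right-hand side of the generalized eigenproblem full rank, which would not hold if LPP were applied directly to 500 genes with only 40 samples.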
