1. Introduction
Data mining plays a vital role in the twenty-first century, the information age (Ramamohanarao, 1989). It contributes to various fields of research, such as 1) information sharing and collaboration, 2) security association mining, 3) classification and clustering, 4) intelligence text mining, 5) spatial and temporal crime pattern mining, and 6) criminal/terrorist network analysis, among others (Chen, 2008). Recent advances in microarray technology have enabled the simultaneous measurement of the expression of thousands of genes under multiple experimental conditions, and DNA microarray analysis is therefore another field that exploits data mining techniques. Molecular biologists face the challenge of discovering essential knowledge in this enormous volume of data (Slavkov, 2005). In such knowledge-seeking applications, information retrieval is one of the most crucial technologies (Lee, 2007) for mining the required information from the data. Although microarray techniques have been used successfully to extract useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice remains very challenging. In information retrieval from DNA microarray data, gene classification is a difficult task because of the characteristics of the data, namely high dimensionality and small sample size (Leung, 2009; Alireza, 2009). Even so, DNA microarray technology considerably expedites the discovery of gene function in applications such as cancer classification. In mining gene expression data from multi-condition microarray experiments, gene clustering is another interesting task (Wai-Ho, 2005). Such tools may be used to identify new tumor classes from gene expression profiles.
Microarray experiments normally produce large datasets with expression values for thousands of genes but no more than a few dozen samples, so accurately classifying tissue samples in such high-dimensional problems is a difficult task (Zhang, 2007). To retrieve information from microarray gene expression data, we propose an effective retrieval technique based on Locality Preserving Projections (LPP) and Principal Component Analysis (PCA). Within this context, efficient retrieval emerges as a suitable paradigm for the development of biomedical informatics applications and decision support systems. As the first step in the proposed gene retrieval, the high dimensionality of the microarray gene data is reduced using a dimensionality reduction technique (Changjing, 2005; Jian, 2006). LPP is chosen for dimensionality reduction because it preserves local neighborhood relationships. A support vector machine (SVM) is then trained on the dimensionality-reduced gene data for effective classification; SVM is selected for the proposed technique because it can learn from very few samples. Hence, combining dimensionality reduction with SVM yields effective and powerful classification of gene expression data. Moreover, a comparative study is made between the LPP-based and PCA-based gene retrieval techniques (Gunanidhi, 2009).
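The dimensionality reduction step described above can be illustrated with a minimal sketch of a basic LPP projection. This is an illustration under stated assumptions, not the configuration used in this work. The binary k-nearest-neighbor graph, the ridge term `reg` that keeps the constraint matrix invertible when genes far outnumber samples, the function names `lpp_fit` and `lpp_transform`, and the synthetic data are all assumptions introduced here.

```python
import numpy as np


def lpp_fit(X, n_components=2, n_neighbors=3, reg=1e-3):
    """Return (mean, projection) for a basic LPP with a binary kNN graph.

    X has shape (n_samples, n_features). `reg` is an assumed ridge term that
    keeps the constraint matrix invertible when features outnumber samples.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    n = Xc.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(Xc ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (Xc @ Xc.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Symmetric binary adjacency: i ~ j if either is among the other's k nearest.
    W = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem (X^T L X) a = lambda (X^T D X) a.
    A = Xc.T @ L @ Xc
    B = Xc.T @ D @ Xc + reg * np.eye(Xc.shape[1])

    # Reduce to a standard symmetric problem via the Cholesky factor of B.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    _, Y = np.linalg.eigh(M)  # eigenvalues come back in ascending order

    # Directions with the smallest eigenvalues best preserve local neighborhoods.
    projection = C_inv.T @ Y[:, :n_components]
    return mean, projection


def lpp_transform(X, mean, projection):
    """Project samples into the learned low-dimensional space."""
    return (np.asarray(X, dtype=float) - mean) @ projection


# Synthetic stand-in for expression data: 20 samples, 200 "genes".
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 200))
mean, P = lpp_fit(X_demo, n_components=2)
Z = lpp_transform(X_demo, mean, P)
print(Z.shape)  # (20, 2)
```

The reduced matrix `Z` would then serve as the training input to the classifier, for example scikit-learn's `SVC`. That step is omitted here to keep the sketch dependency-light.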