De-Noising, Clustering, Classification, and Representation of Microarray Data for Disease Diagnostics


Nitin Baharadwaj (Netaji Subhas Institute of Technology, India), Sheena Wadhwa (Netaji Subhas Institute of Technology, India), Pragya Goel (Netaji Subhas Institute of Technology, India), Isha Sethi (Netaji Subhas Institute of Technology, India), Chanpreet Singh Arora (Netaji Subhas Institute of Technology, India), Aviral Goel (Netaji Subhas Institute of Technology, India), Sonika Bhatnagar (Netaji Subhas Institute of Technology, India) and Harish Parthasarathy (Netaji Subhas Institute of Technology, India)
DOI: 10.4018/978-1-4666-4558-5.ch009


A microarray works by exploiting the ability of a given mRNA molecule to bind specifically, under high-stringency conditions, to the DNA template from which it originated. The amount of mRNA bound to each DNA site on the array is then determined, which represents the expression level of each gene. Quantification of the mRNA (target) bound to each DNA spot (probe) helps to determine which genes are active or responsible for the current state of the cell. Probe-target hybridization is usually detected and quantified using dye, fluorophore, or chemiluminescent labels. Microarray data give a single snapshot of the gene activity profile of a cell at a given time. They help to elucidate the genes involved in a disease and may also be used for diagnosis or prognosis. In spite of this potential, the interpretation and use of microarray data are limited by their error-prone nature, the sheer size of the data, and the subjectivity of the analysis. We first describe a pre-processing methodology for denoising microarray data using signal processing techniques. The denoised data thus obtained are more suitable for classification and for mining useful information. The Discrete Fourier Transform (DFT) and autocorrelation were explored for denoising the data. We also developed the use of microarray data as a diagnostic tool in cancer, using a one-dimensional Fourier transform followed by simple Euclidean distance calculations, and two-dimensional MUltiple SIgnal Classification (MUSIC). To improve the accuracy of the diagnostic tool, Volterra series were used to model the nonlinear behavior of the data. Our efforts at denoising, representation, and classification of microarray data with signal processing techniques show that appreciable results can be attained even with the most basic techniques.
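As a rough illustration of the DFT-based denoising and Euclidean-distance classification mentioned above, the following Python sketch low-pass filters a synthetic expression profile and assigns it to the nearest reference profile. It is a minimal sketch under stated assumptions, not the chapter's actual procedure: the cutoff `keep`, the synthetic profiles, the labels "control" and "disease", and the nearest-reference rule are all hypothetical choices made for illustration.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Naive inverse DFT, the counterpart of dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def lowpass_denoise(signal, keep):
    """Zero every DFT coefficient whose frequency index exceeds `keep` (assumed cutoff)."""
    n = len(signal)
    spectrum = dft(signal)
    # Indices k and n - k carry the same frequency; keeping both keeps the output real.
    filtered = [c if min(k, n - k) <= keep else 0 for k, c in enumerate(spectrum)]
    return [value.real for value in inverse_dft(filtered)]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(sample, references):
    """Return the label of the reference profile closest to `sample`."""
    return min(references, key=lambda label: euclidean(sample, references[label]))

# Synthetic data (assumption): a smooth profile plus high-frequency alternating noise.
n = 16
clean = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [clean[t] + 0.3 * (-1) ** t for t in range(n)]
denoised = lowpass_denoise(noisy, keep=2)

# Hypothetical reference profiles for two classes.
references = {
    "control": clean,
    "disease": [math.cos(2 * math.pi * t / n) for t in range(n)],
}
print(classify(denoised, references))
```

In this toy case the noise sits entirely at the highest frequency, so the filter removes it completely; real microarray noise overlaps the signal band, and the cutoff would have to be chosen empirically.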
To search for a gene signature, we used a combination of PCA and density-based clustering to infer the gene signature of Parkinson’s disease. Used in conjunction with gene ontology data, this technique yielded a signature comprising 21 genes, which were then validated by their involvement in known Parkinson’s disease pathways. The methodology described can be further developed to yield future biomarkers for early Parkinson’s disease diagnosis, as well as for drug development.
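The combination of PCA and density-based clustering can be sketched in a few lines of Python. This is only an illustrative sketch: the text does not name the clustering algorithm, so DBSCAN is used here as a representative density-based method, the leading principal component is found by simple power iteration, and the gene names, expression values, `eps`, and `min_pts` are hypothetical. The gene ontology step is not modeled.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of the leading principal axis).

    Uses power iteration on the sample covariance matrix; assumes the data
    have nonzero variance and the start vector is not orthogonal to the axis.
    """
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    vec = [1.0] * m
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return mean, vec

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Hypothetical expression matrix: rows are genes, columns are four samples
# (two control, two disease). Values are invented for illustration.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG"]
expression = [
    [1.0, 1.1, 1.0, 1.1],
    [1.1, 1.0, 1.1, 1.0],
    [0.9, 1.0, 1.0, 0.9],
    [1.0, 0.9, 0.9, 1.0],
    [1.0, 1.0, 3.0, 3.1],
    [1.1, 0.9, 3.1, 3.0],
    [0.9, 1.1, 2.9, 3.0],
]

mean, axis = first_principal_component(expression)
# Project each gene onto the leading principal axis (a 1-D score per gene).
scores = [(sum((row[j] - mean[j]) * axis[j] for j in range(len(axis))),)
          for row in expression]
labels = dbscan(scores, eps=0.5, min_pts=2)

clusters = {}
for gene, label in zip(genes, labels):
    clusters.setdefault(label, []).append(gene)
print(clusters)
```

Projecting onto a single component keeps the example short; an actual analysis would likely retain several components and tune the density parameters to the data.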
Chapter Preview


DNA Microarrays

As a result of the rise in the number of completed genome sequencing projects, the DNA sequences of a large number of genomes have become available. The central dogma of modern biology states that the information present in the DNA is read to manufacture mRNA, or messenger RNA. This, in turn, is read by the cellular machinery to manufacture proteins. The first step of this process is known as transcription (i.e., decoding of the DNA to synthesize mRNA), while the second is translation (i.e., decoding of the mRNA to produce functional proteins). A DNA fragment whose information is converted into an mRNA or a protein molecule is termed a gene. Experimental and computational tools have helped us elucidate the presence of protein-coding genes and thus map the vast, unexplored length of DNA. While all cells of the body contain the full set of DNA material (and genes), it is the subset that is transcribed, or expressed, that confers on the cell its unique properties. These genes are responsible for the response of the cell to its environment in both health and disease. Thus, the gene transcription profile of a cell gives important insight into which genes are transcribed under a given physiological or pathological condition. The gene transcription profile reflects the response of the cell to its environment, a tightly regulated process. Such a gene transcription profile can be recorded using a microarray.

The microarray capitalizes on the ability of a given mRNA molecule to bind specifically, or hybridize, to the DNA template from which it originated. Thus, in a microarray experiment, probes consisting of DNA are attached to a solid surface by covalent bonding to a chemical matrix such as epoxy-silane, lysine, or polyacrylamide. The solid surface can be a glass or a silicon chip, commonly known as a gene chip. A gene chip consists of thousands of microscopic spots, each containing a small but specific DNA sequence. To measure gene expression, short sections of genes are anchored to the solid surface and used as probes to detect the presence of their templates (targets) in a biological sample of interest. The target hybridizes specifically with its complementary probe under high-stringency experimental conditions. Hybridization can be detected and quantified by labeling the targets with fluorophores or chemiluminescent dyes.

A disease state affects the cell and changes the number and type of genes that are expressed. This is termed differential expression. To study the differential expression pattern of genes, mRNA is isolated from both diseased and normal samples. The mRNA is used as a template to generate cDNA with a fluorescent tag attached. Using tags of distinct colors makes it possible to tell the diseased and normal samples apart in later stages. The two types of samples are mixed and incubated with a microarray carrying immobilized genes of interest. The labeled complements of these immobilized genes hybridize with them. Next, the microarray chip is placed in a reader, where it is scanned with lasers in order to excite the fluorescently labeled tags. Where a probe has hybridized with its target, the dye is excited by a laser of a specific wavelength and emits a signal that can be recorded. The intensity of the signal varies proportionally with the relative abundance of the nucleic acid sequences in the target. Thus, an array containing many DNA probes can be used to determine the expression level of thousands of genes within a cell by measuring the mRNA bound to each site on the array, as shown in Figure 1. The quantified mRNA reflects the level of activity of a gene under the conditions studied. The resulting digital image is stored in a computer for record keeping and analysis. Since each spot in the array is associated with a specific gene, color development at each location indicates whether the gene is present in the control or sample DNA. The intensity of the color provides an estimate of the level of expression of the gene (Deonier et al., 2005).
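One common way to turn the two channel intensities described above into a measure of differential expression is the log2 ratio of diseased to control signal. The Python sketch below assumes this approach; it is not taken from the chapter. The gene names, intensity values, the `floor` used to avoid division by zero, and the two-fold `threshold` are hypothetical, and background subtraction and normalization are omitted.

```python
import math

def log2_ratios(disease, control, floor=1.0):
    """Per-gene log2(disease / control) from raw spot intensities.

    `floor` (an assumed value) clamps very low intensities so that the
    ratio stays finite.
    """
    return {
        gene: math.log2(max(disease[gene], floor) / max(control[gene], floor))
        for gene in disease
    }

def differentially_expressed(ratios, threshold=1.0):
    """Flag genes whose absolute log2 ratio reaches `threshold` (1.0 = two-fold)."""
    return {
        gene: ("up" if ratio > 0 else "down")
        for gene, ratio in ratios.items()
        if abs(ratio) >= threshold
    }

# Invented spot intensities for the two dye channels.
disease = {"gene1": 800.0, "gene2": 210.0, "gene3": 50.0}
control = {"gene1": 200.0, "gene2": 200.0, "gene3": 400.0}

ratios = log2_ratios(disease, control)
calls = differentially_expressed(ratios)
print(calls)
```

Working on a log scale treats a four-fold increase and a four-fold decrease symmetrically (+2 and -2), which is why ratios are usually log-transformed before thresholding.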

Figure 1. Schematic flowchart showing the organization of a microarray experiment

