Microarray technology allows the comprehensive measurement of the expression levels of many genes simultaneously on a common substrate. Typical applications of microarrays include the quantification of expression profiles of a system under different experimental conditions, or the comparison of expression profiles of two systems under one or more conditions. Microarray image analysis is a crucial step in the analysis of microarray data. This chapter presents an extensive overview of microarray image segmentation. Methods already presented in the literature are classified into two main categories: methods based on image processing techniques and methods based on machine learning techniques. A novel classification-based application for segmentation is also presented to demonstrate its efficiency.
Several types of microarrays have been developed to address different biological processes: (i) cDNA microarrays (Eisen, 1999) are used to monitor gene expression levels in order to study the effects of certain treatments, diseases, and developmental stages on gene expression. As a result, microarray gene expression profiling can be used to identify disease genes by comparing gene expression in diseased and normal cells. (ii) Comparative genomic hybridization arrays assess genome content in different cells or closely related organisms (Pollack et al., 1999). (iii) SNP detection arrays identify single nucleotide polymorphisms among alleles within or between populations (Moran & Whitney, 2004). (iv) Finally, chromatin immunoprecipitation (ChIP) technologies determine protein binding site occupancy throughout the genome, employing ChIP-on-chip technology (Buck & Lieb, 2004).
A cDNA microarray experiment typically starts by taking two biological tissues and extracting their mRNA. The mRNA samples are reverse transcribed into complementary DNA (cDNA) and labelled with fluorescent dyes, resulting in fluorescence-tagged cDNA. The most common dyes for tagging cDNA are the red fluorescent dye Cy5 (emission from 630-660 nm) and the green fluorescent dye Cy3 (emission from 510-550 nm). Next, the tagged cDNA copy, called the sample probe, is hybridized on a slide containing a grid or array of single-stranded cDNAs called probes. Probes are usually known genes of interest which were printed on a glass microscope slide by a robotic arrayer. According to the principles of hybridization, a sample probe will only hybridize with its complementary probe. The probe-sample hybridization process on a microarray typically takes several hours. All unhybridized sample probes are then washed off and the microarray is scanned twice, at different wavelengths corresponding to the different dyes used in the assay. The digital image scanner records the intensity level at each grid location, producing two greyscale images. The intensity level is correlated with the absolute amount of RNA in the original sample and thus with the expression level of the gene associated with this RNA.
Automated quantification of gene expression levels is realized by analyzing the microarray images. Microarray images contain several blocks (or subgrids), each consisting of a number of spots placed in rows and columns (Figure 1). The intensity level of each spot represents the amount of sample which is hybridized with the corresponding gene. The processing of microarray images (Schena et al., 1995) includes three stages. First, spots and blocks are preliminarily located in the images (gridding). Second, using the available gridding information, each microarray spot is individually segmented into foreground and background (segmentation). Finally, intensity extraction calculates the foreground fluorescence intensity, which represents the expression level of each gene, and the background intensities. Ideally, image analysis would be a rather trivial process if all the spots had a circular shape and similar size, and if the background were free of noise and artefacts. However, a scanned microarray image has none of these characteristics, so microarray image analysis becomes a difficult task. In this chapter, we describe several microarray segmentation algorithms based on image processing and machine learning techniques.
Figure 1. A block of a typical microarray image
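The segmentation and intensity extraction stages described above can be illustrated with a minimal sketch. It is not one of the methods reviewed in this chapter: it assumes that gridding has already produced a small window of pixel intensities around a single spot, it splits the pixels into foreground and background with a simple two-cluster (two-means) grouping of intensity values, and it reports the mean foreground and the median background intensity. The function names `segment_spot` and `extract_intensity`, the choice of the median for the background, and the sample window are all illustrative assumptions.

```python
from statistics import mean, median


def segment_spot(window, iterations=20):
    """Label each pixel of a spot window as foreground (True) or background (False).

    `window` is a list of rows of pixel intensities, assumed to come from a
    prior gridding step. Pixels are grouped into two clusters by intensity
    using a one-dimensional two-means iteration.
    """
    pixels = [p for row in window for p in row]
    low, high = min(pixels), max(pixels)  # initial cluster centres
    if low == high:
        # Flat window: nothing stands out from the background.
        return [[False for _ in row] for row in window]
    for _ in range(iterations):
        threshold = (low + high) / 2
        background = [p for p in pixels if p <= threshold]
        foreground = [p for p in pixels if p > threshold]
        new_low, new_high = mean(background), mean(foreground)
        if (new_low, new_high) == (low, high):
            break  # cluster centres stopped moving
        low, high = new_low, new_high
    threshold = (low + high) / 2
    return [[p > threshold for p in row] for row in window]


def extract_intensity(window, mask):
    """Return (foreground, background) intensity estimates for one spot.

    Foreground is the mean of the foreground pixels; background is the
    median of the remaining pixels (an assumption made for this sketch).
    """
    fg = [p for row, mrow in zip(window, mask) for p, m in zip(row, mrow) if m]
    bg = [p for row, mrow in zip(window, mask) for p, m in zip(row, mrow) if not m]
    foreground = mean(fg) if fg else 0.0
    background = median(bg) if bg else 0.0
    return foreground, background


# Hypothetical 4x4 window with a bright 2x2 spot in the centre.
window = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 195, 11],
    [12, 10, 11, 10],
]
mask = segment_spot(window)
foreground, background = extract_intensity(window, mask)
print(foreground, background)
```

On real scans, spots are irregular and the background is noisy, which is why a pure intensity split such as this one is only a starting point for the image processing and machine learning methods discussed in the chapter.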
Key Terms in this Chapter
Machine Learning: It refers to the design and development of algorithms and techniques that allow computers to “learn”. The purpose of machine learning is to extract information from several types of data automatically, using computational and statistical methods.
Block: Blocks are also known as grids or subgrids. These are areas of the microarray slide (and, correspondingly, of the microarray image) in which a number of spots are located.
Clustering: It is the task of decomposing or partitioning a dataset into groups so that the points in one group are similar to each other and are as different as possible from the points in the other groups.
Spot: It is a small and almost circular area in the microarray image whose mean intensity represents the expression level of the corresponding gene.
Classification: It is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items and based on a training set of previously labelled items.
Microarray: A set of miniaturized chemical reaction areas that can be used to test DNA fragments, antibodies, or proteins, using a chip that carries immobilised targets which are hybridised with a probed sample.
Image Processing: The analysis of an image using techniques that can identify shades, colours and relationships that cannot be perceived by the human eye. In the biomedical field, image processing is used to produce medical diagnoses or to extract data for further analysis.