The completion of the Human Genome Project and the emergence of high-throughput technologies at the dawn of the new millennium are rapidly changing the way we approach biological problems. DNA microarrays represent a promising new technological development, widely used for the investigation and identification of genes associated with important biological processes. The chapter is divided into two parts: the first discusses current methods for the acquisition and quantitation of the microarray image, while the second focuses on the analysis and interpretation of the microarray signals (standardization, normalization, statistical analysis, etc.).
A DNA microarray is normally a slide made of silica or synthetic materials on which an ordered array of oligonucleotide clones is imprinted. The clones correspond to regions of all discovered or putative genes of an organism’s genome and are deposited in quantities sufficient to avoid saturation effects, which allows the specific binding of genes or gene products (Schena, 2003). DNA microarrays are composed of thousands of DNA sequences (probes), each representing a gene. The DNA sequences can be long (500-2500 bp) cDNA sequences or shorter (25-70 bp) oligonucleotide sequences. Oligonucleotide sequences can be pre-synthesized and deposited with a pin or piezoelectric spray, synthesized in situ by photolithographic (Affymetrix) or inkjet (Agilent) technologies, or attached to microscopic beads (Illumina), which are then randomly dispersed over the wells of the microarray slide.
Relative quantitative detection of gene expression can be carried out either between two samples on a single array or with single samples hybridized onto separate arrays. The first approach entails (at least) two sample sources which are labelled with different fluorescent molecules, usually Cy3 (green fluorescence) and Cy5 (red fluorescence). Conventionally, Cy3 represents the ‘control’ state whereas Cy5 represents the state under examination. These samples are hybridized together on the same array, a scanner excites the dyes with a laser, and an image is produced for each dye. The relative intensities of each channel represent the relative abundance of the RNA or DNA product in each sample. In the second approach, each sample is labelled with the same dye and hybridized onto a separate array (Bajcsy, Liu, & Band, 2007). The absolute fluorescence values of each spot may then be scaled and compared to detect possible alterations in gene expression.
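As an illustration of the two-channel approach, the relative abundance at a spot is commonly summarized as the base-2 logarithm of the Cy5 to Cy3 intensity ratio, so that positive values indicate higher signal in the examined state and negative values indicate higher signal in the control. The sketch below is only a minimal example under that assumption; the gene names, intensity values and the helper `log_ratio` are hypothetical and do not come from the chapter.

```python
import math


def log_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3) for one spot (hypothetical helper).

    Both intensities are assumed to be positive, i.e. already
    background-corrected and filtered.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


# Made-up (Cy5, Cy3) intensities for three spots, for illustration only.
spots = {
    "gene_a": (2000.0, 500.0),   # stronger in the examined state
    "gene_b": (300.0, 1200.0),   # stronger in the control state
    "gene_c": (800.0, 800.0),    # unchanged
}

ratios = {gene: log_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}

for gene, value in ratios.items():
    print(f"{gene}: log2(Cy5/Cy3) = {value:+.2f}")
```

The logarithm makes up- and down-regulation symmetric around zero: a fourfold increase and a fourfold decrease map to +2 and -2 respectively.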
The resulting images are used to generate a dataset where pre-processing is performed prior to the analysis and interpretation of the results, in order to ensure the same level of comparison within and across slides, as well as to mitigate the role of noise. The pre-processing step entails useful transformations and assessment of the signal quality of the gene probes, in order to extract or enhance reliable signal characteristics which render the dataset amenable to the application of various data analysis methods.
The quantification of gene expression assumes that the amount of fluorescence measured at each sequence-specific location is proportional to the amount of mRNA hybridized to the gene probes on the array. Image processing maps the arrayed gene spots and quantifies their expression from the relative fluorescence intensities measured for each spot. Microarray experiments do not directly provide insight into the absolute level of expression of a particular gene; nevertheless, they are useful for comparing expression levels among conditions and genes (e.g. health vs. disease, treated vs. untreated) (Quackenbush, 2002; Tarca et al., 2006).
Key Terms in this Chapter
Signal Information Extraction: The process of calculating foreground and background intensities, based on the respective pixel distributions derived from the segmentation step.
Meta-Analysis: The exhaustive search process comprising numerous and versatile algorithmic procedures that exploit gene expression results by combining or further processing them with sophisticated statistical learning and data mining techniques, coupled with annotated information on the functional properties of these genes held in large databases.
Normalization: The set of processes applied to compensate for systematic errors among genes or arrays in order to derive meaningful biological comparisons.
DNA Microarray: Normally a slide made of silica or synthetic materials where on top an ordered array of oligonucleotide clones is imprinted, corresponding to regions of all discovered or putative genes of an organism’s genome, which allows the specific binding of genes or gene products.
Missing Value Imputation: The estimation of missing probe values for a gene by the expression of other probes over the rest of the slides, based on certain statistical or geometrical criteria.
Segmentation: The process of classifying the pixels in the area around a specific spot on the array as either foreground or background.
Addressing or Gridding: The process of assigning coordinates to each of the spots for a spotted or bead array or the alignment of a rectangular lattice in order to map pixel elements to specific probes in Affymetrix arrays.
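The segmentation, signal information extraction and normalization terms defined above can be illustrated with a small sketch. It assumes one simple choice for each step (fixed-circle segmentation, median-based foreground and background estimates, and global median centring of log ratios); these are illustrative choices rather than the methods the chapter prescribes, and all function names and pixel values are hypothetical.

```python
from statistics import median


def segment_fixed_circle(patch, radius):
    """Split the pixels of a square patch into foreground and background.

    Pixels within `radius` of the patch centre are treated as spot
    (foreground) pixels; all others are treated as background.
    """
    centre = (len(patch) - 1) / 2
    foreground, background = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background


def extract_signal(foreground, background):
    """Background-corrected spot intensity from the two pixel distributions."""
    return median(foreground) - median(background)


def median_centre(log_ratios):
    """Global normalization: shift log ratios so that their median is zero."""
    shift = median(log_ratios)
    return [value - shift for value in log_ratios]


# Made-up 5x5 pixel patch around one spot, for illustration only.
patch = [
    [10, 10, 10, 10, 10],
    [10, 90, 100, 90, 10],
    [10, 100, 120, 100, 10],
    [10, 90, 100, 90, 10],
    [10, 10, 10, 10, 10],
]

fg, bg = segment_fixed_circle(patch, radius=1.5)
signal = extract_signal(fg, bg)

# Made-up log ratios for five spots on one array.
centred = median_centre([1.5, 0.5, -0.5, 2.5, 0.5])

print(len(fg), len(bg), signal)
print(centred)
```

Medians are used here because they are less sensitive than means to a few saturated or contaminated pixels; other estimators and segmentation schemes would fit the same three-step structure.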