This chapter gives a quick introduction, with recommendations, to the preprocessing of Affymetrix GeneChip® microarrays. In the rapidly growing field of microarray-based gene expression analysis, Affymetrix GeneChip arrays are an established technology that has been on the market for over ten years. The arrays are used in biomedical research, and the mass of information they produce demands statistical methods for its analysis. Here we present the particular design of GeneChip arrays, a platform in which much research has already been invested and for which some validation resources for comparing methods are available. For a basic understanding of preprocessing, we emphasize its steps, namely background correction, normalization, perfect match correction, and summarization, and couple these with alternative probe-gene assignments. Combined with a recommendation of successful methods, this makes a first use of the technology possible.
Design Of The Platform
In the GeneChip approach, the expression of a gene is measured by several probes. The probes are selected from the transcript sequence of the respective gene, with the UniGene database serving as the reference for the gene sequence. To avoid cross-hybridization between several genes, the sequence of each probe has to be chosen so that it is unique to the gene. The length of the probes is always 25 nucleotides.
A number of such probes, collected in a probe set, serve as independent measurements of the number of transcripts of the gene. The number of probes in a probe set varies between chip platforms. For example, in the popular Human Genome U133 Plus 2.0 array there are eleven probes in each probe set.
With the advancement of the human genome sequence and transcript libraries, the choice of probe sequences has to be updated from one chip platform to the next. The assignment of the probe sets to genes is updated quarterly and can be retrieved from the NetAffx service on the Affymetrix homepage.
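To illustrate how the probes of a probe set act as repeated measurements of one gene, the following minimal sketch groups probe-level intensities by probe set and reduces each group to a single value. The probe set names, the intensities, and the function `summarize_probe_sets` are hypothetical, and the median on the log2 scale is only a simple stand-in for a summarization step, not one of the methods recommended in this chapter.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical probe-level readings: (probe set identifier, intensity).
# Real probe sets contain more probes (e.g. eleven on the HG-U133 Plus 2.0).
readings = [
    ("set_A", 120.0), ("set_A", 150.0), ("set_A", 135.0),
    ("set_B", 900.0), ("set_B", 1100.0), ("set_B", 40.0),
]

def summarize_probe_sets(readings):
    """Return one log2-scale value per probe set (median over its probes)."""
    grouped = defaultdict(list)
    for probe_set_id, intensity in readings:
        grouped[probe_set_id].append(math.log2(intensity))
    # The median tolerates a single outlying probe, such as the 40.0 in set_B.
    return {probe_set_id: median(values) for probe_set_id, values in grouped.items()}

summary = summarize_probe_sets(readings)
for probe_set_id, value in sorted(summary.items()):
    print(probe_set_id, round(value, 3))
```

Because each probe set contributes several readings, a single poorly performing probe does not dominate the summarized value.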
In the classic chip designs, each probe is placed on the chip with its perfect match (PM) sequence and a so-called mismatch (MM) sequence. In the mismatch sequence the 13th nucleotide is altered. The idea is that the mismatch sequence measures the background signal, while the perfect match signal contains the background signal plus the signal from gene expression. In the newer chips Affymetrix uses the space for additional probes and replaces the mismatches with GC-bins. For a given number of G or C nucleotides (between 0 and 25), the GC-bin contains 25-mers unrelated to any gene sequence. The assumption is that sequences with the same GC content show similar hybridization behaviour. To make the hybridization results independent of the degradation of the transcripts in the cell, the probe sequences are selected near the 3’ end of the gene sequence.
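The relation between a PM probe, its MM partner, and a GC-bin can be sketched in a few lines. The example below is a minimal illustration, assuming that the alteration at the 13th position is the exchange of the base for its complement; the probe sequence and the function names `mismatch_probe` and `gc_count` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PROBE_LENGTH = 25
MIDDLE_INDEX = 12  # the 13th nucleotide, counted from zero

def mismatch_probe(pm_sequence):
    """Return the MM sequence: the PM sequence with its 13th base complemented."""
    if len(pm_sequence) != PROBE_LENGTH:
        raise ValueError("a GeneChip probe is expected to have 25 nucleotides")
    middle = COMPLEMENT[pm_sequence[MIDDLE_INDEX]]
    return pm_sequence[:MIDDLE_INDEX] + middle + pm_sequence[MIDDLE_INDEX + 1:]

def gc_count(sequence):
    """Number of G or C nucleotides; this number selects the GC-bin (0 to 25)."""
    return sum(1 for base in sequence if base in "GC")

pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-mer, not a real probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
print("GC-bin of the PM probe:", gc_count(pm))
```

In a design with GC-bins, the background for a PM probe would be estimated from the unrelated 25-mers that share its `gc_count` value rather than from a dedicated MM probe.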
In the production of the chips, the probes are synthesized directly on the chip surface using a photolithographic method. In the experiment, labeled RNA from the sample under study is applied to the chip. The hybridization result depends non-linearly on the amount of transcripts in the sample. In the analysis of the measurement results, this has to be taken into account by calculating with the logarithm of the hybridization value. The Affymetrix chips are single-channel chips: all RNA is labeled with the same dye, and different samples are compared by using several chips. The chip with the hybridized solution is then scanned at the wavelength of the dye. Analysis starts from the scanner image, from which an approximate level of hybridization is inferred for every probe.
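As a small worked example of calculating with the logarithm, the sketch below transforms probe intensities from two chips to the log2 scale and compares them probe by probe. The intensity values and the function `log2_intensities` are hypothetical, and base 2 is an assumption made here because it is a common convention for expression data.

```python
import math

def log2_intensities(values):
    """Transform positive hybridization intensities to the log2 scale."""
    if any(v <= 0 for v in values):
        raise ValueError("intensities must be positive before taking logarithms")
    return [math.log2(v) for v in values]

# Hypothetical intensities of the same three probes on two single-channel chips.
chip_1 = [100.0, 200.0, 400.0]
chip_2 = [200.0, 200.0, 100.0]

# On the log scale a ratio between chips becomes a simple difference.
log_ratios = [
    b - a for a, b in zip(log2_intensities(chip_1), log2_intensities(chip_2))
]
print([round(r, 3) for r in log_ratios])
```

A difference of 1 on this scale corresponds to a doubling of the intensity, and a difference of -2 to a reduction to one quarter, so up- and down-regulation are treated symmetrically.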
Figure: Probe sequences are selected from the transcribed regions of the gene sequence.
Key Terms in this Chapter
Microarray: A microarray (also known as gene chip or DNA chip) is a collection of microscopic DNA spots, commonly representing sequence extracts of single genes, arrayed on a solid surface by covalent attachment to a chemical matrix. DNA arrays are commonly used for expression profiling, i.e., monitoring expression levels of thousands of genes simultaneously, or for comparative genomic hybridization.
MM: Mismatch, a probe accompanying a PM probe, in which the 13th nucleotide is changed to its complement (A to T, T to A, G to C, C to G). In the Affymetrix GeneChip design, MM probes are placed alongside the PM probes with the aim of measuring the non-specific hybridization to the PM.
Gene Expression: Gene expression is the process by which the inheritable information in a gene, such as the DNA sequence, is made into a functional gene product, such as protein or RNA.
Probe Set: A set consisting of all probes addressing the transcripts from the same gene. In the Affymetrix GeneChip design the expression level of a gene is measured with several probes.
FARMS: Factor Analysis for Robust Microarray Summarization, a probabilistic latent variable model for summarizing high-density oligonucleotide Affymetrix GeneChip array data at probe level.
Probe: A probe is a fragment of DNA, 25 nucleotides in length, which is used to detect in RNA samples the presence of nucleotide sequences (the target) that are complementary to the sequence of the probe. The probe hybridizes to a single-stranded nucleic acid (DNA or RNA) whose base sequence allows probe-target base pairing due to complementarity between the probe and the target.