Eukaryotic genes can produce several distinct products from a single genomic locus. Recent developments in microarray technology allow such isoform variation to be monitored on a genome-wide scale. In our research, we have used Affymetrix Exon Arrays to detect variation in alternative splicing, initiation of transcription, and polyadenylation among humans. We demonstrated that such variation is common in human populations and has an underlying genetic component. Here, we use our study to illustrate how Exon Arrays can detect alternative isoforms, to outline the analysis involved, and to point out potential problems that researchers using this technology may encounter.
Alternative pre-mRNA splicing is a process that allows several distinct gene isoforms to be produced from a single genomic locus. The most common type of alternative splicing event in mammals involves cassette exons, each of which can be either included in or excluded from the mature mRNA. Other events include alternative use of donor or acceptor splice sites, and intron retention. In addition, alternative promoter usage and alternative polyadenylation, which result in differences in initiation and termination of the transcript, respectively, further diversify eukaryotic transcriptomes and proteomes. Such processes have been suggested to be at least partly responsible for mammalian complexity, which is otherwise difficult to explain in view of our relatively low number of genomic loci: fewer than 25,000 genes in humans, versus approximately 20,000 in the nematode worm C. elegans (Claverie, 2001). It is estimated that a high percentage of mammalian genes are alternatively spliced, and this frequency is highest in specialized and complex tissues, such as the brain and the liver. Differences in splicing patterns have been shown to exist across species, and within populations of the same species. In humans, splicing defects are known to result in numerous genetic disorders (Faustino & Cooper, 2003) and may confer susceptibility to complex genetic diseases. Thus, alternative splicing attracts the interest of researchers across the entire spectrum of the biomedical sciences, from evolutionary biology, through development, to medicine.
In recent years, genome-wide investigation of alternative transcripts has been carried out using expressed sequence tag (EST) libraries. Generally, ESTs (short cDNA sequence reads) are mapped to the genomic sequence, and different isoforms are inferred from incongruent splicing patterns (Modrek, Resch, Grasso, & Lee, 2001). However, EST library analyses are prone to sequencing errors, biased towards highly expressed genes, and influenced by cancer-derived ESTs, which may represent isoforms that are not generally present in healthy tissues.
More recently, microarray platforms have been proposed as a tool for studying gene expression at the isoform level (Black & Graveley, 2006; Lee & Roy, 2004; Zhang et al., 2006). Splicing-sensitive microarrays employ exon body oligonucleotide probes, exon junction probes, or a combination of the two designs, to determine mRNA levels at the resolution of a single exon or splice site. The Affymetrix GeneChip® Human Exon 1.0 ST Array is the first commercially available microarray product designed for genome-wide, exon-level expression analysis. The array targets multiple probes to individual exons and allows simultaneous, exon-level detection of expression intensity for 1.4 million probesets covering over 1 million known and predicted human exons. The Exon Array is a flexible tool, which can perform the function of classical expression arrays and concurrently provide supplementary information on isoform changes. However, because of the complexity of the design, statistical analysis of the data becomes much more demanding, both theoretically and computationally. The simplest illustration is the multiple testing problem: whereas in classical expression arrays the number of tests is of the order of the number of genes, in an exon array the number of tests is over ten-fold higher, ranging from a few hundred thousand to over 1 million (if computationally predicted exons are included). The statistical approaches also need to distinguish between whole-gene expression differences and isoform differences, which introduces a new level of complexity. The robustness of measurement is a further issue, since the exon array has on average four probes per probeset, whereas Affymetrix expression arrays relied on more than 10 probes per probeset to estimate expression.
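To make the multiple testing burden concrete, the following minimal sketch compares Bonferroni-corrected significance cutoffs for a gene-level and an exon-level analysis. The Bonferroni correction is used here only as the simplest illustration and is not necessarily the correction applied in our study. The gene-level test count of 25,000 and the significance level of 0.05 are assumed round figures; the probeset count is the 1.4 million quoted above.

```python
# Illustrative arithmetic only. ALPHA and N_GENE_TESTS are assumed round
# figures; N_PROBESET_TESTS is the full probeset count quoted in the text.
ALPHA = 0.05
N_GENE_TESTS = 25_000
N_PROBESET_TESTS = 1_400_000


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


gene_cutoff = bonferroni_threshold(ALPHA, N_GENE_TESTS)
exon_cutoff = bonferroni_threshold(ALPHA, N_PROBESET_TESTS)

# How many times smaller a p-value must be to pass at the exon level.
fold_stricter = gene_cutoff / exon_cutoff

print(f"gene-level cutoff: {gene_cutoff:.2e}")
print(f"exon-level cutoff: {exon_cutoff:.2e}")
print(f"exon-level cutoff is about {fold_stricter:.0f} times stricter")
```

Restricting the analysis to well-annotated probesets reduces the number of tests and therefore relaxes the cutoff, at the cost of missing events in computationally predicted exons.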
Key Terms in this Chapter
Allelic Association: A statistical association of a genetic marker allele with a phenotypic trait. Here, we use association analysis to detect SNPs statistically correlated with changes in isoform-level expression. While association does not directly imply causation, it is highly likely that a causative genetic variant is in linkage disequilibrium with the significant SNP marker.
Isoform: In the context presented here, an isoform is one of the transcript variants produced by a locus. A gene isoform can result from alternative splicing, alternative transcription initiation, or alternative polyadenylation.
Alternative Splicing: A mechanism which results in the production of several mRNA variants from a single genomic locus, by preferential inclusion or exclusion of certain splice sites or exons.
SNP: Single nucleotide polymorphism. SNPs are single base pair mutations that have reached detectable frequencies in human populations. On average, two human individuals differ at one polymorphic site per 1,000 bp of DNA. The vast majority of SNPs are likely to be neutral, but some may affect phenotypic traits.
Exon Array: A type of microarray using probes targeted to individual exons within each gene. Exon Arrays may be used to measure the expression of an entire transcript, but can also detect higher-level changes, such as alternative splicing and other transcript isoform differences.
Pre-mRNA Splicing: A process which removes intronic sequences from the precursor messenger RNA of eukaryotic genes, to produce mature messenger (m)RNA.
EST: Expressed sequence tag. ESTs are short sequence reads produced on a large scale from cDNA libraries. EST sequencing allowed quantification of known transcripts and detection of novel genes and novel isoforms.
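As a rough illustration of the Allelic Association and SNP entries above, the sketch below regresses a gene-normalized exon signal on SNP genotype. It is a simplified example under stated assumptions, not the analysis pipeline used in our study. All data values are hypothetical toy numbers, genotypes are coded as the number of copies of one allele (0, 1, or 2), and the helper name simple_regression is introduced here purely for illustration. Subtracting the gene-level signal from the probeset signal is one common way to separate isoform differences from whole-gene expression differences.

```python
def simple_regression(x, y):
    """Least-squares slope, intercept, and Pearson r for paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical toy data for six individuals (not taken from the study).
genotype_dosage = [0, 0, 1, 1, 2, 2]              # allele copies at one SNP
probeset_log2 = [7.1, 7.3, 6.4, 6.6, 5.8, 6.0]    # one exon probeset, log2 scale
gene_log2 = [9.0, 9.2, 9.1, 9.3, 9.0, 9.2]        # gene-level summary, log2 scale

# Remove the gene-level signal so that a whole-gene expression change
# is not mistaken for an isoform change.
normalized = [p - g for p, g in zip(probeset_log2, gene_log2)]

slope, intercept, r = simple_regression(genotype_dosage, normalized)
print(f"change in normalized exon signal per allele copy: {slope:.2f}")
print(f"Pearson r: {r:.3f}")
```

A real analysis would also compute a p-value for each SNP and probeset pair and correct for the very large number of tests performed.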