Introduction
Saccharomyces cerevisiae (baker's yeast) is commonly used by molecular biologists as a model eukaryote for the study of basic cellular processes. In addition, studies of how yeast responds to different stresses have implications for the brewing and baking industries and for developing strains that produce lower levels of acetic acid (Mizuno et al., 2006) in response to varying ethanol concentrations. Such strains could make yeast a viable source of ethanol for the biofuel industry.
Budding yeast has over 6000 expressed genes. Until very recently, analysis of the expression patterns of these genes was confined to a few gene products or message levels at a time, or could only be addressed by perturbation analysis in combination with computer simulations (Glass & Mackey, 1979). Microarray techniques now provide a means of assessing the sum total of expressed genes in a cell in a quantitative, reproducible, and internally standardized manner. A single array provides a snapshot of the transcriptional state of the cell at one point in time. When multiple snapshots are taken from a temporally coherent system such as synchronous cells, the signals yield the characteristic time signature of each gene in the cell. Thus, by clustering genes according to the similarity of their expression patterns, one can learn about potential functions of gene products that would otherwise be very difficult to determine.
Furthermore, the regulation of gene expression in yeast is a non-linear process under the control of a network of connected regulatory proteins called transcription factors (Nicholas & Prigogine, 1977), although many epigenetic factors, such as RNA splicing and degradation, are known to modify expression levels. Consequently, the concentrations of a variety of cellular reagents oscillate as the cell passes through the different phases of the cell cycle (Mitchison, 1971; Klevecz et al., 1984). Chemical or physical perturbations to the cell cycle can thus introduce a phase shift in the onset time of a given cell cycle event (e.g., mitosis) (Klevecz et al., 1978). If the oscillatory kinetics of expression is confined to a small number of genes, then finding these 'cell cycle regulated' genes becomes a fairly easy clustering task. However, if a large number of genes, or indeed the entire genome, oscillates, and the fundamental harmonic of this oscillation is clearly distinguishable from the characteristic period of the cell cycle, then the different functional groups can be identified by their characteristic kinetics or oscillation frequency.
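To make the idea of grouping genes by oscillation frequency concrete, the following minimal sketch estimates the dominant period of a single expression profile with a plain discrete Fourier transform. It is an illustration only, not the method of this article: the function name `dominant_period`, the assumption of evenly spaced samples, and the synthetic 40-minute profile are all hypothetical.

```python
import cmath
import math


def dominant_period(values, dt):
    """Return the period (in the units of dt) of the strongest oscillation.

    Assumes evenly spaced samples; the mean is removed so that the
    zero-frequency (constant) term does not dominate.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]

    best_k, best_power = 1, -1.0
    # Scan the non-zero frequency bins up to the Nyquist limit.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    # Bin k corresponds to k full cycles over the record length n * dt.
    return n * dt / best_k


# Synthetic (made-up) profile: a 40-minute oscillation sampled every
# 5 minutes for 240 minutes.
times = [5 * i for i in range(48)]
profile = [math.sin(2 * math.pi * t / 40.0) for t in times]
period = dominant_period(profile, dt=5)
```

Genes whose estimated periods fall close together could then be assigned to the same candidate group. Real profiles are noisy and short, so the frequency resolution of such an estimate is limited by the length of the record.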
A successful gene-expression clustering program must be able to handle noisy time series data whose time points may be unevenly spaced. In addition, the algorithm needs to compensate for shifts in the onset time of expression of one or more genes in the dataset that arise from variability in experimental protocols or from differences in timing between labs.
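The two requirements above can be illustrated with a short sketch: unevenly sampled profiles are first linearly interpolated onto a common time grid, and similarity is then taken as the best Pearson correlation over a small range of time lags. This is one simple way to meet the requirements, shown under stated assumptions, and is not the algorithm proposed in this article; the helper names (`resample`, `pearson`, `best_lag_correlation`), the sampling times, and the synthetic profiles are hypothetical.

```python
import math


def resample(times, values, grid):
    """Linearly interpolate (times, values) onto grid.

    Assumes times and grid are increasing and that grid lies within
    [times[0], times[-1]].
    """
    out = []
    j = 0
    for g in grid:
        # Advance to the sampling interval that contains g.
        while j < len(times) - 2 and times[j + 1] < g:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        out.append(v0 + (v1 - v0) * (g - t0) / (t1 - t0))
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag_correlation(x, y, max_lag):
    """Return (lag, r) maximizing the correlation of x[i] with y[i + lag].

    A positive lag means y is delayed relative to x by lag grid steps.
    Assumes x and y have the same length.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 3:
            continue  # too little overlap to correlate
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


def signal(t):
    """Synthetic (made-up) expression profile with a 40-minute period."""
    return math.sin(2 * math.pi * t / 40.0)


# Two genes sampled at different, uneven times; gene B is delayed by 10 minutes.
times_a = [0, 4, 10, 13, 20, 26, 30, 37, 40, 47, 50, 55, 62, 70, 74, 80,
           88, 90, 97, 100, 106, 110, 115, 120]
times_b = [0, 6, 9, 15, 20, 27, 33, 40, 44, 50, 57, 60, 66, 72, 80, 85,
           90, 94, 100, 107, 112, 120]
gene_a = [signal(t) for t in times_a]
gene_b = [signal(t - 10) for t in times_b]

grid = [5 * i for i in range(25)]  # common 5-minute grid, 0 to 120 minutes
lag, r = best_lag_correlation(
    resample(times_a, gene_a, grid),
    resample(times_b, gene_b, grid),
    max_lag=4,
)
```

Here a lag of two grid steps corresponds to the 10-minute delay built into the synthetic data. The maximum lag should stay well below one oscillation period, since a periodic signal correlates equally well at shifts of a full period.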