Uncovering Fine Structure in Gene Expression Profile by Maximum Entropy Modeling of cDNA Microarray Images and Kernel Density Methods
George Sakellaropoulos (University of Patras, Greece), Antonis Daskalakis (University of Patras, Greece), George Nikiforidis (University of Patras, Greece) and Christos Argyropoulos (University of Pittsburgh Medical Center, USA)
Copyright: © 2009
The presentation and interpretation of microarray-based, genome-wide gene expression profiles as complex biological entities are considered problematic because of their featureless, dense nature. Furthermore, microarray images are characterized by significant background noise, but the effects of this noise on the holistic interpretation of gene expression profiles remain under-explored. We hypothesize that a framework combining (a) Bayesian methodology for background adjustment in microarray images with (b) model-free (non-parametric) modeling tools may serve the dual purpose of data and model reduction, exposing hitherto hidden features of gene expression profiles. Within the proposed framework, microarray image restoration and noise adjustment are facilitated by a class of prior Maximum Entropy distributions. The resulting gene expression profiles are modeled non-parametrically by kernel density methods, which not only normalize the data but also facilitate the generation of reduced mathematical descriptions of biological variability as mixture models.
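Reducing a density description of an expression profile to a mixture model can be illustrated with a small, hypothetical example. The sketch below fits a k-component one-dimensional Gaussian mixture by expectation-maximization (EM), using only the Python standard library. The function names, the quantile-based initialization and the variance floor are our own assumptions; the chapter does not state that EM or Gaussian components are used for its mixture models.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component one-dimensional Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, sigmas). Means start at evenly spaced sample
    quantiles and every component starts with the overall standard deviation.
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n)
    sigmas = [sd_all] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300  # guard against underflow far in the tails
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and standard deviations
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # floor avoids collapsed components
    return weights, means, sigmas


# Hypothetical usage on synthetic data drawn from two well-separated normals
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(5.0, 1.0) for _ in range(500)]
weights, means, sigmas = fit_gmm_1d(data, 2)
print(weights, means, sigmas)
```

On this synthetic data the two fitted means land close to 0 and 5, so a handful of (weight, mean, sigma) triples stands in for a thousand individual values, which is the kind of reduction the framework aims at.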
The advent of complementary DNA (cDNA) microarray technologies enabled the simultaneous and specific assessment of the expression levels of thousands of genes (Southern, Mir, & Shchepinov, 1999). The conventional approach to analyzing such datasets is to explore quantitative co-expression relations across a variety of experimental conditions before inferring putative similarities in gene regulation or function (DeRisi, Iyer, & Brown, 1997; Eisen, Spellman, Brown, & Botstein, 1998). The alternative viewpoint considers gene expression profiles from specific conditions to be informative of distinct molecular signatures that characterize cellular states. Such genome-wide transcriptional signatures have been used to distinguish normal from abnormal samples in benign developmental conditions (Barnes et al., 2005), solid tumors and hematologic malignancies (Febbo et al., 2005; Valentini, 2002), and to differentiate distinct disease states of renal allografts (Sarwal et al., 2003). It has been suggested that the thousands of expression values in a microarray experiment are too dense and irregular to be interpreted directly in a holistic manner, and that alternative transformations of the normalized gene profiles should be sought (Guo, Eichler, Feng, Ingber, & Huang, 2006). Nevertheless, one could justifiably argue that the irregularity of the gene profiles is due to incomplete modeling of, and adjustment for, measurement noise. However, this alternative hypothesis has not been adequately addressed in the current literature. These considerations provide the impetus for the present work, which aims to:
1. Establish the role of microarray image background in the irregularity and “featureless” appearance of gene expression profiles (GEP) from individual experimental states.
2. Propose a data and model reduction framework for the analysis of GEP consisting of:
   (a) a probabilistic Bayesian algorithm for background adjustment of microarray images based on Maximum Entropy distributions;
   (b) non-parametric kernel density estimation methods for the mathematical representation and exploration of the resultant gene expression profiles.
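To make the idea of Bayesian background adjustment more concrete, here is a minimal sketch for a single spot under assumptions of our own: the observed intensity is y = s + b + e, the local background b is known, the noise e is Gaussian with known standard deviation, and the prior on the true signal s is the maximum entropy distribution on a finite domain with no further constraints, i.e. a discrete uniform on [0, s_max]. The function name and all parameters are hypothetical; this is not the chapter's algorithm, only an illustration of why a prior restricted to non-negative signals behaves differently from plain subtraction.

```python
import math


def posterior_mean_signal(y, background, noise_sd, s_max, n_grid=2001):
    """Posterior mean of the true spot signal s given an observed intensity y.

    Hypothetical model (not the chapter's): y = s + background + e, with
    e ~ N(0, noise_sd**2) and a discrete uniform, i.e. maximum entropy,
    prior on s over an evenly spaced grid on [0, s_max].
    """
    step = s_max / (n_grid - 1)
    grid = [i * step for i in range(n_grid)]
    # Gaussian log-likelihood of y for each candidate signal; the flat prior cancels
    loglik = [-0.5 * ((y - background - s) / noise_sd) ** 2 for s in grid]
    top = max(loglik)  # subtract the maximum before exponentiating, for stability
    post = [math.exp(ll - top) for ll in loglik]
    total = sum(post)
    return sum(s * w for s, w in zip(grid, post)) / total


# A dim spot whose intensity falls below the local background
print(90 - 100)                                      # naive subtraction: -10
print(posterior_mean_signal(90, 100, 10.0, 1000.0))  # Bayesian estimate
```

Naive subtraction yields an impossible negative intensity, whereas the posterior mean under this model is a small positive value, because the prior assigns no mass to negative signals. For bright spots far from the domain boundaries the two estimates essentially coincide.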
Key Terms in this Chapter
Ontological Modeling: Modeling that implies the existence of certain objects in the physical, natural world. The distinction between ontological and epistemological modeling is a subtle one: whereas the former is an investigation of natural objects and their properties, the latter concerns the analysis of (usually) subjective statements about models of the world.
Non-Parametric Kernel Density Estimation: Non-parametric kernel density estimation methods are model-free techniques for estimating an empirical probability density from experimental data. Formally, such estimators smooth out the contribution of each observed data point over a local neighborhood whose width is set by the kernel bandwidth.
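A Gaussian-kernel estimator makes the “local neighborhood” concrete: each observation contributes a normal bump of width h (the bandwidth), and the estimate is their average. The sketch below is a plain standard-library version; the normal-reference (Silverman) rule of thumb for h is our own default choice and not necessarily the bandwidth selector used in the chapter.

```python
import math


def silverman_bandwidth(data):
    """Normal-reference (Silverman) rule-of-thumb bandwidth for a Gaussian kernel."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def gaussian_kde(data, bandwidth=None):
    """Return a callable f(x) estimating the density that generated `data`."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each observation spreads a Gaussian bump of width h around itself
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


f = gaussian_kde([0.0, 1.0, 2.0, 3.0])
print(f(1.5))
```

The returned density integrates to approximately 1 and, for this symmetric toy sample, is symmetric about 1.5. A smaller bandwidth exposes finer structure at the cost of more noise; a larger one smooths it away.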
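Power Series Distribution: A family of discrete probability distributions with probability mass function given by $P(X = x) = a(x)\,\theta^{x} / f(\theta)$, $x \in S$, where $a(x) \ge 0$, $\theta > 0$, and the series function is $f(\theta) = \sum_{x \in S} a(x)\,\theta^{x}$. Modified power series distributions (MPSD) are more general distributions which arise when $\theta$ is a function of another (simple) parameter $m$. In such a case we define the power parameter $\theta(m)$ and series function $f(\theta(m))$ by: $P(X = x) = a(x)\,[\theta(m)]^{x} / f(\theta(m))$, with $f(\theta(m)) = \sum_{x \in S} a(x)\,[\theta(m)]^{x}$. Particular choices of the power parameter yield power series distributions that are analytic approximations to Maximum Entropy priors over finite domains.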
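The definition can be checked numerically. The sketch below (function name hypothetical) evaluates a power series distribution on a finite support by summing the series function f(θ) directly. Taking a(x) = 1/x! recovers the Poisson distribution, whose series function e^θ is approximated here by truncating the support, and a(x) = C(n, x) with θ = p/(1 - p) recovers the binomial.

```python
import math


def psd_pmf(a, theta, support):
    """P(X = x) = a(x) * theta**x / f(theta) for x in a finite support.

    The series function f(theta) = sum_x a(x) * theta**x is summed over
    `support`, so an infinite-support family (e.g. Poisson) is truncated.
    """
    terms = {x: a(x) * theta ** x for x in support}
    f_theta = sum(terms.values())  # series function f(theta)
    return {x: t / f_theta for x, t in terms.items()}


# Poisson: a(x) = 1/x!, f(theta) = e**theta (truncated at x = 100)
lam = 3.0
poisson = psd_pmf(lambda x: 1.0 / math.factorial(x), lam, range(101))

# Binomial: a(x) = C(n, x), theta = p / (1 - p), f(theta) = (1 + theta)**n
n, p = 10, 0.3
binomial = psd_pmf(lambda x: math.comb(n, x), p / (1 - p), range(n + 1))
print(poisson[2], binomial[4])
```

A modified power series distribution is obtained simply by passing θ(m) in place of θ. With a(x) = 1 on {0, ..., N} the family reduces to a truncated geometric distribution, which is the maximum entropy distribution on that support for a fixed mean.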
Normalization: The process in which mathematical transformations of the microarray data are undertaken to reduce variability in the expression levels and make data from different experiments directly comparable.
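As one common and simple example of normalization for two-channel cDNA arrays (assumed here for illustration; the chapter itself relies on kernel density methods for this step), the sketch below computes per-spot log2(R/G) ratios and centers them on their global median, so that the typical spot reads as unchanged between channels. The intensities are made up.

```python
import math
import statistics


def log_ratios(red, green):
    """Per-spot log2(R/G) expression ratios for a two-channel cDNA array."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Global normalization: shift the log-ratios so that their median is zero."""
    med = statistics.median(values)
    return [v - med for v in values]


red = [200.0, 400.0, 800.0, 100.0]    # hypothetical Cy5 intensities
green = [100.0, 100.0, 100.0, 100.0]  # hypothetical Cy3 intensities
print(median_center(log_ratios(red, green)))  # [-0.5, 0.5, 1.5, -1.5]
```

The raw log-ratios [1, 2, 3, 0] have median 1.5, so centering shifts every spot by the same amount; this corrects a global dye or scanner imbalance but, unlike density-based approaches, leaves the shape of the distribution untouched.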
Epistemological Modeling: Modeling that quantifies one’s perception of the world rather than the world per se. The object of such modeling is to generate coherent descriptions of one’s knowledge, usually in the face of uncertainty.
Bayesian Probability: An interpretation of the colloquial term probability that identifies it with the degree of belief in a proposition about the world. This interpretation is firmly grounded in the rules of Aristotelian logic and in fact extends them to situations of uncertainty, i.e., when the truth or falsity of propositions cannot be ascertained completely. In other words, Bayesian probability and its supporting theory are nothing more than “common sense reduced to numbers”. The major instrument for updating one’s prior beliefs to posterior inferences in light of new information is the computational machinery of Bayes’ theorem.
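Over a finite set of hypotheses, Bayes’ theorem reduces to “multiply prior by likelihood, then renormalize”. The minimal sketch below uses made-up numbers: two hypothetical states of a gene (“expressed”, “not expressed”) with equal prior belief, and assumed probabilities 0.9 and 0.2 of observing a bright spot under each.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | D) for each hypothesis H via Bayes' theorem.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(D | H) for the observed data D
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), the normalizing constant
    return {h: u / evidence for h, u in unnormalized.items()}


# Hypothetical example: belief that a gene is expressed after seeing a bright spot
prior = {"expressed": 0.5, "not expressed": 0.5}
likelihood = {"expressed": 0.9, "not expressed": 0.2}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

The posterior belief that the gene is expressed is 0.45 / 0.55 = 9/11, about 0.818. Feeding this posterior back in as the prior for the next observation gives sequential updating.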
Maximum Entropy Prior: The distribution that results from application of the variational maximum entropy principle. The latter uniquely determines the least biased epistemic (Bayesian) probability distribution that encodes certain testable information, by maximizing the information entropy (equivalently, minimizing the convex negative-entropy functional) subject to the constraints expressing that information. The resulting distributions are least informative or “objective” in the sense that they are compatible with the “pre-data” constraints while being maximally noncommittal about the missing information.
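Jaynes’ classic die example shows the principle at work on a finite domain: among all distributions on {1, ..., 6} with a prescribed mean, the entropy maximizer has the exponential (Gibbs) form p(x) proportional to exp(-λx), with the Lagrange multiplier λ chosen to satisfy the constraint. Writing θ = exp(-λ) gives p(x) proportional to θ^x, a power series distribution with a(x) = 1. The sketch below finds λ by bisection; the function name and the search bounds are our own assumptions.

```python
import math


def maxent_with_mean(support, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy distribution on a finite support subject to E[X] = target_mean.

    The maximizer has the Gibbs form p(x) proportional to exp(-lam * x). The mean
    is strictly decreasing in lam (its derivative is -Var[X]), so lam is found by
    bisection on [lo, hi]; these bounds are assumed wide enough for the support.
    """
    xs = list(support)
    if not min(xs) < target_mean < max(xs):
        raise ValueError("target_mean must lie strictly between min and max of support")

    def dist(lam):
        exps = [-lam * x for x in xs]
        top = max(exps)  # shift exponents to avoid overflow
        w = [math.exp(e - top) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(x * p for x, p in zip(xs, dist(lam)))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid  # mean too large: a bigger lam shifts mass to smaller x
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


# Jaynes' die: faces 1..6 constrained to average 4.5 instead of the fair 3.5
probs = maxent_with_mean(range(1, 7), 4.5)
print([round(p, 4) for p in probs])
```

With a target of 4.5 the probabilities increase geometrically from face 1 to face 6, while a target of 3.5 returns the uniform distribution: absent any informative constraint, the maximum entropy prior is maximally noncommittal.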