Mass Informatics in Differential Proteomics
Xiang Zhang (University of Louisville, USA), Seza Orcun (Purdue University, USA), Mourad Ouzzani (Purdue University, USA) and Cheolhwan Oh (Purdue University, USA)
Copyright: © 2009
Systems biology aims to understand biological systems on a comprehensive scale, showing how the components that make up the whole are connected to one another and work in harmony. As a major component of systems biology, differential proteomics studies the differences between distinct but related proteomes, such as normal versus diseased cells or diseased versus treated cells. High-throughput analytical platforms based on mass spectrometry (MS) are widely used in differential proteomics (Domon, 2006; Fenselau, 2007). In common practice, the proteome is first digested into peptides. The peptide mixture is then separated using multidimensional liquid chromatography (MDLC) and finally subjected to MS for further analysis. A single experiment generates thousands of mass spectra, which together contain millions of peaks. Discovering the significantly changed proteins among these peaks is the task of mass informatics. This paper introduces the data mining steps used in mass informatics and concludes with a descriptive examination of concepts, trends and challenges in this rapidly expanding field.
Data Mining Framework
A variety of mass spectrometers are commercially available, and each stores raw mass spectra in a proprietary format. The raw spectra must therefore first be transformed into a common data format. As in any study of biological phenomena, it is crucial that only relevant observations are identified and related to each other. The interpretation and comprehension of the collection of mass spectra present major challenges and involve several data mining steps. The aim of mass informatics is to reduce data dimensionality and to extract relevant knowledge from thousands of mass spectra (Arneberg, 2007). Figure 1 shows an overall framework for mass informatics in differential proteomics. Most of the components of this framework are discussed in this paper, with the exception of pathway modeling.
Figure 1. Information flow chart in differential proteomics
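To make the idea of reducing data dimensionality concrete, the following is a minimal sketch of how one spectrum, once converted to a common format, might be collapsed into a short peak list and then into fixed-width m/z bins. It is not the method of any particular platform or of the cited studies: the local-maximum rule, the function names `pick_peaks` and `bin_peaks`, the intensity threshold, the bin width and the example values are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# A spectrum is assumed to be a list of (m/z, intensity) pairs sorted by m/z.
Spectrum = List[Tuple[float, float]]


def pick_peaks(spectrum: Spectrum, min_intensity: float) -> Spectrum:
    """Keep points that are local maxima and at least min_intensity high.

    This simple rule is an illustrative assumption, not a published algorithm.
    """
    peaks = []
    # The first and last points have only one neighbour and are skipped.
    for i in range(1, len(spectrum) - 1):
        mz, intensity = spectrum[i]
        if intensity < min_intensity:
            continue
        left = spectrum[i - 1][1]
        right = spectrum[i + 1][1]
        if intensity > left and intensity >= right:
            peaks.append((mz, intensity))
    return peaks


def bin_peaks(peaks: Spectrum, bin_width: float) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


# Hypothetical profile data: nine points containing two clear peaks.
example_spectrum: Spectrum = [
    (500.0, 1.0), (500.1, 5.0), (500.2, 40.0), (500.3, 6.0), (500.4, 2.0),
    (501.2, 3.0), (501.3, 25.0), (501.4, 4.0), (502.0, 1.0),
]

example_peaks = pick_peaks(example_spectrum, min_intensity=10.0)
example_bins = bin_peaks(example_peaks, bin_width=1.0)
```

Here nine raw data points reduce to two peaks, and binning places peaks from different spectra on a shared m/z grid so that they can be compared. A real pipeline would typically also need noise filtering, deisotoping and alignment across runs, none of which is attempted in this sketch.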