NBPMF: Novel Peptide Mass Fingerprinting Based on Network Inference

Zhewei Liang (Department of Computer Science, University of Western Ontario, London, Canada), Gilles Lajoie (Department of Biochemistry, University of Western Ontario, London, Canada) and Kaizhong Zhang (Department of Computer Science, University of Western Ontario, London, Canada)
DOI: 10.4018/IJCINI.2017100103


Mass spectrometry (MS) is an analytical technique for determining the composition of a sample. In bottom-up techniques, peptide mass fingerprinting (PMF) is widely used to identify proteins from MS datasets. In this article, the authors develop a novel network-based inference software tool termed NBPMF. By analyzing the peptide-protein bipartite network, they design new peptide-protein matching score functions. They present two methods: the static one, ProbS, is based on an independent probability framework, and the dynamic one, HeatS, treats the input data as dependent peptides. The authors also use linear regression to adjust the matching score according to the masses of proteins. In addition, they consider the order of retention times to further correct the score function. In post-processing, each peak can be assigned to only one peptide in order to reduce random matches. Finally, the authors attempt to filter out false-positive proteins. Experiments on simulated and real data demonstrate that the NBPMF approaches lead to significantly improved performance compared to several state-of-the-art methods.

1. Introduction

Mass spectrometry (MS) is one of the most informative techniques for determining the composition of a sample. Recently it has become a primary tool for protein identification, quantification, and post-translational modification (PTM) characterization in proteomics research. There are usually two different MS approaches to identifying proteins: top-down and bottom-up. In top-down proteomics, intact protein ions are generated by electrospray ionization (ESI), then introduced into a mass analyzer and subjected to gas-phase fragmentation. Top-down MS can sequence intact proteins and is especially suited to the analysis of PTMs (Lanucara & Eyers, 2013). In the conventional bottom-up method, by contrast, protein identification is based on mass spectrometric analysis of peptides derived from proteolytic digestion, usually with trypsin.

There are usually two modes for bottom-up approaches. The most widespread is data-dependent acquisition (DDA), in which selected peptide precursors are fragmented by MS/MS following chromatographic separation (Link et al., 1999). The other is data-independent acquisition (DIA), in which all ions within a selected m/z range are fragmented and analyzed by tandem MS. DIA is thus an alternative to DDA, in which only a fixed number of precursor ions are selected and analyzed by tandem MS.

In wet-lab procedures for protein identification based on the most commonly used DDA mode, a sample first undergoes enzymatic digestion. Liquid chromatography and tandem mass spectrometry (LC-MS/MS) are then used to analyze the resulting peptides. This bottom-up approach attempts to reconstruct the original protein sample from the identified peptides, since they can serve as surrogates for their parent proteins. Analyzing the dataset requires a protein sequence database that contains all target proteins. Each MS/MS scan is used to identify a peptide-spectrum match; finally, the matched peptides are searched against the database to identify the proteins.
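The digestion step can also be mimicked in silico when building the peptide list for a sequence database. The following is a minimal sketch, assuming the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest`, its `missed_cleavages` parameter, and the toy sequence are hypothetical and are not taken from the article.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in-silico trypsin digestion.

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. Real enzymes show further exceptions not modelled here.
    """
    # Collect cleavage positions, including both ends of the protein.
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Toy sequence (hypothetical): the R at position 7 is followed by P,
# so it is not treated as a cleavage site.
print(tryptic_digest("MKWVTFRPLLK"))
print(tryptic_digest("MKWVTFRPLLK", missed_cleavages=1))
```

Allowing missed cleavages enlarges the candidate peptide list, which is one reason a digested sample yields many more peptide species than a naive count of cleavage sites suggests.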

For tandem mass spectra in DDA mode, there are roughly four ways to interpret the dataset and identify proteins from fragmentation data: sequence database searching (Cottrell & London, 1999), spectral library searching (Yates et al., 1998), database-independent approaches (de novo sequencing; Ma et al., 2003), and hybrid interpretation algorithms (Mann & Wilm, 1994). This MS/MS-based identification is also called peptide fragment fingerprinting (PFF).

Certain challenges arise when the above enzymatic digestion LC-MS/MS workflow is applied to complex protein samples, such as plasma or a whole-cell lysate. For example, after digestion a protein sample can produce a multitude of peptides, including expected peptides, peptides with missed cleavages, and peptides carrying PTMs. This leads to a peptide under-sampling problem: even with thorough sample preparation and chromatographic separation, peptides are introduced into the mass spectrometer faster than they can be isolated and fragmented. Therefore, the majority of peptides in the sample are often left unanalyzed. Even with alternative approaches, such as selected reaction monitoring (SRM; Anderson & Hunter, 2006) or accurate mass and time (AMT) tags (Smith, 2002), under-sampling is unlikely to be eliminated completely.
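The under-sampling effect can be illustrated with a toy model of precursor selection. The sketch below assumes a simple "top-N by intensity" rule for DDA and a fixed m/z window for DIA; the function names, the scan values, and the choice of N are all hypothetical, and real instruments apply additional logic (for example dynamic exclusion) that is not modelled here.

```python
def dda_select(scan, top_n):
    """Toy DDA rule: fragment only the top_n most intense precursors."""
    ranked = sorted(scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def dia_select(scan, mz_low, mz_high):
    """Toy DIA rule: fragment every precursor inside the m/z window."""
    return [mz for mz, _ in scan if mz_low <= mz <= mz_high]


# One hypothetical MS1 scan as (m/z, intensity) pairs.
scan = [(400.2, 1e5), (455.7, 8e3), (512.3, 5e4),
        (630.9, 2e3), (701.4, 9e4), (850.1, 1e3)]

selected = dda_select(scan, top_n=2)
unanalyzed = [mz for mz, _ in scan if mz not in selected]

print("DDA fragments:", selected)
print("left unanalyzed:", unanalyzed)
print("DIA window 400-600:", dia_select(scan, 400.0, 600.0))
```

With six co-eluting precursors and only two fragmentation slots, four precursors are never fragmented in this scan, which is the situation the paragraph above describes for complex samples.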

To avoid the above problems, one can exploit advances in LC and MS technologies that make it possible to identify peptides solely from their MS masses and retention times (RT), without MS/MS. This requires instrumentation capable of high-accuracy mass measurements, LC systems with sufficient RT precision, and precise prediction algorithms for relative RT (Krokhin et al., 2004). The technique is analogous to traditional peptide mass fingerprinting (PMF), which has long been used to identify proteins separated by gel electrophoresis (Shevchenko et al., 1996). However, because low-accuracy data lack specificity for peptide identification, PMF has been limited to low-complexity samples. The reason is that each mass used for fingerprinting can typically be assigned to several peptides from different proteins. In a complex sample, it therefore becomes impossible to infer which proteins are actually present.
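The ambiguity of mass-only matching can be made concrete with a small sketch. It assumes standard monoisotopic residue masses and a parts-per-million (ppm) tolerance; the toy peptide-to-protein table, the protein labels, and the function names are hypothetical and do not reproduce the NBPMF score functions.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide chain


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_mass(observed, peptide_to_proteins, tol_ppm):
    """Return every database peptide whose mass lies within tol_ppm."""
    hits = {}
    for peptide, proteins in peptide_to_proteins.items():
        theoretical = peptide_mass(peptide)
        if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
            hits[peptide] = proteins
    return hits


# Hypothetical database: each peptide maps to its parent protein(s).
db = {"AQR": ["P1"], "AKR": ["P2"], "GLK": ["P3"], "GIK": ["P4"]}

observed = peptide_mass("AKR")
low_accuracy = match_mass(observed, db, tol_ppm=200)   # Q vs K not resolved
high_accuracy = match_mass(observed, db, tol_ppm=10)   # Q vs K resolved
isobaric = match_mass(peptide_mass("GLK"), db, tol_ppm=1)  # L and I identical

print(sorted(low_accuracy), sorted(high_accuracy), sorted(isobaric))
```

In this toy case a wide tolerance assigns one mass to peptides from two different proteins, a tight tolerance removes that ambiguity, and the leucine/isoleucine pair remains indistinguishable by mass at any accuracy, which is where extra evidence such as retention time becomes useful.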
