Protein and Gene Model Inference Based on Statistical Modeling in K-partite Graphs

Overview
Specialty: Science
Date: 2010 Jun 22
PMID: 20562346
Citations: 17
Abstract

One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides be mapped back to the proteins from which they were derived. This process is known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions that addresses the problem of protein and gene model inference for shotgun proteomics data. In particular, we handle dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We also address the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
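
To make the shared-peptide problem concrete, the following is a minimal sketch of a three-layer (peptide, protein, gene model) graph. It is not the MIPGEM model and computes no probabilities; it only shows how a peptide that is ambiguous at the protein level can become unambiguous at the gene model level. All identifiers (PEP1, ProtA.1, GeneA, and so on) and the function name classify_peptides are hypothetical and chosen for illustration.

```python
# Toy tripartite graph: peptide -> proteins -> gene models.
# All identifiers below are hypothetical; this is NOT the MIPGEM scoring model.

# Edges from each observed peptide to the proteins whose sequence contains it.
peptide_to_proteins = {
    "PEP1": {"ProtA.1"},             # matches a single protein
    "PEP2": {"ProtA.1", "ProtA.2"},  # shared by two isoforms of one gene
    "PEP3": {"ProtA.2", "ProtB.1"},  # shared by proteins of different genes
}

# Edges from each protein to the gene model that encodes it.
protein_to_gene = {
    "ProtA.1": "GeneA",
    "ProtA.2": "GeneA",
    "ProtB.1": "GeneB",
}


def classify_peptides(peptide_to_proteins, protein_to_gene):
    """Label each peptide by how ambiguous its origin is in the graph."""
    labels = {}
    for peptide, proteins in peptide_to_proteins.items():
        # Follow the second layer of edges to collect the encoding gene models.
        genes = {protein_to_gene[protein] for protein in proteins}
        if len(proteins) == 1:
            labels[peptide] = "unique to one protein"
        elif len(genes) == 1:
            labels[peptide] = "shared by proteins of one gene model"
        else:
            labels[peptide] = "shared across gene models"
    return labels


labels = classify_peptides(peptide_to_proteins, protein_to_gene)
for peptide in sorted(labels):
    print(peptide, "->", labels[peptide])
```

In this toy example PEP2 cannot distinguish between the two isoforms, but it still supports GeneA unambiguously, which illustrates why scoring at the gene model level can resolve some protein-level ambiguity. PEP3 remains ambiguous at both levels.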

Citing Articles

MetaLP: An integrative linear programming method for protein inference in metaproteomics.

Feng S, Ji H, Wang H, Zhang B, Sterzenbach R, Pan C PLoS Comput Biol. 2022; 18(10):e1010603.

PMID: 36269761 PMC: 9629623. DOI: 10.1371/journal.pcbi.1010603.


Characterization of peptide-protein relationships in protein ambiguity groups via bipartite graphs.

Schork K, Turewicz M, Uszkoreit J, Rahnenfuhrer J, Eisenacher M PLoS One. 2022; 17(10):e0276401.

PMID: 36269744 PMC: 9586388. DOI: 10.1371/journal.pone.0276401.


An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics.

Fancello L, Burger T Genome Biol. 2022; 23(1):132.

PMID: 35725496 PMC: 9208142. DOI: 10.1186/s13059-022-02701-2.


Detecting and Testing Altered Brain Connectivity Networks with K-partite Network Topology.

Chen S, Bowman F, Xing Y Comput Stat Data Anal. 2020; 141:109-122.

PMID: 32831438 PMC: 7442212. DOI: 10.1016/j.csda.2019.06.007.


EPIFANY: A Method for Efficient High-Confidence Protein Inference.

Pfeuffer J, Sachsenberg T, Dijkstra T, Serang O, Reinert K, Kohlbacher O J Proteome Res. 2020; 19(3):1060-1072.

PMID: 31975601 PMC: 7583457. DOI: 10.1021/acs.jproteome.9b00566.

