Predicting Tryptic Cleavage from Proteomics Data Using Decision Tree Ensembles

Overview
Journal: J Proteome Res
Specialty: Biochemistry
Date: 2013 Mar 23
PMID: 23517142
Citations: 14
Abstract

Trypsin is the workhorse protease in mass spectrometry-based proteomics experiments and is used to digest proteins into more readily analyzable peptides. To identify these peptides after mass spectrometric analysis, the actual digestion has to be mimicked as faithfully as possible in silico. In this paper we introduce CP-DT (Cleavage Prediction with Decision Trees), an algorithm based on a decision tree ensemble that was learned on publicly available peptide identification data from the PRIDE repository. We demonstrate that CP-DT is able to accurately predict tryptic cleavage: tests on three independent data sets show that CP-DT significantly outperforms the Keil rules that are currently used to predict tryptic cleavage. Moreover, the trees generated by CP-DT can make predictions efficiently and are interpretable by domain experts.
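
To make "mimicking the digestion in silico" concrete, here is a minimal sketch of the simplified rule commonly attributed to Keil: cleave C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P). This is the rule-based baseline only, not CP-DT itself; the function name `tryptic_digest`, the `missed_cleavages` parameter, and the example sequence are hypothetical and chosen for illustration.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Rule-based in silico tryptic digest (simplified K/R-not-before-P rule).

    Assumption: this is the common simplified rule, not the CP-DT model.
    Returns peptides with up to `missed_cleavages` skipped cleavage sites.
    """
    if not protein:
        return []

    # A cleavage site is the index just after a K or R that is not
    # followed by P. The last residue has no following bond to cut.
    sites = [
        i + 1
        for i, residue in enumerate(protein[:-1])
        if residue in "KR" and protein[i + 1] != "P"
    ]

    # Peptide boundaries: sequence start, every cleavage site, sequence end.
    bounds = [0] + sites + [len(protein)]

    peptides = []
    for start in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra adjacent fragments.
        last_end = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last_end + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence: the R at position 3 is followed by P,
    # so the rule does not cut there.
    print(tryptic_digest("AKRPGKC"))
    print(tryptic_digest("AKRPGKC", missed_cleavages=1))
```

A fixed rule like this treats every K or R site identically, whereas a model trained on identification data can take the surrounding residues into account when estimating whether a site is actually cleaved.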

Citing Articles

Application of machine learning for mass spectrometry-based multi-omics in thyroid diseases.

Che Y, Zhao M, Gao Y, Zhang Z, Zhang X. Front Mol Biosci. 2025; 11:1483326.

PMID: 39741929 PMC: 11685090. DOI: 10.3389/fmolb.2024.1483326.


Sterol Derivatives Specifically Increase Anti-Inflammatory Oxylipin Formation in M2-like Macrophages by LXR-Mediated Induction of 15-LOX.

Ohno R, Mainka M, Kirchhoff R, Hartung N, Schebb N. Molecules. 2024; 29(8).

PMID: 38675565 PMC: 11052137. DOI: 10.3390/molecules29081745.


Use of Nonhuman Sera as a Highly Cost-Effective Internal Standard for Quantitation of Multiple Human Proteins Using Species-Specific Tryptic Peptides: Applicability in Clinical LC-MS Analyses.

Williams G, Couchman L, Taylor D, Sandhu J, Slingsby O, Ng L. J Proteome Res. 2024; 23(8):3052-3063.

PMID: 38533909 PMC: 11301776. DOI: 10.1021/acs.jproteome.3c00762.


DbyDeep: Exploration of MS-Detectable Peptides via Deep Learning.

Son J, Na S, Paek E. Anal Chem. 2023; 95(30):11193-11200.

PMID: 37459568 PMC: 10401496. DOI: 10.1021/acs.analchem.3c00460.


PepGM: a probabilistic graphical model for taxonomic inference of viral proteome samples with associated confidence scores.

Holstein T, Kistner F, Martens L, Muth T. Bioinformatics. 2023; 39(5).

PMID: 37129543 PMC: 10182852. DOI: 10.1093/bioinformatics/btad289.