
Recognizing Millions of Consistently Unidentified Spectra Across Hundreds of Shotgun Proteomics Datasets

Overview
Journal Nat Methods
Date 2016 Aug 6
PMID 27493588
Citations 75
Abstract

Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average, 75% of the spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The PRoteomics IDEntifications database (PRIDE) Archive is one of the largest public MS proteomics data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, which come from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
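To make the idea of using clusters to sort spectra into these three groups more concrete, the following is a minimal sketch in Python. It is not the PRIDE Cluster algorithm: the greedy single-pass clustering, the binned cosine similarity, the thresholds (`mz_tol`, `min_sim`, `bin_width`), and all names (`Spectrum`, `cluster`, `summarize`) are assumptions chosen purely for illustration. Within each cluster, spectra whose identification disagrees with the most common one are flagged as possibly incorrect, unidentified spectra that sit next to identified ones are candidates for identifications that fell below a score threshold, and clusters without any identification correspond to consistently unidentified spectra.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: Dict[float, float]        # fragment m/z -> intensity
    peptide: Optional[str] = None    # None: no identification was reported


def binned(peaks: Dict[float, float], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (illustrative bin width)."""
    bins: Dict[int, float] = {}
    for mz, intensity in peaks.items():
        key = int(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(spectra: List[Spectrum], mz_tol: float = 0.5,
            min_sim: float = 0.8) -> List[List[Spectrum]]:
    """Greedy single pass: a spectrum joins the first cluster whose seed
    spectrum has a close precursor m/z and a similar peak pattern."""
    clusters = []  # each entry: (seed vector, seed precursor m/z, members)
    for s in spectra:
        vec = binned(s.peaks)
        for seed_vec, seed_mz, members in clusters:
            if abs(seed_mz - s.precursor_mz) <= mz_tol and cosine(vec, seed_vec) >= min_sim:
                members.append(s)
                break
        else:
            clusters.append((vec, s.precursor_mz, [s]))
    return [members for _, _, members in clusters]


def summarize(members: List[Spectrum]) -> dict:
    """Count members that conflict with the most common identification
    and members that carry no identification at all."""
    ids = Counter(s.peptide for s in members if s.peptide is not None)
    unidentified = sum(1 for s in members if s.peptide is None)
    if not ids:
        # No member was identified: a consistently unidentified cluster.
        return {"consensus": None, "conflicting": 0, "unidentified": unidentified}
    consensus = ids.most_common(1)[0][0]
    conflicting = sum(1 for s in members if s.peptide not in (None, consensus))
    return {"consensus": consensus, "conflicting": conflicting, "unidentified": unidentified}


if __name__ == "__main__":
    # Toy data, invented for this example only.
    demo = [
        Spectrum(500.25, {175.1: 100.0, 300.2: 50.0, 450.3: 80.0}, "PEPTIDEK"),
        Spectrum(500.26, {175.1: 90.0, 300.2: 55.0, 450.3: 85.0}, "PEPTIDEK"),
        Spectrum(500.24, {175.1: 95.0, 300.2: 45.0, 450.3: 75.0}, None),
        Spectrum(600.30, {200.1: 100.0, 350.2: 60.0}, None),
    ]
    for members in cluster(demo):
        print(len(members), summarize(members))
```

A production-scale approach would need a more careful similarity measure, consensus spectra instead of fixed seeds, and quality filters on cluster size and identification ratio; the sketch only shows how cluster membership can be turned into the three categories described in the abstract.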

Citing Articles

Targeting the ERK1/2 and p38 MAPK pathways attenuates Golgi tethering factor golgin-97 depletion-induced cancer progression in breast cancer.

Liu Y, Lin T, Chong K, Chen G, Kuo C, Lin Y Cell Commun Signal. 2025; 23(1):22.

PMID: 39800687 PMC: 11727508. DOI: 10.1186/s12964-024-02010-0.


The Hunt Lab Guide to De Novo Peptide Sequence Analysis by Tandem Mass Spectrometry.

Anderson L, Bai D, Blakney G, Butcher D, Reser L, Shabanowitz J Mol Cell Proteomics. 2024; 23(12):100875.

PMID: 39515468 PMC: 11665681. DOI: 10.1016/j.mcpro.2024.100875.


The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF.

Klein J, Lam H, Mak T, Bittremieux W, Perez-Riverol Y, Gabriels R Anal Chem. 2024; 96(46):18491-18501.

PMID: 39514576 PMC: 11579979. DOI: 10.1021/acs.analchem.4c04091.


Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

He Q, Li X, Zhong J, Yang G, Han J, Shuai J Smart Med. 2024; 3(3):e20240014.

PMID: 39420951 PMC: 11425048. DOI: 10.1002/SMMD.20240014.


Alternate RNA decoding results in stable and abundant proteins in mammals.

Tsour S, Machne R, Leduc A, Widmer S, Guez J, Karczewski K bioRxiv. 2024.

PMID: 39253435 PMC: 11383030. DOI: 10.1101/2024.08.26.609665.

