
Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

Overview
Journal OMICS
Date 2013 Oct 15
PMID 24116388
Citations 99
Abstract

Mass spectrometry (MS) is an analytical technique for characterizing biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput capabilities. However, because it generates large datasets, informatics approaches such as machine learning are required to analyze and interpret the relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although the latter also requires relative protein quantification. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly in studies of various cancers. The aims of such investigations have been to identify biomarkers and to aid in the diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data, including how it can be used to identify proteins suitable as disease biomarkers and to classify samples into disease or treatment groups, which may be applicable to diagnostics. It also covers the challenges faced by such investigations, such as predicting which proteins are present, protein quantification, planning for the use of machine learning, and small sample sizes.
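The two tasks named in the abstract, picking candidate biomarkers and classifying samples into groups when only a few samples are available, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not a method taken from the review: the feature matrix (rows are samples, columns are peak intensities or relative protein abundances), the Welch t-statistic ranking, the nearest-centroid classifier, the hypothetical function names `rank_features`, `predict_nearest_centroid`, and `loo_accuracy`, and the tiny synthetic data, which was invented and is not real MS data.

```python
# Minimal sketch (assumptions, not the review's method): rows of X are samples,
# columns are features such as MS peak intensities or relative protein
# abundances, and y holds binary group labels (1 = disease, 0 = control).
# All function names and the toy data below are hypothetical.
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for one feature measured in two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Return feature indices ordered by |t|, largest first (candidate biomarkers)."""
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, lab in zip(X, y) if lab == 1]
        controls = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(cases, controls)), j))
    return [j for _, j in sorted(scores, reverse=True)]


def predict_nearest_centroid(X_train, y_train, x, features):
    """Assign x to the class whose mean profile on `features` is closest (squared Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum((x[j] - mean([r[j] for r in rows])) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; the top-k features are re-selected inside every
    fold so the held-out sample never influences biomarker selection."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top = rank_features(X_train, y_train)[:k]
        correct += predict_nearest_centroid(X_train, y_train, X[i], top) == y[i]
    return correct / len(X)


# Tiny synthetic matrix invented for illustration (not real MS data):
# features 0 and 2 differ between groups, features 1 and 3 are noise.
X = [
    [5.1, 1.0, 2.2, 0.9], [4.8, 1.2, 2.0, 1.1], [5.3, 0.9, 2.4, 1.0], [5.0, 1.1, 2.1, 0.8],
    [2.0, 1.0, 1.0, 1.0], [2.2, 1.1, 1.3, 0.9], [1.9, 0.8, 1.1, 1.2], [2.1, 1.2, 1.2, 0.8],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print("candidate biomarkers (feature indices):", rank_features(X, y)[:2])
    print("leave-one-out accuracy:", loo_accuracy(X, y, k=2))
```

Leave-one-out cross-validation is used here because it makes the most of a small sample. Feature ranking is repeated inside each fold because selecting biomarkers on all samples first and then cross-validating tends to give optimistically biased accuracy estimates.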

Citing Articles

Integrating genetic algorithms and language models for enhanced enzyme design.

Nana Teukam Y, Zipoli F, Laino T, Criscuolo E, Grisoni F, Manica M Brief Bioinform. 2025; 26(1).

PMID: 39780486 PMC: 11711099. DOI: 10.1093/bib/bbae675.


Proteomics and machine learning: Leveraging domain knowledge for feature selection in a skeletal muscle tissue meta-analysis.

Shahin-Shamsabadi A, Cappuccitti J Heliyon. 2024; 10(24):e40772.

PMID: 39720035 PMC: 11667615. DOI: 10.1016/j.heliyon.2024.e40772.


Integrating Molecular Perspectives: Strategies for Comprehensive Multi-Omics Integrative Data Analysis and Machine Learning Applications in Transcriptomics, Proteomics, and Metabolomics.

Sanches P, de Melo N, M Porcari A, de Carvalho L Biology (Basel). 2024; 13(11).

PMID: 39596803 PMC: 11592251. DOI: 10.3390/biology13110848.


Mass Spectrometry Advancements and Applications for Biomarker Discovery, Diagnostic Innovations, and Personalized Medicine.

Son A, Kim W, Park J, Park Y, Lee W, Lee S Int J Mol Sci. 2024; 25(18).

PMID: 39337367 PMC: 11432749. DOI: 10.3390/ijms25189880.


Comprehensive Overview of Bottom-Up Proteomics Using Mass Spectrometry.

Jiang Y, Rex D, Schuster D, Neely B, Rosano G, Volkmar N ACS Meas Sci Au. 2024; 4(4):338-417.

PMID: 39193565 PMC: 11348894. DOI: 10.1021/acsmeasuresciau.3c00068.

