PMID: 34615866

A Proteomics Sample Metadata Representation for Multiomics Integration and Big Data Analysis

Abstract

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
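As a rough illustration of what validating a tab-delimited sample metadata table can involve, the sketch below checks that a table has a set of expected columns and no empty required cells, and then groups data files by sample. It is not the tooling described in the abstract: the column names, the `REQUIRED_COLUMNS` list, and both function names are assumptions chosen for illustration, and the actual MAGE-TAB-Proteomics specification defines its own required fields.

```python
import csv
import io

# Hypothetical required columns, for illustration only; the actual
# MAGE-TAB-Proteomics specification defines its own required set.
REQUIRED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
]


def validate_sample_table(text):
    """Return a list of problems found in a tab-delimited sample table."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return ["table is empty"]

    header = [name.strip().lower() for name in rows[0]]
    problems = [
        f"missing column: {column}"
        for column in REQUIRED_COLUMNS
        if column not in header
    ]

    # Check each data row for a consistent width and non-empty required cells.
    for row_no, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if name in REQUIRED_COLUMNS and not value.strip():
                problems.append(f"row {row_no}: empty value for {name}")
    return problems


def files_by_sample(text):
    """Group data file names by the sample they were acquired from."""
    mapping = {}
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        mapping.setdefault(record["source name"], []).append(
            record["comment[data file]"]
        )
    return mapping


# Made-up example table: two samples, three data files.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1_run1.raw\n"
    "sample 1\tHomo sapiens\trun 2\tsample1_run2.raw\n"
    "sample 2\tHomo sapiens\trun 3\tsample2_run1.raw\n"
)

if __name__ == "__main__":
    print(validate_sample_table(EXAMPLE))
    print(files_by_sample(EXAMPLE))
```

Keeping one row per sample-to-file pair makes the relationship between samples and dataset files explicit, which is the property that reanalysis depends on.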

Citing Articles

Temporal phosphoproteomics reveals circuitry of phased propagation in insulin signaling.

Turewicz M, Skagen C, Hartwig S, Majda S, Thedinga K, Herwig R. Nat Commun. 2025; 16(1):1570.

PMID: 39939313 PMC: 11821911. DOI: 10.1038/s41467-025-56335-6.


Cooperation of GlycoPOST and UniCarb-DR towards a comprehensive glycomics data repository workflow.

Takahashi Y, Karlsson N, Okuda S, Aoki-Kinoshita K. Anal Bioanal Chem. 2024; 417(5):1015-1023.

PMID: 39611991 PMC: 11782440. DOI: 10.1007/s00216-024-05673-3.


The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF.

Klein J, Lam H, Mak T, Bittremieux W, Perez-Riverol Y, Gabriels R. Anal Chem. 2024; 96(46):18491-18501.

PMID: 39514576 PMC: 11579979. DOI: 10.1021/acs.analchem.4c04091.


Data-Independent Acquisition Mass Spectrometry as a Tool for Metaproteomics: Interlaboratory Comparison Using a Model Microbiome.

Rajczewski A, Blakeley-Ruiz J, Meyer A, Vintila S, McIlvin M, Van Den Bossche T. bioRxiv. 2024.

PMID: 39345414 PMC: 11430069. DOI: 10.1101/2024.09.18.613707.


A roadmap to the molecular human linking multiomics with population traits and diabetes subtypes.

Halama A, Zaghlool S, Thareja G, Kader S, Al Muftah W, Mook-Kanamori M. Nat Commun. 2024; 15(1):7111.

PMID: 39160153 PMC: 11333501. DOI: 10.1038/s41467-024-51134-x.

