
OpenCustomDB: Integration of Unannotated Open Reading Frames and Genetic Variants to Generate More Comprehensive Customized Protein Databases

Overview
Journal J Proteome Res
Specialty Biochemistry
Date 2023 Mar 24
PMID 36961377
Abstract

Proteomic diversity in biological samples can be characterized by mass spectrometry (MS)-based proteomics using customized protein databases generated from sets of transcripts previously detected by RNA-seq. This diversity has been further expanded by the recent discovery that many translated alternative open reading frames remain unannotated at unsuspected locations in mRNAs and ncRNAs. Their novel protein products, termed alternative proteins, have been left out of all previous custom database generation tools. Consequently, genetic variations that affect alternative open reading frames, and the variant peptides from their translated proteins, are not detectable with current computational workflows. To fill this gap, we present OpenCustomDB, a bioinformatics tool that uses sample-specific RNA-seq data to identify genomic variants in canonical and alternative open reading frames, allowing for more than one coding region per transcript. In a test reanalysis of a cohort of 16 patients with acute myeloid leukemia, 5666 peptides from alternative proteins were detected, including 201 variant peptides. We also observed that a significant fraction of peptide-spectrum matches previously assigned to peptides from canonical proteins received better scores when reassigned to peptides from alternative proteins. Custom protein libraries that include sample-specific sequence variations of all possible open reading frames are promising contributions to the development of proteomics and precision medicine. The raw and processed proteomics data presented in this study can be found in the PRIDE repository under accession number PXD029240.
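
The idea of allowing more than one coding region per transcript can be illustrated with a minimal Python sketch. This is not OpenCustomDB's implementation: the toy transcript, the ORF names and coordinates, and the helper functions `translate`, `apply_snv`, and `variant_proteins` are all hypothetical and chosen only for illustration. It applies a single-nucleotide variant to a transcript carrying a canonical ORF and an overlapping alternative ORF in another reading frame, then reports the reference and variant protein for each ORF the variant alters.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(seq, pos, ref, alt):
    """Return seq with the base at 0-based position pos changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_proteins(transcript, orfs, snv):
    """Map ORF name -> (reference protein, variant protein) for altered ORFs.

    orfs maps a name to half-open (start, end) transcript coordinates.
    """
    pos, ref, alt = snv
    mutated = apply_snv(transcript, pos, ref, alt)
    changed = {}
    for name, (start, end) in orfs.items():
        if not start <= pos < end:
            continue  # variant lies outside this ORF
        ref_prot = translate(transcript[start:end])
        var_prot = translate(mutated[start:end])
        if ref_prot != var_prot:
            changed[name] = (ref_prot, var_prot)
    return changed


# Hypothetical toy transcript: a canonical ORF in frame 0 and an
# overlapping alternative ORF that starts at position 5 (frame 2).
TRANSCRIPT = "ATGGCATGCCAGAAATAAGCTGA"
ORFS = {"canonical": (0, 18), "altORF": (5, 23)}

for name, (ref_prot, var_prot) in variant_proteins(TRANSCRIPT, ORFS, (9, "C", "G")).items():
    # Write the variant protein as a FASTA entry with an illustrative header.
    print(f">{name}|c.10C>G|ref={ref_prot}\n{var_prot}")
```

In this toy case the same nucleotide change produces a different amino-acid substitution in each ORF, which is why a database built from only one coding region per transcript would miss the variant peptide from the alternative protein.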

Citing Articles

Proteomics Can Rise to the Challenge of Pseudogenes' Coding Nature.

Vasylieva V, Arefiev I, Bourassa F, Trifiro F, Brunet M. J Proteome Res. 2024; 23(12):5233-5249.

PMID: 39486438 PMC: 11629383. DOI: 10.1021/acs.jproteome.4c00116.


Deciphering the ghost proteome in ovarian cancer cells by deep proteogenomic characterization.

Garcia-Del Rio D, Derhourhi M, Bonnefond A, Leblanc S, Guilloy N, Roucou X. Cell Death Dis. 2024; 15(9):712.

PMID: 39349928 PMC: 11442847. DOI: 10.1038/s41419-024-07046-1.


Relevance of mutation-derived neoantigens and non-classical antigens for anticancer therapies.

Aparicio B, Theunissen P, Hervas-Stubbs S, Fortes P, Sarobe P. Hum Vaccin Immunother. 2024; 20(1):2303799.

PMID: 38346926 PMC: 10863374. DOI: 10.1080/21645515.2024.2303799.


OpenProt 2.0 builds a path to the functional characterization of alternative proteins.

Leblanc S, Yala F, Provencher N, Lucier J, Levesque M, Lapointe X. Nucleic Acids Res. 2023; 52(D1):D522-D528.

PMID: 37956315 PMC: 10767855. DOI: 10.1093/nar/gkad1050.


BamQuery: a proteogenomic tool to explore the immunopeptidome and prioritize actionable tumor antigens.

Ruiz Cuevas M, Hardy M, Larouche J, Apavaloaei A, Kina E, Vincent K. Genome Biol. 2023; 24(1):188.

PMID: 37582761 PMC: 10426134. DOI: 10.1186/s13059-023-03029-1.
