
Protein Identification Using Customized Protein Sequence Databases Derived from RNA-Seq Data

Overview
Journal J Proteome Res
Specialty Biochemistry
Date 2011 Nov 23
PMID 22103967
Citations 90
Abstract

The standard shotgun proteomics data analysis strategy relies on searching MS/MS spectra against a context-independent protein sequence database derived from the complete genome sequence of an organism. Because transcriptome sequence analysis (RNA-Seq) promises an unbiased and comprehensive picture of the transcriptome, we reason that a sample-specific protein database derived from RNA-Seq data can better approximate the real protein pool in the sample and thus improve protein identification. In this study, we have developed a two-step strategy for building sample-specific protein databases from RNA-Seq data. First, the database size is reduced by eliminating unexpressed or lowly expressed genes according to transcript quantification. Second, high-quality nonsynonymous coding single nucleotide variations (SNVs) are identified based on RNA-Seq data, and corresponding protein variants are added to the database. Using RNA-Seq and shotgun proteomics data from two colorectal cancer cell lines SW480 and RKO, we demonstrated that customized protein sequence databases could significantly increase the sensitivity of peptide identification, reduce ambiguity in protein assembly, and enable the detection of known and novel peptide variants. Thus, sample-specific databases from RNA-Seq data can enable more sensitive and comprehensive protein discovery in shotgun proteomics studies.
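The two-step strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `AminoAcidChange` record, the function names, the header format, and the `min_expression` cutoff are all hypothetical, and the sketch assumes that variants have already been called from RNA-Seq data and translated into protein coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AminoAcidChange:
    """A nonsynonymous SNV expressed in protein coordinates (hypothetical format)."""
    protein_id: str
    position: int  # 1-based residue index
    ref: str       # reference amino acid
    alt: str       # variant amino acid


def build_custom_database(proteins, expression, changes, min_expression=1.0):
    """Return {header: sequence} for a sample-specific protein database.

    min_expression is an illustrative cutoff, not a value taken from the paper.
    """
    # Step 1: drop proteins whose transcripts are unexpressed or below the cutoff.
    retained = {
        pid: seq
        for pid, seq in proteins.items()
        if expression.get(pid, 0.0) >= min_expression
    }

    # Step 2: add one variant entry per amino acid change in a retained protein.
    database = dict(retained)
    for change in changes:
        seq = retained.get(change.protein_id)
        if seq is None:
            continue  # protein was filtered out in step 1 or is unknown
        index = change.position - 1
        if index < 0 or index >= len(seq) or seq[index] != change.ref:
            continue  # reference residue does not match: skip rather than guess
        variant_seq = seq[:index] + change.alt + seq[index + 1:]
        header = f"{change.protein_id}_{change.ref}{change.position}{change.alt}"
        database[header] = variant_seq
    return database


def to_fasta(database):
    """Serialize the database as FASTA text for a search engine."""
    return "".join(f">{header}\n{seq}\n" for header, seq in database.items())


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    proteins = {"P1": "MKTAY", "P2": "MGGLS"}
    expression = {"P1": 5.0, "P2": 0.1}
    changes = [
        AminoAcidChange("P1", 3, "T", "A"),
        AminoAcidChange("P2", 2, "G", "V"),
    ]
    print(to_fasta(build_custom_database(proteins, expression, changes)), end="")
```

In this toy run, `P2` is removed by the expression filter, so its variant is never added, while `P1` contributes both its reference sequence and one variant entry. Keeping the reference sequence alongside each variant lets a search engine match either form of the peptide.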

Citing Articles

Chemoproteogenomic stratification of the missense variant cysteinome.

Desai H, Andrews K, Bergersen K, Ofori S, Yu F, Shikwana F. Nat Commun. 2024; 15(1):9284.

PMID: 39468056 PMC: 11519605. DOI: 10.1038/s41467-024-53520-x.


Moving Toward Metaproteogenomics: A Computational Perspective on Analyzing Microbial Samples via Proteogenomics.

Singer F, Kuhring M, Renard B, Muth T. Methods Mol Biol. 2024; 2859:297-318.

PMID: 39436609 DOI: 10.1007/978-1-0716-4152-1_17.


A practical introduction to holo-omics.

Odriozola I, Rasmussen J, Gilbert M, Limborg M, Alberdi A. Cell Rep Methods. 2024; 4(7):100820.

PMID: 38986611 PMC: 11294832. DOI: 10.1016/j.crmeth.2024.100820.


Transcription factors and splice factors - interconnected regulators of stem cell differentiation.

Mehlferber M, Kuyumcu-Martinez M, Miller C, Sheynkman G. Curr Stem Cell Rep. 2024; 9(2):31-41.

PMID: 38939410 PMC: 11210451. DOI: 10.1007/s40778-023-00227-2.


moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides.

Zhu C, Liu L, Ha A, Yamaguchi T, Zhu H, Hugh-White R. bioRxiv. 2024.

PMID: 38585946 PMC: 10996593. DOI: 10.1101/2024.03.28.587261.

