
Misannotations of rRNA Can Now Generate 90% False Positive Protein Matches in Metatranscriptomic Studies

Overview
Specialty Biochemistry
Date 2011 Jul 21
PMID 21771858
Citations 17
Abstract

In the course of analyzing 9,522,746 pyrosequencing reads from 23 stations in the Southwestern Pacific and equatorial Atlantic oceans, it came to our attention that misannotation of rRNA as protein is now so widespread that false positive matching of rRNA pyrosequencing reads to the National Center for Biotechnology Information (NCBI) non-redundant protein database approaches 90%. One conserved portion of 23S rRNA was misannotated consistently and often enough to prompt curators at Pfam to create a spurious protein family. Detailed examination of the annotation history of each seed sequence in the spurious Pfam protein family (PF10695, 'Cw-hydrolase') uncovered issues in the standard operating procedures and quality assurance programs of major sequencing centers, as well as issues in the curation practices of those managing public databases such as GenBank and SwissProt. We offer recommendations addressing each of these issues, and we also recommend that workers in the field of metatranscriptomics take extra care to avoid including false positive matches in their datasets.
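As a rough illustration of the precaution the abstract recommends, the sketch below drops protein matches that come from reads already flagged as rRNA and reports what fraction of rRNA-flagged reads also hit a protein. It is not the authors' pipeline: the function names, the `(read_id, protein_id)` pair format, and the assumption that rRNA reads were identified beforehand by a separate nucleotide search are all hypothetical.

```python
def drop_rrna_hits(protein_hits, rrna_read_ids):
    """Split protein hits into kept hits and hits from rRNA-flagged reads.

    protein_hits: iterable of (read_id, protein_id) pairs (hypothetical format).
    rrna_read_ids: set of read IDs flagged as rRNA by a prior nucleotide
    search; how that search is run is outside this sketch.
    """
    kept, dropped = [], []
    for read_id, protein_id in protein_hits:
        target = dropped if read_id in rrna_read_ids else kept
        target.append((read_id, protein_id))
    return kept, dropped


def false_positive_fraction(protein_hits, rrna_read_ids):
    """Fraction of rRNA-flagged reads that have at least one protein hit."""
    if not rrna_read_ids:
        return 0.0
    reads_with_hits = {read_id for read_id, _ in protein_hits}
    return len(reads_with_hits & rrna_read_ids) / len(rrna_read_ids)


# Toy data, invented for illustration only.
hits = [("r1", "P1"), ("r2", "P2"), ("r3", "P3"), ("r3", "P4")]
rrna = {"r2", "r3", "r4"}

kept, dropped = drop_rrna_hits(hits, rrna)
print(kept)                                  # hits from reads not flagged as rRNA
print(false_positive_fraction(hits, rrna))   # share of rRNA reads with a protein hit
```

Filtering on read IDs rather than on protein annotations is a deliberate choice here: the abstract's point is that the protein annotations themselves may be wrong, so the rRNA call is made independently of them.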

Citing Articles

Genome structure and evolutionary history of frankincense producing .

Khan A, Al-Harrasi A, Wang J, Asaf S, Riethoven J, Shehzad T. iScience. 2022; 25(7):104574.

PMID: 35789857 PMC: 9249616. DOI: 10.1016/j.isci.2022.104574.


CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats.

Rubio A, Mier P, Andrade-Navarro M, Garzon A, Jimenez J, Perez-Pulido A. Database (Oxford). 2020; 2020.

PMID: 33206958 PMC: 7673337. DOI: 10.1093/database/baaa088.


The Ribosome as a Missing Link in Prebiotic Evolution III: Over-Representation of tRNA- and rRNA-Like Sequences and Plieofunctionality of Ribosome-Related Molecules Argues for the Evolution of Primitive Genomes from Ribosomal RNA Modules.

Root-Bernstein R, Root-Bernstein M. Int J Mol Sci. 2019; 20(1).

PMID: 30609737 PMC: 6337102. DOI: 10.3390/ijms20010140.


A database of metazoan cytochrome c oxidase subunit I gene sequences derived from GenBank with CO-ARBitrator.

Heller P, Casaletto J, Ruiz G, Geller J. Sci Data. 2018; 5:180156.

PMID: 30084847 PMC: 6080493. DOI: 10.1038/sdata.2018.156.


Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

Hops W, Jeffryes M, Bateman A. F1000Res. 2018; 7:261.

PMID: 29721311 PMC: 5897793. DOI: 10.12688/f1000research.14050.1.

