
Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes

Overview
Publisher: Cell Press
Date: 2020 Dec 8
PMID: 33290720
Citations: 18
Abstract

Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.

Citing Articles

The hidden bacterial microproteome.

Fesenko I, Sahakyan H, Dhyani R, Shabalina S, Storz G, Koonin E. Mol Cell. 2025; 85(5):1024-1041.e6.

PMID: 39978337 PMC: 11890958. DOI: 10.1016/j.molcel.2025.01.025.


Dual quorum-sensing control of purine biosynthesis drives pathogenic fitness of .

Zlitni S, Bowden S, Sberro H, Torres M, Vaughan J, Pinto A. bioRxiv. 2024.

PMID: 39185165 PMC: 11343167. DOI: 10.1101/2024.08.13.607696.


Origins of Life: The Protein Folding Problem all over again?

Kocher C, Dill K. Proc Natl Acad Sci U S A. 2024; 121(34):e2315000121.

PMID: 39133848 PMC: 11348307. DOI: 10.1073/pnas.2315000121.


PSPI: A deep learning approach for prokaryotic small protein identification.

Weston M, Hu H, Li X. Front Genet. 2024; 15:1439423.

PMID: 39050248 PMC: 11266045. DOI: 10.3389/fgene.2024.1439423.


A survey of experimental and computational identification of small proteins.

Beals J, Hu H, Li X. Brief Bioinform. 2024; 25(4).

PMID: 39007598 PMC: 11247407. DOI: 10.1093/bib/bbae345.

