Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes
Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
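The feature-importance finding on wobble positions and codon synonyms can be illustrated with the genetic code alone. The following is a minimal sketch, not part of SmORFinder and not the authors' analysis: it assumes the standard codon table (written in NCBI's T, C, A, G ordering), and the helper names `synonym_groups` and `synonymous_fraction` are hypothetical. It groups codons by the amino acid they encode and measures, for each codon position, how often a single-base substitution leaves the encoded amino acid unchanged.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code in NCBI order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons (TTT, TTC, TTA, TTG, TCT, ...) to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonym_groups(table=CODON_TABLE):
    """Group codons by the amino acid (or stop signal) they encode."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return dict(groups)


def synonymous_fraction(position, table=CODON_TABLE):
    """Fraction of single-base substitutions at `position` (0, 1, or 2)
    that do not change the encoded amino acid."""
    same = 0
    total = 0
    for codon, aa in table.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += table[mutant] == aa
    return same / total


if __name__ == "__main__":
    for aa, codons in sorted(synonym_groups().items()):
        print(aa, " ".join(codons))
    for position in range(3):
        fraction = synonymous_fraction(position)
        print(f"codon position {position + 1}: {fraction:.3f} synonymous")
```

Under this table, substitutions at the third (wobble) position are synonymous far more often than those at the first or second position, which is consistent with a sequence model learning to give that position less weight.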