Computational Discovery and Annotation of Conserved Small Open Reading Frames in Fungal Genomes
Background: Small open reading frames (smORFs or sORFs), which encode short proteins, are often overlooked during standard gene prediction, leaving many of them undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we targeted ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.
Results: A first set of sORFs was drawn from existing annotations that met the 80-residue maximum. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. In total, 1986 conserved sORFs were predicted and characterized.
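The second prediction pass can be illustrated with a minimal sketch. This is not the pipeline used in the study, whose tools and parameters are not stated here. It only assumes that a candidate is an ATG-initiated, stop-terminated reading frame of at most 80 codons (stop codon excluded), scanned in all six frames. The names `find_sorfs`, `MAX_CODONS` and `min_codons` are hypothetical.

```python
# Illustrative six-frame scan for short ORFs; NOT the study's actual method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 80  # size cutoff taken from the abstract


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT characters are left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sorfs(seq, max_codons=MAX_CODONS, min_codons=2):
    """Return (strand, start, end, orf) tuples for ATG..stop ORFs.

    The codon count includes the start codon and excludes the stop codon.
    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (for "-", relative to the reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return hits


if __name__ == "__main__":
    for hit in find_sorfs("CCATGAAATTTTAGCC"):
        print(hit)
```

Applied separately to exonic, intronic and intergenic sequence, a scan of this kind yields the raw candidate pool. A real analysis would still need to handle splicing, alternative start codons and overlap with annotated genes.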
Conclusions: Numerous open reading frames that could encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that targeted reannotation can complement standard genome annotation in identifying sORFs. Because experimental validation is scarce and limited, we propose that a simple conservation analysis offers an acceptable means of ensuring that predicted sORFs are sufficiently free of gene prediction artefacts.
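The idea of a conservation filter can be sketched as follows. The abstract does not describe how conservation was measured, so this is an assumption-laden illustration: it uses the `difflib.SequenceMatcher` ratio from the Python standard library as a crude stand-in for alignment-based similarity, and the names `conserved_sorfs`, `min_genomes` and `threshold`, along with their default values, are hypothetical.

```python
# Illustrative conservation filter; the similarity measure and cutoffs are
# assumptions, not the criteria used in the study.
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True if two peptide strings exceed a crude similarity ratio."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold


def conserved_sorfs(peptides_by_genome, min_genomes=2, threshold=0.8):
    """Keep peptides with a similar counterpart in enough genomes.

    peptides_by_genome maps a genome name to a list of predicted sORF
    peptide sequences. Returns (genome, peptide, n_genomes) tuples, where
    n_genomes counts the source genome plus every other genome holding at
    least one similar peptide.
    """
    conserved = []
    for genome, peptides in peptides_by_genome.items():
        for pep in peptides:
            n_genomes = 1 + sum(
                any(similar(pep, other, threshold) for other in others)
                for name, others in peptides_by_genome.items()
                if name != genome
            )
            if n_genomes >= min_genomes:
                conserved.append((genome, pep, n_genomes))
    return conserved


if __name__ == "__main__":
    demo = {
        "genome_A": ["MKTLLVAG"],
        "genome_B": ["MKTLLVAG", "MPPPPPPP"],
        "genome_C": ["MWWWWWWW"],
    }
    for row in conserved_sorfs(demo):
        print(row)
```

The rationale is that a spurious ORF produced by chance is unlikely to recur with high similarity across independent genomes, so raising `min_genomes` trades sensitivity for fewer prediction artefacts.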