Computational Discovery and Annotation of Conserved Small Open Reading Frames in Fungal Genomes

Overview
Publisher Biomed Central
Specialty Biology
Date 2019 Feb 6
PMID 30717662
Citations 12
Abstract

Background: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.

Results: A first set of sORFs was identified from existing annotations that met the 80-residue maximum. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
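To make the length criterion concrete, the following is a minimal sketch of an ORF scan with an 80-codon ceiling. It is not the prediction pipeline used in the study, which the abstract does not describe in detail. The function name `find_sorfs`, the `min_codons` lower bound, the restriction to the forward strand, and the use of the standard stop codons are all assumptions made for illustration.

```python
# Hypothetical sketch: scan the three forward reading frames of a DNA string
# for ATG..stop ORFs of at most `max_codons` codons (stop codon excluded).
# Not the study's actual method; names and defaults are illustrative.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def find_sorfs(seq, max_codons=80, min_codons=10):
    """Return a list of (start, end, n_codons) tuples.

    start/end are 0-based, end-exclusive coordinates that include the stop
    codon; n_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return hits


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
    print(find_sorfs(toy, min_codons=2))
```

A fuller version would also scan the reverse complement and restrict the search to the exonic, intronic or intergenic intervals of interest.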

Conclusions: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation to identify sORFs. Because experimental validation is scarce and has its own limitations, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently free of gene prediction artefacts.
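The idea of a conservation filter can be sketched as follows. The abstract does not state the conservation criterion that was applied, so this example substitutes the simplest possible one: a predicted peptide counts as conserved if the identical sequence appears in at least `min_genomes` genomes. The function name `conserved_sorfs`, the input layout and the threshold are hypothetical; a realistic analysis would group peptides by sequence similarity rather than exact identity.

```python
# Hypothetical sketch of a conservation filter over predicted sORF peptides.
# Exact sequence identity is used as a stand-in for similarity clustering.
from collections import defaultdict


def conserved_sorfs(peptides_by_genome, min_genomes=2):
    """Map each peptide to the sorted genomes it occurs in, keeping only
    peptides found in at least `min_genomes` distinct genomes.

    peptides_by_genome: dict of genome name -> iterable of peptide strings.
    """
    seen_in = defaultdict(set)
    for genome, peptides in peptides_by_genome.items():
        for peptide in peptides:
            seen_in[peptide].add(genome)
    return {
        peptide: sorted(genomes)
        for peptide, genomes in seen_in.items()
        if len(genomes) >= min_genomes
    }


if __name__ == "__main__":
    toy = {
        "genome_A": ["MKT", "MAA"],
        "genome_B": ["MKT"],
        "genome_C": ["MKT", "MAA", "MCC"],
    }
    print(conserved_sorfs(toy, min_genomes=3))
```

Raising `min_genomes` trades sensitivity for confidence: a candidate seen in only one genome is more likely to be a prediction artefact than one recovered independently in many.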

Citing Articles

Discovering the hidden function in fungal genomes.

Gervais N, Shapiro R Nat Commun. 2024; 15(1):8219.

PMID: 39300175 PMC: 11413187. DOI: 10.1038/s41467-024-52568-z.


Transposon mutagenesis screen in identifies genetic determinants required for growth in human urine and serum.

Gray J, Torres V, Goodall E, McKeand S, Scales D, Collins C Elife. 2024; 12.

PMID: 39189918 PMC: 11349299. DOI: 10.7554/eLife.88971.


Snowball: a novel gene family required for developmental patterning of fruiting bodies of mushroom-forming fungi (Agaricomycetes).

Foldi C, Merenyi Z, Balazs B, Csernetics A, Miklovics N, Wu H mSystems. 2024; 9(3):e0120823.

PMID: 38334416 PMC: 10949477. DOI: 10.1128/msystems.01208-23.


Exploring microproteins from various model organisms using the mip-mining database.

Zhao B, Zhao J, Wang M, Guo Y, Mehmood A, Wang W BMC Genomics. 2023; 24(1):661.

PMID: 37919660 PMC: 10623795. DOI: 10.1186/s12864-023-09735-1.


Pervasive translation of small open reading frames in plant long non-coding RNAs.

Sruthi K, Menon A, P A, Soniya E Front Plant Sci. 2022; 13:975938.

PMID: 36352887 PMC: 9638090. DOI: 10.3389/fpls.2022.975938.

