Noncoding RNA Gene Detection Using Comparative Sequence Analysis
Background: Noncoding RNA genes produce transcripts that function without being translated into protein. Unlike protein-coding genes, noncoding RNA gene sequences do not have strong statistical signals, so a reliable general-purpose computational genefinder for noncoding RNA genes has been elusive.
Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context-free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g., from a BLASTN comparison of two related genomes), we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class.
Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
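The final classification step described in the Results can be illustrated with a minimal sketch. It assumes each of the three models has already produced a log-likelihood (natural log) for the alignment and applies Bayes' rule to obtain a posterior probability for each class. The function names, the class labels "RNA", "COD" and "NULL", the uniform default prior, and the example scores are all hypothetical choices made for illustration; this is not QRNA's actual code or scoring scheme.

```python
import math


def class_posteriors(log_likelihoods, priors=None):
    """Posterior probability of each model given per-model log-likelihoods.

    log_likelihoods: dict mapping a class label to log P(alignment | model).
    priors: optional dict of prior probabilities; uniform if omitted
            (an assumption made for this sketch).
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    # Log of the joint probability P(alignment, model) for each class.
    joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}

    # Log-sum-exp: subtract the maximum before exponentiating so that
    # very negative log-likelihoods do not underflow to zero.
    top = max(joint.values())
    log_evidence = top + math.log(sum(math.exp(v - top) for v in joint.values()))

    return {n: math.exp(joint[n] - log_evidence) for n in names}


def classify(log_likelihoods, priors=None):
    """Return the most probable class and the full posterior distribution."""
    posteriors = class_posteriors(log_likelihoods, priors)
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical scores for one alignment under the three models.
example_scores = {"RNA": -120.0, "COD": -125.0, "NULL": -123.0}
label, posteriors = classify(example_scores)
```

With these made-up scores the RNA model has the highest log-likelihood, so `label` is `"RNA"`. In practice the priors would encode how often each class is expected among the input alignments, and a posterior threshold could be used instead of taking the maximum.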