A Probabilistic Model for the Evolution of RNA Structure

Overview

Journal: BMC Bioinformatics

Publisher: BioMed Central

Specialty: Biology

Date: 2004 Oct 28

PMID: 15507142

Citations: 26

Authors

Ian Holmes

Abstract

Background: For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates.
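The simplest case of deriving an alignment model from mutation events and their rates is the TKF91 model that the structure tree builds on (Thorne, Kishino and Felsenstein 1991, reference 14). In TKF91, each residue is deleted at rate mu, each link inserts new residues at rate lambda, and the resulting birth-death process has closed-form transition probabilities. The Python sketch below computes them in an (alpha, beta, gamma) parameterisation that is often used when TKF91 is written as a pair HMM. The function names are hypothetical. The sketch covers only single-residue indels and does not include the paper's structure-tree extension to basepairs and stems.

```python
import math


def tkf91_params(lam, mu, t):
    """Return (alpha, beta, gamma) for the TKF91 birth-death indel process.

    lam: insertion rate, mu: deletion rate (TKF91 needs lam < mu), t: branch length > 0.
    alpha: probability that an ancestral residue survives to time t.
    beta:  geometric parameter for the number of inserted residues that follow it.
    gamma: probability that a deleted ancestral residue still leaves >= 1 descendant.
    """
    if not 0 < lam < mu:
        raise ValueError("TKF91 requires 0 < lam < mu")
    if t <= 0:
        raise ValueError("branch length t must be positive")
    alpha = math.exp(-mu * t)
    e = math.exp((lam - mu) * t)
    beta = lam * (1 - e) / (mu - lam * e)
    gamma = 1 - mu * beta / (lam * (1 - alpha))
    return alpha, beta, gamma


def descendants_pmf(lam, mu, t, n):
    """Probability that one ancestral residue has exactly n descendants after time t."""
    alpha, beta, gamma = tkf91_params(lam, mu, t)
    if n == 0:
        return (1 - alpha) * (1 - gamma)  # deleted, and none of its insertions remain
    # Either the residue survives (alpha), or it is deleted but leaves inserted
    # descendants ((1 - alpha) * gamma); the count then extends geometrically with beta.
    return (alpha + (1 - alpha) * gamma) * (1 - beta) * beta ** (n - 1)
```

As t grows, alpha falls to 0 and beta approaches lam/mu. That ratio is the geometric parameter of TKF91's equilibrium sequence-length distribution.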

Results: Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels.
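In a pairwise setting, a posterior probability matrix typically gives, for each residue pair (i, j), the probability that x_i and y_j are aligned, summed over all alignments. The sketch below illustrates this with the forward-backward algorithm on a plain three-state pair HMM (match M, insert-in-x X, insert-in-y Y), in the style of Durbin et al.'s Biological Sequence Analysis. This is a deliberate simplification: it ignores secondary structure entirely and uses no TKF91-derived transitions. The function name `match_posteriors` and all parameter values are hypothetical.

```python
import math


def match_posteriors(x, y, delta=0.2, eps=0.4, tau=0.1, p_id=0.8, alphabet="ACGU"):
    """Posterior P(x_i aligned to y_j) under a simple 3-state pair HMM (M, X, Y).

    delta: gap-open, eps: gap-extend, tau: end probability, p_id: probability that
    an aligned pair is identical. All defaults are illustrative, not fitted values.
    Returns (posterior matrix as list of lists, total probability of x and y).
    """
    n, m, k = len(x), len(y), len(alphabet)
    q = 1.0 / k  # uniform background probability for a residue aligned to a gap

    def pair(a, b):  # joint emission probability of an aligned pair
        return p_id / k if a == b else (1.0 - p_id) / (k * (k - 1))

    def grid():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: f[i][j] = P(x[:i], y[:j], path currently in this state).
    fM, fX, fY = grid(), grid(), grid()
    fM[0][0] = 1.0  # the Begin state behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = pair(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    total = tau * (fM[n][m] + fX[n][m] + fY[n][m])

    # Backward pass: b[i][j] = P(rest of x and y, then End | this state at (i, j)).
    bM, bX, bY = grid(), grid(), grid()
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                bM[i][j] = bX[i][j] = bY[i][j] = tau  # only the End transition remains
                continue
            nm = pair(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            nx = q * bX[i + 1][j] if i < n else 0.0
            ny = q * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = (1 - 2 * delta - tau) * nm + delta * (nx + ny)
            bX[i][j] = (1 - eps - tau) * nm + eps * nx  # X cannot jump straight to Y
            bY[i][j] = (1 - eps - tau) * nm + eps * ny  # Y cannot jump straight to X
    assert math.isclose(total, bM[0][0], rel_tol=1e-9)  # forward and backward agree

    post = [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
            for i in range(1, n + 1)]
    return post, total
```

For `match_posteriors("A", "A")`, the only admissible path is a single match. The posterior is therefore 1, and with the defaults the total probability is (1 - 2*delta - tau) * p_id/4 * tau = 0.01. Longer sequences would need log-space or scaled arithmetic to avoid underflow.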

Conclusions: Although all three types of mutation "event" are built into our model, events of type (i) and (ii) are found to be better modeled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool.
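A hard, count-based tally makes the event types above concrete. The model itself scores these events probabilistically rather than counting them, so the sketch below is only an illustration. It assumes a fixed pairwise alignment and a shared consensus structure in dot-bracket notation, and it treats G-U wobble pairs as canonical. It covers types (i) and (ii) only; whole-stem indels (type iii) are not attempted. The names `classify_basepairs`, `pairs_from_dotbracket` and `CANONICAL` are hypothetical.

```python
# Canonical Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dotbracket(structure):
    """Return sorted (i, j) basepairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return sorted(pairs)


def classify_basepairs(row_a, row_b, structure):
    """Tally what happened to each consensus basepair between two aligned rows.

    row_a, row_b: gapped alignment rows ('-' marks a gap), same length as structure.
    """
    if not len(row_a) == len(row_b) == len(structure):
        raise ValueError("rows and structure must have equal length")
    counts = {"conserved": 0, "compensatory": 0, "paired_indel": 0, "other": 0}
    for i, j in pairs_from_dotbracket(structure):
        a, b = (row_a[i], row_a[j]), (row_b[i], row_b[j])
        gaps = (a.count("-"), b.count("-"))
        if gaps in ((2, 0), (0, 2)):
            counts["paired_indel"] += 1  # both partners missing from one sequence
        elif gaps == (0, 0) and a in CANONICAL and b in CANONICAL:
            if a == b:
                counts["conserved"] += 1
            elif a[0] != b[0] and a[1] != b[1]:
                counts["compensatory"] += 1  # both sides changed, pairing kept
            else:
                counts["other"] += 1  # one-sided change, e.g. G-C to G-U
        else:
            counts["other"] += 1
    return counts
```

For example, aligning `GGGAAACCC` with `GC-AAA-GC` under `(((...)))` gives one conserved G-C pair, one compensatory G-C to C-G change, and one basepair deleted as a unit.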

Citing Articles

Marchand B, Anselmetti Y, Lafond M, Ouangraoua A. Median and small parsimony problems on RNA trees. Bioinformatics. 2024; 40(Suppl 1):i237-i246. PMID: 38940169. PMC: 11256950. DOI: 10.1093/bioinformatics/btae229.

Lim D, Blanchette M. EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM. Bioinformatics. 2020; 36(Suppl_1):i353-i361. PMID: 32657367. PMC: 7355264. DOI: 10.1093/bioinformatics/btaa447.

Holmes I. Solving the master equation for Indels. BMC Bioinformatics. 2017; 18(1):255. PMID: 28494756. PMC: 5427538. DOI: 10.1186/s12859-017-1665-1.

Fu Y, Sharma G, Mathews D. Dynalign II: common secondary structure prediction for RNA homologs with domain insertions. Nucleic Acids Res. 2014; 42(22):13939-48. PMID: 25416799. PMC: 4267632. DOI: 10.1093/nar/gku1172.

Piao X, Hou N, Cai P, Liu S, Wu C, Chen Q. Genome-wide transcriptome analysis shows extensive alternative RNA splicing in the zoonotic parasite Schistosoma japonicum. BMC Genomics. 2014; 15:715. PMID: 25156522. PMC: 4203478. DOI: 10.1186/1471-2164-15-715.

References

1. Rivas E, Eddy S. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2002; 2:8. PMC: 64605. DOI: 10.1186/1471-2105-2-8.

2. Miklos I, Lunter G, Holmes I. A "Long Indel" model for evolutionary sequence alignment. Mol Biol Evol. 2003; 21(3):529-40. DOI: 10.1093/molbev/msh043.

3. Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999; 15(6):446-54. DOI: 10.1093/bioinformatics/15.6.446.

4. Klosterman P, Hendrix D, Tamura M, Holbrook S, Brenner S. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res. 2004; 32(8):2342-52. PMC: 419439. DOI: 10.1093/nar/gkh537.

5. Gorodkin J, Heyer L, Stormo G. Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res. 1997; 25(18):3724-32. PMC: 146942. DOI: 10.1093/nar/25.18.3724.

6. Holmes I, Bruno W. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics. 2001; 17(9):803-20. DOI: 10.1093/bioinformatics/17.9.803.

7. Pedersen J, Hein J. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics. 2003; 19(2):219-27. DOI: 10.1093/bioinformatics/19.2.219.

8. Siepel A, Haussler D. Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 2003; 21(3):468-88. DOI: 10.1093/molbev/msh039.

9. Thorne J, Kishino H, Felsenstein J. Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol. 1992; 34(1):3-16. DOI: 10.1007/BF00163848.

10. Yang Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol. 1993; 10(6):1396-401. DOI: 10.1093/oxfordjournals.molbev.a040082.

11. Klein R, Eddy S. RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics. 2003; 4:44. PMC: 239859. DOI: 10.1186/1471-2105-4-44.

12. Henikoff S, Henikoff J. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992; 89(22):10915-9. PMC: 50453. DOI: 10.1073/pnas.89.22.10915.

13. Bruno W, Halpern A. Topological bias and inconsistency of maximum likelihood using wrong models. Mol Biol Evol. 1999; 16(4):564-6. DOI: 10.1093/oxfordjournals.molbev.a026137.

14. Thorne J, Kishino H, Felsenstein J. An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol. 1991; 33(2):114-24. DOI: 10.1007/BF02193625.

15. Mathews D, Turner D. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol. 2002; 317(2):191-203. DOI: 10.1006/jmbi.2001.5351.

16. Klosterman P, Tamura M, Holbrook S, Brenner S. SCOR: a Structural Classification of RNA database. Nucleic Acids Res. 2001; 30(1):392-4. PMC: 99131. DOI: 10.1093/nar/30.1.392.

17. Wu H, Henras A, Chanfreau G, Feigon J. Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci U S A. 2004; 101(22):8307-12. PMC: 420390. DOI: 10.1073/pnas.0402627101.

18. Frank D, Adamidi C, Ehringer M, Pitulle C, Pace N. Phylogenetic-comparative analysis of the eukaryal ribonuclease P RNA. RNA. 2001; 6(12):1895-904. PMC: 1370057. DOI: 10.1017/s1355838200001461.

19. Holmes I, Rubin G. Pairwise RNA structure comparison with stochastic context-free grammars. Pac Symp Biocomput. 2002:163-74. DOI: 10.1142/9789812799623_0016.

20. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy S. Rfam: an RNA family database. Nucleic Acids Res. 2003; 31(1):439-41. PMC: 165453. DOI: 10.1093/nar/gkg006.