
A Probabilistic Model for the Evolution of RNA Structure

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2004 Oct 28
PMID: 15507142
Citations: 26
Abstract

Background: To find and align noncoding RNA genes and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation events and their rates.

Results: Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels.

Conclusions: Although all three types of mutation "event" are built into our model, events of types (i) and (ii) are found to be better modelled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool.
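
As background for the TKF91 model named in the abstract, the sketch below computes the three standard quantities of the original TKF91 sequence model (Thorne, Kishino and Felsenstein, 1991) from an insertion rate, a deletion rate and a divergence time. It is an illustration of the underlying indel model only, not the paper's "TKF91 Structure Tree" implementation. The function names, the parameter names `lam`, `mu` and `t`, and the numeric values are assumptions chosen for this example.

```python
import math


def tkf91_params(lam, mu, t):
    """Return (alpha, beta, gamma) for the TKF91 links model.

    lam: insertion rate, mu: deletion rate (requires 0 < lam < mu),
    t: divergence time. All names are illustrative assumptions.
    """
    if not (0 < lam < mu):
        raise ValueError("TKF91 requires 0 < lam < mu")
    if t <= 0:
        raise ValueError("t must be positive")
    e = math.exp((lam - mu) * t)
    # alpha: probability that an ancestral residue survives to time t.
    alpha = math.exp(-mu * t)
    # beta: probability of one further inserted residue after a surviving
    # or newly inserted one (geometric parameter for insertions).
    beta = lam * (1.0 - e) / (mu - lam * e)
    # gamma: probability that a deleted ancestral residue nevertheless
    # leaves at least one inserted descendant.
    gamma = 1.0 - mu * beta / (lam * (1.0 - alpha))
    return alpha, beta, gamma


def equilibrium_length_prob(lam, mu, n):
    """Equilibrium probability of a sequence of length n (geometric)."""
    kappa = lam / mu
    return (1.0 - kappa) * kappa ** n


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the functions.
    alpha, beta, gamma = tkf91_params(lam=0.03, mu=0.04, t=1.0)
    print(f"alpha={alpha:.4f} beta={beta:.4f} gamma={gamma:.4f}")
```

In a pair hidden Markov model built on TKF91, these three values parameterise the transitions between match, insert and delete states. How the Structure Tree extends them to stems and loops is described in the paper itself and is not reproduced here.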

Citing Articles

Median and small parsimony problems on RNA trees.

Marchand B, Anselmetti Y, Lafond M, Ouangraoua A. Bioinformatics. 2024; 40(Suppl 1):i237-i246.

PMID: 38940169 PMC: 11256950. DOI: 10.1093/bioinformatics/btae229.


EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM.

Lim D, Blanchette M. Bioinformatics. 2020; 36(Suppl_1):i353-i361.

PMID: 32657367 PMC: 7355264. DOI: 10.1093/bioinformatics/btaa447.


Solving the master equation for Indels.

Holmes I. BMC Bioinformatics. 2017; 18(1):255.

PMID: 28494756 PMC: 5427538. DOI: 10.1186/s12859-017-1665-1.


Dynalign II: common secondary structure prediction for RNA homologs with domain insertions.

Fu Y, Sharma G, Mathews D. Nucleic Acids Res. 2014; 42(22):13939-48.

PMID: 25416799 PMC: 4267632. DOI: 10.1093/nar/gku1172.


Genome-wide transcriptome analysis shows extensive alternative RNA splicing in the zoonotic parasite Schistosoma japonicum.

Piao X, Hou N, Cai P, Liu S, Wu C, Chen Q. BMC Genomics. 2014; 15:715.

PMID: 25156522 PMC: 4203478. DOI: 10.1186/1471-2164-15-715.

