FLU, an Amino Acid Substitution Model for Influenza Proteins

Overview

Journal BMC Evol Biol

Publisher Biomed Central

Specialty Biology

Date 2010 Apr 14

PMID 20384985

Citations 28

Authors

Cuong Cao Dang

Quang Si Le

Olivier Gascuel

Vinh Sy Le

Abstract

Background: The amino acid substitution model is the core component of many protein analysis systems, such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific groups of organisms, e.g., viruses. Emerging influenza epidemics create a need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to improve understanding of the evolution of influenza viruses.
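As a rough illustration of what such a model contains: time-reversible empirical models are commonly specified by symmetric exchangeabilities s_ij and equilibrium frequencies pi_j. The rate matrix Q has off-diagonal entries q_ij = s_ij * pi_j and diagonal entries that make each row sum to zero, and it is scaled to one expected substitution per unit time, so that P(t) = exp(Qt) gives substitution probabilities along a branch of length t. The Python sketch below assumes a hypothetical 3-state alphabet with made-up numbers (a real amino acid model has 20 states, and FLU's published values are not reproduced here). The function names are illustrative, and the matrix exponential uses scaling and squaring of a truncated Taylor series so that only the standard library is needed.

```python
import math

def build_rate_matrix(exch, freqs):
    """Normalized time-reversible rate matrix Q (hypothetical helper).

    exch  -- symmetric exchangeabilities s[i][j]; the diagonal is ignored
    freqs -- equilibrium frequencies pi[j], summing to 1
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]  # q_ij = s_ij * pi_j
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # row sums to 0
    # Scale so the expected substitution rate -sum_i pi_i q_ii equals 1.
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    # Halve the matrix until its norm is below 1, then square back up.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # a^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(q, t):
    """P(t) = exp(Qt): row i gives the state distribution after time t from state i."""
    return mat_exp([[x * t for x in row] for row in q])

# Hypothetical 3-state example with illustrative numbers (not FLU's parameters).
exch = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]
freqs = [0.5, 0.3, 0.2]
Q = build_rate_matrix(exch, freqs)
P = transition_matrix(Q, 0.1)
```

Software that recomputes P(t) for many branch lengths typically diagonalizes the symmetrized Q once rather than summing a Taylor series; the series is used here only to keep the sketch short and dependency-free.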

Results: A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from approximately 113,000 influenza protein sequences, comprising approximately 20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains approximately 42 log likelihood points for an alignment of 300 sites. Moreover, topologies of trees constructed using FLU and other models frequently differ. FLU thus affects both the likelihood and the inferred tree topologies. FLU has been implemented in PhyML; it can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU and is also available on the PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/.
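The log-likelihood comparison can be illustrated with Felsenstein's pruning algorithm, which computes the probability of each alignment column on a fixed tree; a model's log-likelihood is the sum over columns. The sketch below is a toy under stated assumptions: it swaps in the closed-form F81-style model P_ij(t) = pi_j + (delta_ij - pi_j) * exp(-beta * t), with beta = 1 / (1 - sum_k pi_k^2), because it needs no matrix exponential. It compares two frequency settings rather than FLU against another matrix, ignores rate heterogeneity across sites, and uses a hypothetical alphabet, tree, branch lengths and sequences. It is not how FLU or PhyML compute likelihoods in detail.

```python
import math

def f81_prob(i, j, t, freqs):
    """F81-style P_ij(t), normalized to one expected substitution per unit time."""
    beta = 1.0 / (1.0 - sum(p * p for p in freqs))
    decay = math.exp(-beta * t)
    return freqs[j] + ((1.0 if i == j else 0.0) - freqs[j]) * decay

def partials(node, column, freqs):
    """Conditional likelihoods L[s] = P(data below node | node in state s).

    A leaf is a taxon name; an internal node is a list of (child, branch_length) pairs.
    """
    n = len(freqs)
    if isinstance(node, str):
        observed = column[node]
        return [1.0 if s == observed else 0.0 for s in range(n)]
    result = [1.0] * n
    for child, length in node:
        child_l = partials(child, column, freqs)
        for s in range(n):
            result[s] *= sum(f81_prob(s, x, length, freqs) * child_l[x]
                             for x in range(n))
    return result

def log_likelihood(tree, alignment, alphabet, freqs):
    """Sum over columns of log(sum_s pi_s * L_root[s])."""
    taxa = list(alignment)
    total = 0.0
    for k in range(len(alignment[taxa[0]])):
        column = {name: alphabet.index(alignment[name][k]) for name in taxa}
        root = partials(tree, column, freqs)
        total += math.log(sum(p * l for p, l in zip(freqs, root)))
    return total

# Hypothetical data: a 3-letter alphabet, four taxa, five columns.
alphabet = "ACD"
alignment = {"t1": "AACAD", "t2": "AACAD", "t3": "ACCAA", "t4": "ADCAA"}
tree = [([("t1", 0.05), ("t2", 0.05)], 0.1), ([("t3", 0.1), ("t4", 0.2)], 0.1)]
counts = [sum(seq.count(a) for seq in alignment.values()) for a in alphabet]
empirical = [c / sum(counts) for c in counts]
uniform_ll = log_likelihood(tree, alignment, alphabet, [1 / 3] * 3)
empirical_ll = log_likelihood(tree, alignment, alphabet, empirical)
print(f"uniform: {uniform_ll:.3f}  empirical: {empirical_ll:.3f}  "
      f"per-site difference: {(empirical_ll - uniform_ll) / 5:.3f}")
```

Fixed empirical matrices add no free parameters of their own, so ranking them by log-likelihood on the same data is a common, if simplified, comparison. On that scale, the average gain reported for FLU (about 42 points for 300 sites) corresponds to roughly 0.14 log-likelihood units per site.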

Conclusions: FLU should be useful for any influenza protein analysis system that requires an accurate description of amino acid substitutions.

Citing Articles

Evolutionary Insights from Association Rule Mining of Co-Occurring Mutations in Influenza Hemagglutinin and Neuraminidase.

Galeone V, Lee C, Monaghan M, Bauer D, Wilson L Viruses. 2024; 16(10).

PMID: 39459850 PMC: 11512220. DOI: 10.3390/v16101515.

A Guide to Phylogenomic Inference.

Patane J, Martins Jr J, Setubal J Methods Mol Biol. 2024; 2802:267-345.

PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11.

Novel symmetry-preserving neural network model for phylogenetic inference.

Tang X, Zepeda-Nunez L, Yang S, Zhao Z, Solis-Lemus C Bioinform Adv. 2024; 4(1):vbae022.

PMID: 38638281 PMC: 11026143. DOI: 10.1093/bioadv/vbae022.

Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation.

Ferreiro D, Branco C, Arenas M Bioinformatics. 2024; 40(3).

PMID: 38374231 PMC: 10914458. DOI: 10.1093/bioinformatics/btae096.

Virus Pop-Expanding Viral Databases by Protein Sequence Simulation.

Kende J, Bonomi M, Temmam S, Regnault B, Perot P, Eloit M Viruses. 2023; 15(6).

PMID: 37376527 PMC: 10304111. DOI: 10.3390/v15061227.
