
FLU, an Amino Acid Substitution Model for Influenza Proteins

Overview
Journal BMC Evol Biol
Publisher Biomed Central
Specialty Biology
Date 2010 Apr 14
PMID 20384985
Citations 28
Abstract

Background: The amino acid substitution model is the core component of many protein analysis systems, such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific taxa, e.g., viruses. Emerging epidemics of influenza viruses underscore the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to improve understanding of the evolution of influenza viruses.
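Empirical amino acid substitution models such as the general-purpose LG model are commonly parameterized by a symmetric matrix of exchangeabilities S and a vector of equilibrium frequencies π. Assuming FLU takes the same time-reversible form, the rate matrix has Q_ij = S_ij π_j for i ≠ j, and the substitution probabilities along a branch of length t are P(t) = exp(Qt). The minimal Python sketch below builds Q and P(t) from random placeholder values (not FLU's estimated parameters). It approximates the matrix exponential by scaling and squaring with a truncated Taylor series; all function names are illustrative.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids, one-letter codes

def build_rate_matrix(exch, freqs):
    """Time-reversible rate matrix: Q[i][j] = exch[i][j] * freqs[j] for i != j,
    diagonal set so each row sums to zero, then scaled so the mean rate
    -sum_i freqs[i] * Q[i][i] equals 1 (time in expected substitutions)."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]

def mat_mul(a, b):
    """Plain O(n^3) product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=12):
    """P(t) = exp(Q t) by scaling and squaring: exp(A)^(2^s) with A = Q t / 2^s,
    where exp(A) is approximated by its Taylor series up to A^terms / terms!."""
    n = len(q)
    a = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + s for r, s in zip(rrow, srow)]
                  for rrow, srow in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Placeholder parameters (random, NOT the published FLU values).
rng = random.Random(0)
n = len(AMINO_ACIDS)
exch = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        exch[i][j] = exch[j][i] = rng.uniform(0.1, 2.0)  # symmetric exchangeabilities
raw = [rng.uniform(0.5, 1.5) for _ in range(n)]
raw_total = sum(raw)
freqs = [x / raw_total for x in raw]

Q = build_rate_matrix(exch, freqs)
P = transition_matrix(Q, 0.5)  # branch length of 0.5 expected substitutions per site
print(round(sum(P[0]), 6))  # → 1.0 (each row of P(t) sums to one)
```

The row-sum check works because every row of Q sums to zero, so every row of exp(Qt) sums to one. Phylogenetics software typically uses an eigendecomposition of Q rather than a Taylor series, which is faster when P(t) is needed for many branch lengths.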

Results: A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from approximately 113,000 influenza protein sequences comprising approximately 20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains approximately 42 log likelihood points on an alignment of 300 sites. Moreover, trees constructed using FLU and other models frequently differ in topology. FLU therefore affects both the likelihood and the topology of the inferred trees. FLU has been implemented in PhyML; it can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU and is included in the PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/.
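The reported gain of about 42 log-likelihood points on a 300-site alignment works out to roughly 0.14 points per site. The sketch below illustrates this kind of comparison at toy scale. It computes the log-likelihood of a made-up four-sequence alignment on a fixed tree with Felsenstein's pruning algorithm under two stand-in models, then reports the total and per-site difference. The two models are an equal-input (F81-style) model with uniform residue frequencies and the same model with empirical frequencies. F81 is used only because its P(t) has a closed form; it is not FLU, and the sequences, tree, and branch lengths are hypothetical.

```python
import math

STATES = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids, one-letter codes

def f81_matrix(freqs, t):
    """Equal-input model: P_ij(t) = pi_j + (delta_ij - pi_j) * exp(-beta * t),
    with beta chosen so the mean substitution rate is 1."""
    n = len(freqs)
    beta = 1.0 / (1.0 - sum(p * p for p in freqs))
    decay = math.exp(-beta * t)
    return [[freqs[j] + ((1.0 if i == j else 0.0) - freqs[j]) * decay
             for j in range(n)] for i in range(n)]

def partial_likelihood(node, column, freqs, states):
    """Felsenstein pruning: entry s is P(residues below node | node in state s).
    A leaf is a sequence name; an internal node is a list of (child, branch_length)."""
    if isinstance(node, str):
        return [1.0 if s == column[node] else 0.0 for s in states]
    result = [1.0] * len(states)
    for child, branch in node:
        below = partial_likelihood(child, column, freqs, states)
        p = f81_matrix(freqs, branch)
        for i in range(len(states)):
            result[i] *= sum(p[i][j] * below[j] for j in range(len(states)))
    return result

def log_likelihood(tree, columns, freqs, states):
    """Sum over sites of log(sum_s pi_s * L_root(s))."""
    total = 0.0
    for column in columns:
        root = partial_likelihood(tree, column, freqs, states)
        total += math.log(sum(freqs[s] * root[s] for s in range(len(states))))
    return total

# Hypothetical toy data: four made-up sequences and a fixed rooted tree.
seqs = {"A": "MKAILVVLLYTFATA", "B": "MKAIIVALLYTFTTA",
        "C": "MKTIIALSYIFCLAL", "D": "MKTIIALSYILCLVF"}
tree = [([("A", 0.1), ("B", 0.15)], 0.05), ([("C", 0.2), ("D", 0.1)], 0.05)]
n_sites = len(seqs["A"])
columns = [{name: seq[k] for name, seq in seqs.items()} for k in range(n_sites)]

uniform = [1.0 / len(STATES)] * len(STATES)
counts = [1 + sum(seq.count(s) for seq in seqs.values()) for s in STATES]  # +1 pseudocount
count_total = sum(counts)
empirical = [c / count_total for c in counts]

ll_uniform = log_likelihood(tree, columns, uniform, STATES)
ll_empirical = log_likelihood(tree, columns, empirical, STATES)
gain = ll_empirical - ll_uniform
print(f"logL uniform={ll_uniform:.2f} empirical={ll_empirical:.2f} "
      f"gain={gain:.2f} per site={gain / n_sites:.3f}")
```

The empirical-frequency variant fits 19 extra free parameters to the data, so comparisons of this kind are usually penalized with a criterion such as AIC or BIC rather than judged on raw log-likelihood alone.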

Conclusions: FLU should be useful for any influenza protein analysis system which requires an accurate description of amino acid substitutions.

Citing Articles

Evolutionary Insights from Association Rule Mining of Co-Occurring Mutations in Influenza Hemagglutinin and Neuraminidase.

Galeone V, Lee C, Monaghan M, Bauer D, Wilson L. Viruses. 2024; 16(10).

PMID: 39459850. PMC: 11512220. DOI: 10.3390/v16101515.


A Guide to Phylogenomic Inference.

Patane J, Martins Jr J, Setubal J. Methods Mol Biol. 2024; 2802:267-345.

PMID: 38819564. DOI: 10.1007/978-1-0716-3838-5_11.


Novel symmetry-preserving neural network model for phylogenetic inference.

Tang X, Zepeda-Nunez L, Yang S, Zhao Z, Solis-Lemus C. Bioinform Adv. 2024; 4(1):vbae022.

PMID: 38638281. PMC: 11026143. DOI: 10.1093/bioadv/vbae022.


Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation.

Ferreiro D, Branco C, Arenas M. Bioinformatics. 2024; 40(3).

PMID: 38374231. PMC: 10914458. DOI: 10.1093/bioinformatics/btae096.


Virus Pop-Expanding Viral Databases by Protein Sequence Simulation.

Kende J, Bonomi M, Temmam S, Regnault B, Perot P, Eloit M. Viruses. 2023; 15(6).

PMID: 37376527. PMC: 10304111. DOI: 10.3390/v15061227.

