MSAProbs: Multiple Sequence Alignment Based on Pair Hidden Markov Models and Partition Function Posterior Probabilities
Motivation: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge.
Results: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. In addition, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile-profile alignment, are incorporated to improve alignment accuracy. Assessed on the popular benchmarks BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners.
Availability: The source code of MSAProbs, written in C++, is freely and publicly available from http://msaprobs.sourceforge.net.
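The two probabilistic steps named in the Results can be illustrated with a minimal Python sketch. It is not taken from the MSAProbs source code. It assumes that the pair-HMM and partition-function posteriors are merged by a root mean square, and that the weighted consistency transformation has the form P'_xy = ((w_x + w_y) P_xy + sum over z of w_z P_xz P_zy) / (sum of all weights). The function names, the dictionary layout of `post`, and the toy inputs are hypothetical.

```python
import math


def combine_posteriors(p_hmm, p_pf):
    """Merge two posterior matrices element-wise by root mean square.

    Assumption: this is one plausible way to combine pair-HMM and
    partition-function posteriors; it is a sketch, not the reference code.
    """
    return [[math.sqrt((a * a + b * b) / 2.0) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_hmm, p_pf)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def weighted_consistency(post, weights, x, y):
    """Re-estimate the posterior matrix of sequences x and y via third sequences.

    post[(i, j)] with i < j is the len(i) x len(j) posterior matrix
    (hypothetical layout); weights[i] is the weight of sequence i.
    """
    def get(i, j):
        return post[(i, j)] if i < j else transpose(post[(j, i)])

    total = sum(weights)
    # Direct evidence, counted once for each of the two sequences' weights.
    out = [[(weights[x] + weights[y]) * v for v in row] for row in get(x, y)]
    # Indirect evidence routed through every other sequence z.
    for z in range(len(weights)):
        if z in (x, y):
            continue
        via_z = matmul(get(x, z), get(z, y))
        for r, row in enumerate(via_z):
            for c, v in enumerate(row):
                out[r][c] += weights[z] * v
    return [[v / total for v in row] for row in out]


# Toy input: three sequences of length 1 with equal weights.
toy_post = {(0, 1): [[0.5]], (0, 2): [[1.0]], (1, 2): [[1.0]]}
toy_result = weighted_consistency(toy_post, [1.0, 1.0, 1.0], 0, 1)
```

In the toy input, sequences 0 and 1 are each aligned to sequence 2 with certainty, so the transformation raises their mutual posterior above the direct estimate of 0.5.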