
QuickProbs 2: Towards Rapid Construction of High-quality Alignments of Large Protein Families

Overview
Journal: Sci Rep
Specialty: Science
Date: 2017 Feb 1
PMID: 28139687
Citations: 6
Abstract

The ever-increasing size of sequence databases, driven by the development of high-throughput sequencing, poses one of the greatest challenges yet to multiple alignment algorithms. As we show, well-established techniques for increasing alignment quality, i.e., refinement and consistency, are ineffective when large protein families are investigated. We present QuickProbs 2, an algorithm for multiple sequence alignment. Based on probabilistic models and equipped with novel column-oriented refinement and selective consistency, it offers outstanding accuracy. When analysing hundreds of sequences, QuickProbs 2 is noticeably better than ClustalΩ and MAFFT, the previous leaders for processing numerous protein families. For smaller sets, for which consistency-based methods perform best, QuickProbs 2 is also superior to its competitors. Owing to the low computational requirements of selective consistency and the utilization of massively parallel architectures, the presented algorithm has execution times similar to those of ClustalΩ and is orders of magnitude faster than full-consistency approaches such as MSAProbs or PicXAA. All this makes QuickProbs 2 an excellent tool for aligning families ranging from a few to hundreds of proteins.

Citing Articles

DNA binding and RAD51 engagement by the BRCA2 C-terminus orchestrate DNA repair and replication fork preservation.

Kwon Y, Rosner H, Zhao W, Selemenakis P, He Z, Kawale A Nat Commun. 2023; 14(1):432.

PMID: 36702902 PMC: 9879961. DOI: 10.1038/s41467-023-36211-x.


Spotlight on alternative frame coding: Two long overlapping genes in are translated and under purifying selection.

Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K iScience. 2022; 25(2):103844.

PMID: 35198897 PMC: 8850804. DOI: 10.1016/j.isci.2022.103844.


Ecological diversification reveals routes of pathogen emergence in endemic populations.

Lopez-Perez M, Jayakumar J, Grant T, Zaragoza-Solas A, Cabello-Yeves P, Almagro-Moreno S Proc Natl Acad Sci U S A. 2021; 118(40).

PMID: 34593634 PMC: 8501797. DOI: 10.1073/pnas.2103470118.


RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content.

Coutinho F, Zaragoza-Solas A, Lopez-Perez M, Barylski J, Zielezinski A, Dutilh B Patterns (N Y). 2021; 2(7):100274.

PMID: 34286299 PMC: 8276007. DOI: 10.1016/j.patter.2021.100274.


Parallelization of MAFFT for large-scale multiple sequence alignments.

Nakamura T, Yamada K, Tomii K, Katoh K Bioinformatics. 2018; 34(14):2490-2492.

PMID: 29506019 PMC: 6041967. DOI: 10.1093/bioinformatics/bty121.

