Seqrutinator: Scrutiny of Large Protein Superfamily Sequence Datasets for the Identification and Elimination of Non-functional Homologues
Overview
Abstract
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors, as well as sequences derived from pseudogenes, from complex eukaryotic protein superfamilies. When tested on the major superfamilies BAHD, CYP, and UGT, Seqrutinator removes only 1.94% of SwissProt entries and 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from the recent complete proteome of Pinus taeda. Applying Seqrutinator to crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows that the numbers of paralogues converge. MSAs, phylogenies, and particularly functional clustering improve drastically after Seqrutinator is applied, indicating good performance.