A Comparison of Scoring Functions for Protein Sequence Profile Alignment
Motivation: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles of alignments from both PSI-BLAST and SAM-T99. Structural alignments are constructed from a consensus between the FSSP database and CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST.
Results: Over our test set, profile-profile alignment gives an average improvement of typically 2-3% over profile-sequence alignment and approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments.
Availability: Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/
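Illustration: The abstract does not state the exact accuracy measure used. As a minimal sketch, assuming accuracy is taken to be the fraction of residue pairs in the structural (reference) alignment that the test alignment reproduces, the comparison could look like the following. The function names `aligned_pairs` and `alignment_accuracy` and the gapped-string representation are hypothetical and are not taken from the authors' source code.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in a pairwise alignment.

    row_a and row_b are equal-length gapped strings; i and j are 0-based
    positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of reference-aligned residue pairs also present in the test alignment.

    This metric is an assumption made for illustration only.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)


# Toy example: a reference alignment with gaps versus an ungapped test alignment.
reference = ("ACDE-", "A-DEF")
test = ("ACDE", "ADEF")
accuracy = alignment_accuracy(test, reference)
```

Averaging such a per-pair score over all sequence pairs in a test set would give one number per scoring function, which is the kind of quantity the percentage improvements above compare.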