
The Compositional Adjustment of Amino Acid Substitution Matrices

Overview
Specialty Science
Date 2003 Dec 10
PMID 14663142
Citations 47
Abstract

Amino acid substitution matrices are central to protein-comparison methods. In most commonly used matrices, the substitution scores take a log-odds form, involving the ratio of "target" to "background" frequencies derived from large, carefully curated sets of protein alignments. However, such matrices often are used to compare protein sequences with amino acid compositions that differ markedly from the background frequencies used for the construction of the matrices. Of course, the target frequencies should be adjusted in such cases, but the lack of an appropriate way to do this has been a long-standing problem. This article shows that if one demands consistency between target and background frequencies, then a log-odds substitution matrix implies a unique set of target and background frequencies as well as a unique scale. Standard substitution matrices therefore are truly appropriate only for the comparison of proteins with standard amino acid composition. Accordingly, we present and evaluate a rationale for transforming the target frequencies implicit in a standard matrix to frequencies appropriate for a nonstandard context. This rationale yields asymmetric matrices for the comparison of proteins with divergent compositions. Earlier approaches are unable to deal with this case in a fully consistent manner. Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.
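The log-odds relationship and the consistency argument can be made concrete on a toy two-letter alphabet. The following is a minimal, standard-library Python sketch under stated assumptions. It assumes scores of the form s_ij = ln(q_ij / (p_i p'_j)) / λ. It reads "consistency" as the requirement that the target frequencies q_ij = p_i p_j e^(λ s_ij) have marginal sums equal to the background frequencies. For a symmetric matrix, that condition reduces to Σ_j p_j e^(λ s_ij) = 1 for every i together with Σ_i p_i = 1, so λ can be found by bisection and p by a linear solve.

For the adjustment step, the sketch uses iterative proportional fitting. This yields the target matrix closest in relative entropy to the original one with prescribed row and column compositions. The abstract does not state the paper's exact criterion, so treat this as an assumed stand-in. All function names, the toy frequencies and the bisection bracket are hypothetical.

```python
import math

# Illustrative sketch on a toy alphabet. All function names are hypothetical;
# this is not the paper's implementation.


def log_odds_scores(q, p_row, p_col, lam):
    """Return s_ij = ln(q_ij / (p_row_i * p_col_j)) / lam."""
    return [[math.log(q[i][j] / (p_row[i] * p_col[j])) / lam
             for j in range(len(p_col))] for i in range(len(p_row))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def implied_background(s, lam):
    """Assumed consistency condition for a symmetric score matrix s:
    q_ij = p_i p_j exp(lam s_ij) must have row sums p_i, which reduces to
    sum_j p_j exp(lam s_ij) = 1 for every i."""
    n = len(s)
    mat = [[math.exp(lam * s[i][j]) for j in range(n)] for i in range(n)]
    return solve(mat, [1.0] * n)


def implied_parameters(s, lo, hi, iters=200):
    """Bisect on lam until the implied background frequencies sum to 1.
    Assumes 0 < lo < hi and that sum(p) - 1 changes sign once on [lo, hi]
    (the matrix of exp(lam s_ij) is all ones, hence singular, at lam = 0)."""
    def excess(lam):
        return sum(implied_background(s, lam)) - 1.0

    f_lo = excess(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = excess(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    lam = 0.5 * (lo + hi)
    return lam, implied_background(s, lam)


def adjust_targets(q, r_row, r_col, iters=500):
    """Iterative proportional fitting (an assumed stand-in for the paper's
    rationale): rescale q_ij -> q_ij * x_i * y_j until the row sums equal
    r_row and the column sums equal r_col. For strictly positive q this
    converges to the matrix closest to q in relative entropy with those
    marginals."""
    out = [list(row) for row in q]
    n, m = len(r_row), len(r_col)
    for _ in range(iters):
        for i in range(n):
            scale = r_row[i] / sum(out[i])
            out[i] = [v * scale for v in out[i]]
        for j in range(m):
            scale = r_col[j] / sum(out[k][j] for k in range(n))
            for k in range(n):
                out[k][j] *= scale
    return out


if __name__ == "__main__":
    p = [0.6, 0.4]                      # toy background frequencies
    q = [[0.5, 0.1], [0.1, 0.3]]        # toy symmetric targets, marginals = p
    s = log_odds_scores(q, p, p, 0.5)   # scores in units of 1/0.5 nats
    lam, p_hat = implied_parameters(s, 0.05, 5.0)
    print(lam, p_hat)                   # approximately 0.5 and [0.6, 0.4]
    r1, r2 = [0.8, 0.2], [0.5, 0.5]     # two different toy compositions
    q_adj = adjust_targets(q, r1, r2)
    print(log_odds_scores(q_adj, r1, r2, lam))  # asymmetric score matrix
```

When the row and column compositions differ, the adjusted off-diagonal scores no longer match (s[0][1] differs from s[1][0] in the toy run), which is the asymmetry the abstract describes. Published matrices are rounded to integer scores, so recovering λ and p from one would only approximate the frequencies used to build it. For a 20-letter alphabet, a linear-algebra library and a safeguarded root finder would be the natural replacements for the hand-written solver and bisection.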

Citing Articles

Challenges in adjusting scoring matrices when comparing functional motifs with non-standard compositions.

Jarnot P. Sci Rep. 2024; 14(1):31777.

PMID: 39738463 PMC: 11685636. DOI: 10.1038/s41598-024-82548-8.


tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs.

Postovskaya A, Vercauteren K, Meysman P, Laukens K. Brief Bioinform. 2024; 26(1).

PMID: 39576224 PMC: 11583439. DOI: 10.1093/bib/bbae602.


Computational Methods for the Discovery and Optimization of TAAR1 and TAAR5 Ligands.

Scarano N, Espinoza S, Brullo C, Cichero E. Int J Mol Sci. 2024; 25(15).

PMID: 39125796 PMC: 11312273. DOI: 10.3390/ijms25158226.


New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions.

Jia K, Kilinc M, Jernigan R. Front Bioinform. 2023; 3:1227193.

PMID: 37900964 PMC: 10602800. DOI: 10.3389/fbinf.2023.1227193.


Mutation Space of Spatially Conserved Amino Acid Sites in Proteins.

Caswell B, Summers T, Licup G, Cantu D. ACS Omega. 2023; 8(27):24302-24310.

PMID: 37457482 PMC: 10339398. DOI: 10.1021/acsomega.3c01473.

