Protein Database Searches Using Compositionally Adjusted Substitution Matrices

Overview
Journal: FEBS J
Specialty: Biochemistry
Date: 2005 Oct 13
PMID: 16218944
Citations: 491
Abstract

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort have therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.
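
The abstract does not spell out the adjustment procedure, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats a substitution matrix as log-odds scores s_ij = ln(q_ij / (p_i p'_j)) / lambda built from target frequencies q_ij, and rescales q so that its row and column sums match the compositions of the two sequences being compared. The rescaling used here is plain iterative proportional fitting, a swapped-in simplification that is not claimed to be the authors' method, which may impose further constraints. The function names, the 3-letter toy alphabet, and all numbers are hypothetical.

```python
import math

def marginals(q):
    """Return (row sums, column sums) of a square frequency table."""
    return [sum(row) for row in q], [sum(col) for col in zip(*q)]

def adjust_target_frequencies(q, p_row, p_col, iters=1000, tol=1e-12):
    """Rescale rows and columns of q until its marginals match p_row and p_col.

    Assumption: iterative proportional fitting stands in for the real
    adjustment procedure, which the abstract does not describe.
    """
    q = [row[:] for row in q]  # work on a copy
    for _ in range(iters):
        rows, _ = marginals(q)
        q = [[x * p_row[i] / rows[i] for x in row] for i, row in enumerate(q)]
        _, cols = marginals(q)
        q = [[x * p_col[j] / cols[j] for j, x in enumerate(row)] for row in q]
        rows, _ = marginals(q)
        if max(abs(r - p) for r, p in zip(rows, p_row)) < tol:
            break
    return q

def log_odds_scores(q, p_row, p_col, lam=1.0):
    """Scores s_ij = ln(q_ij / (p_row[i] * p_col[j])) / lam."""
    return [[math.log(q[i][j] / (p_row[i] * p_col[j])) / lam
             for j in range(len(p_col))]
            for i in range(len(p_row))]

# Hypothetical target frequencies for a toy 3-letter alphabet (entries sum to 1).
q_standard = [[0.20, 0.05, 0.05],
              [0.05, 0.25, 0.05],
              [0.05, 0.05, 0.25]]

# Hypothetical compositions: a biased query and an unbiased subject.
p_query = [0.60, 0.20, 0.20]
p_subject = [0.30, 0.35, 0.35]

q_adjusted = adjust_target_frequencies(q_standard, p_query, p_subject)
scores = log_odds_scores(q_adjusted, p_query, p_subject)

for row in scores:
    print(["%.3f" % s for s in row])
```

In this sketch the adjusted scores are asymmetric whenever the two compositions differ, which reflects the abstract's point that the matrix is tailored to a specific pair of sequences rather than to a database-wide average.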

Citing Articles

Insights into the diversity and conservation of the chB6 alloantigen.

Funk P Front Immunol. 2025; 16:1547896.

PMID: 40051637 PMC: 11882424. DOI: 10.3389/fimmu.2025.1547896.


Structural insights into TRPV2 modulation by probenecid.

Rocereta J, Sturhahn T, Pumroy R, Fricke T, Herzog C, Leffler A Nat Struct Mol Biol. 2025; .

PMID: 39972168 DOI: 10.1038/s41594-025-01494-9.


Identification and purification of a novel bacteriophage T7 endonuclease from the Kogelberg Biosphere Reserve (KBR) biodiversity hotspot.

Pillay P, Moralo M, Mtimka S, Shai T, Botha K, Kwezi L Biotechnol Rep (Amst). 2025; 45:e00877.

PMID: 39967824 PMC: 11833611. DOI: 10.1016/j.btre.2025.e00877.


In Silico Characterization of Sirtuins in Acetic Acid Bacteria Reveals a Novel Phylogenetically Distinctive Group.

Jugovic I, Trcek J Molecules. 2025; 30(3).

PMID: 39942739 PMC: 11820453. DOI: 10.3390/molecules30030635.


Proposal of gen. nov., sp. nov. in the ubiquitous bacterial phylum phyl. nov.

Dutkiewicz Z, Singleton C, Sereika M, Villada J, Mussig A, Chuvochina M ISME Commun. 2025; 5(1):ycae147.

PMID: 39931676 PMC: 11809585. DOI: 10.1093/ismeco/ycae147.

