
Revisiting Amino Acid Substitution Matrices for Identifying Distantly Related Proteins

Overview
Journal Bioinformatics
Specialty Biology
Date 2013 Nov 28
PMID 24281694
Citations 28
Abstract

Motivation: Although many amino acid substitution matrices have been developed, it is not well understood which one performs best in similarity searches, especially for remote homology detection. We therefore collected information on existing matrices, condensed it, and derived a novel matrix that can detect more remote homology than existing matrices.

Results: Using principal component analysis of existing matrices together with benchmarks, we developed a novel matrix, which we designate MIQS. The detection performance of MIQS was validated and compared with that of existing general-purpose matrices using SSEARCH, with gap penalties optimized for each matrix. The results show that MIQS detects more remote homology than the existing matrices on an independent dataset. In addition, MIQS outperformed CS-BLAST, a novel similarity search method that uses no amino acid matrix. We also evaluated the alignment quality of the matrices and methods, which revealed that MIQS yields higher alignment sensitivity than the existing matrix series and CS-BLAST. Fundamentally, these results are expected to provide good evidence of the usefulness and importance of amino acid matrices in sequence analysis. Moreover, with our matrix, sophisticated similarity search methods such as sequence-profile and profile-profile comparison can be improved further.
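The abstract does not spell out how principal component analysis was applied, so the following is only a minimal sketch of the general idea, assuming each symmetric 20x20 substitution matrix is flattened to its 210 unique entries and PCA is run across matrices. The function names are hypothetical, the input is synthetic random data rather than any published matrix, and nothing here reproduces the procedure used to derive MIQS.

```python
import numpy as np


def upper_triangle_vector(matrix):
    """Flatten a symmetric matrix into its unique (upper-triangle) entries."""
    rows, cols = np.triu_indices(matrix.shape[0])
    return matrix[rows, cols]


def pca_of_matrices(matrices, n_components=2):
    """Project each matrix onto the leading principal components.

    Returns (scores, axes, explained), where scores has one row per matrix,
    axes holds the principal directions, and explained is the fraction of
    total variance carried by each returned component.
    """
    data = np.stack([upper_triangle_vector(m) for m in matrices])
    centered = data - data.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]


def toy_matrices(count=6, size=20, seed=0):
    """Synthetic symmetric matrices standing in for real substitution matrices."""
    rng = np.random.default_rng(seed)
    mats = []
    for _ in range(count):
        m = rng.integers(-4, 5, size=(size, size)).astype(float)
        mats.append((m + m.T) / 2)  # symmetrize
    return mats


if __name__ == "__main__":
    scores, axes, explained = pca_of_matrices(toy_matrices())
    print(scores.shape, axes.shape)
    print(explained)
```

With real data, each input would be a published matrix loaded into a 20x20 array with a consistent residue order. The component scores then give a low-dimensional map in which matrices can be compared or benchmarked.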

Availability and Implementation: The newly developed matrices and the datasets used in this study are available at http://csas.cbrc.jp/Ssearch/.

Citing Articles

A transphyletic study of metazoan β-catenin protein complexes.

Mbogo I, Kawano C, Nakamura R, Tsuchiya Y, Villar-Briones A, Hirao Y Zoological Lett. 2024; 10(1):20.

PMID: 39623505 PMC: 11613877. DOI: 10.1186/s40851-024-00243-y.


Study of the Floristic, Morphological, and Genetic (atpF-atpH, Internal Transcribed Spacer (ITS), matK, psbK-psbI, rbcL, and trnH-psbA) Differences in Populations in Mangistau (Kazakhstan).

Imanbayeva A, Duisenova N, Orazov A, Sagyndykova M, Belozerov I, Tuyakova A Plants (Basel). 2024; 13(12).

PMID: 38931023 PMC: 11207986. DOI: 10.3390/plants13121591.


New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions.

Jia K, Kilinc M, Jernigan R Front Bioinform. 2023; 3:1227193.

PMID: 37900964 PMC: 10602800. DOI: 10.3389/fbinf.2023.1227193.


Developing similarity matrices for antibody-protein binding interactions.

Islam S, Pantazes R PLoS One. 2023; 18(10):e0293606.

PMID: 37883504 PMC: 10602319. DOI: 10.1371/journal.pone.0293606.


Mutation Space of Spatially Conserved Amino Acid Sites in Proteins.

Caswell B, Summers T, Licup G, Cantu D ACS Omega. 2023; 8(27):24302-24310.

PMID: 37457482 PMC: 10339398. DOI: 10.1021/acsomega.3c01473.

