Query-seeded Iterative Sequence Similarity Searching Improves Selectivity 5-20-fold

Overview
Specialty Biochemistry
Date 2016 Dec 8
PMID 27923999
Citations 12
Abstract

Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity.
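The query-seeding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `query_anchored_row`, the `seed_gaps` flag, and the toy sequences are hypothetical, and the sketch assumes a pairwise alignment given as two equal-length strings with `-` marking gaps. It builds one row of a query-based MSA and, when seeding is on, fills positions where the subject is gapped with the corresponding query residue.

```python
def query_anchored_row(query_aln: str, subject_aln: str, seed_gaps: bool = True) -> str:
    """Return one row of a query-based MSA from a pairwise alignment.

    Columns follow the ungapped query, so subject residues aligned to a
    gap in the query (insertions) are dropped. Where the subject has a
    gap, the row gets the query residue if seed_gaps is True (the
    query-seeding idea), otherwise a '-' is kept.
    Hypothetical helper for illustration only.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")

    row = []
    for q, s in zip(query_aln, subject_aln):
        if q == "-":
            # Insertion in the subject: no query column exists for it.
            continue
        if s == "-":
            # Gap in the subject: seed with the query residue, or keep the gap.
            row.append(q if seed_gaps else "-")
        else:
            row.append(s)
    return "".join(row)


# Toy alignment (made-up sequences, not data from the paper).
query_aln = "MKT-AYIAK"
subject_aln = "MRTGA--AK"

seeded_row = query_anchored_row(query_aln, subject_aln, seed_gaps=True)
plain_row = query_anchored_row(query_aln, subject_aln, seed_gaps=False)
print(seeded_row)
print(plain_row)
```

With seeding on, the gapped columns carry query residues, so those columns of the resulting PSSM stay closer to the original query; this is the anchoring effect the abstract describes. How the real programs weight such rows when computing the PSSM is outside the scope of this sketch.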

Citing Articles

Improved selection of canonical proteins for reference proteomes.

Insana G, Martin M, Pearson W. NAR Genom Bioinform. 2024; 6(2):lqae066.

PMID: 38863529 PMC: 11165316. DOI: 10.1093/nargab/lqae066.


Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra.

Barrera-Redondo J, Lotharukpong J, Drost H, Coelho S. Genome Biol. 2023; 24(1):54.

PMID: 36964572 PMC: 10037820. DOI: 10.1186/s13059-023-02895-z.


Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons.

Oliveira L, Reyes A, Dutilh B, Gruber A. Viruses. 2023; 15(2).

PMID: 36851733 PMC: 9966878. DOI: 10.3390/v15020519.


Proteins Binding to the Carbohydrate HNK-1: Common Origins?

Castillo G, Kleene R, Schachner M, Loers G, Torda A. Int J Mol Sci. 2021; 22(15).

PMID: 34360882 PMC: 8347730. DOI: 10.3390/ijms22158116.


Ten Years of Collaborative Progress in the Quest for Orthologs.

Linard B, Ebersberger I, McGlynn S, Glover N, Mochizuki T, Patricio M. Mol Biol Evol. 2021; 38(8):3033-3045.

PMID: 33822172 PMC: 8321534. DOI: 10.1093/molbev/msab098.

