
Rapid Similarity Searches of Nucleic Acid and Protein Data Banks

Overview
Specialty: Science
Date: 1983 Feb 1
PMID: 6572363
Citations: 570
Abstract

With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
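The core idea of k-tuple matching can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names (`ktuple_index`, `diagonal_hits`, `search`) are hypothetical, and the score used here (the largest number of k-tuple matches on any single diagonal) stands in for the paper's fuller scoring, which is not reproduced. The speed-up comes from indexing the query's k-tuples once, so each position in a data-bank sequence costs a single table lookup instead of a comparison against every query position.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map every k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k, index=None):
    """Count k-tuple matches on each diagonal, keyed by subject_pos - query_pos.

    Matches that share a diagonal line up without any insertion or deletion,
    so many hits on one diagonal suggest a region of similarity.
    """
    if index is None:
        index = ktuple_index(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        # dict.get does not trigger the defaultdict factory, so misses add nothing
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return hits


def search(query, bank, k):
    """Rank bank entries (name -> sequence) by a simplified k-tuple score.

    Assumption: the score is the best single-diagonal hit count; the query
    index is built once and reused for every sequence in the bank.
    """
    index = ktuple_index(query, k)
    scores = {}
    for name, seq in bank.items():
        hits = diagonal_hits(query, seq, k, index)
        scores[name] = max(hits.values(), default=0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data bank; real banks hold thousands of sequences.
    bank = {"seqA": "ACGTACGTTT", "seqB": "GGGGCCCC", "seqC": "TTACGTAAAA"}
    print(search("ACGTACG", bank, k=3))
    # → [('seqA', 5), ('seqC', 3), ('seqB', 0)]
```

Here `seqA` scores 5 because the query's 3-tuples all fall on diagonal 0, `seqC` shares a shorter run on diagonal 2, and `seqB` shares no 3-tuples with the query. Larger k makes the lookup more selective and faster at some cost in sensitivity, which mirrors the speed/sensitivity trade-off the abstract describes.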

Citing Articles

The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction.

Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q. Biomolecules. 2025; 14(12):1531.

PMID: 39766238 PMC: 11673352. DOI: 10.3390/biom14121531.


SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects.

Ferrer Florensa A, Almagro Armenteros J, Nielsen H, Aarestrup F, Clausen P. NAR Genom Bioinform. 2024; 6(3):lqae106.

PMID: 39157582 PMC: 11327874. DOI: 10.1093/nargab/lqae106.


Characterization of a MHYT domain-coupled transcriptional regulator that responds to carbon monoxide.

Durante-Rodriguez G, de Francisco-Polanco S, Garcia J, Diaz E. Nucleic Acids Res. 2024; 52(15):8849-8860.

PMID: 38966994 PMC: 11347149. DOI: 10.1093/nar/gkae575.


PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.

Tao J, Brayton K, Broschat S. Front Bioinform. 2022; 1:749008.

PMID: 36303767 PMC: 9581018. DOI: 10.3389/fbinf.2021.749008.


Global, highly specific and fast filtering of alignment seeds.

Ebel M, Migliorelli G, Stanke M. BMC Bioinformatics. 2022; 23(1):225.

PMID: 35689182 PMC: 9188137. DOI: 10.1186/s12859-022-04745-4.

