RAPSearch: a Fast Protein Similarity Search Tool for Short Reads

Overview
Publisher Biomed Central
Specialty Biology
Date 2011 May 18
PMID 21575167
Citations 77
Abstract

Background: Next Generation Sequencing (NGS) is producing enormous corpora of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search is a key step in annotating the protein-coding genes in these short reads and identifying their biological functions, but it faces daunting challenges because of the sheer size of the short read datasets.

Results: We developed RAPSearch, a fast protein similarity search tool that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads we tested (translated in six frames), RAPSearch achieved a speedup of ~20-90 times over BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, and it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is slightly faster than RAPSearch, showed a significant loss of sensitivity compared to RAPSearch and BLAST.
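The idea of finding flexible-length seeds over a reduced alphabet with a suffix array can be illustrated with a minimal Python sketch. It is not RAPSearch's implementation: the 10-group alphabet, the function names, the naive suffix array construction, and the `min_len` threshold are all assumptions made for illustration. Each residue is mapped to a group representative, the reduced database is indexed by a suffix array, and for each query position a binary search locates the longest exact match in reduced space.

```python
# Hypothetical 10-group reduction, chosen for illustration only;
# it is NOT the alphabet used by RAPSearch.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(protein):
    """Map each residue to its group representative ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def build_suffix_array(text):
    """Naive construction by sorting all suffixes; adequate for a demo only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_seed(query_suffix, db, sa):
    """Return (length, db_position) of the longest prefix of query_suffix in db."""
    # Binary search for the insertion point of query_suffix among the suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if db[sa[mid]:] < query_suffix:
            lo = mid + 1
        else:
            hi = mid
    # The best match is always one of the two neighbours of the insertion point.
    best_len, best_pos = 0, -1
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            n = common_prefix_len(query_suffix, db[sa[idx]:])
            if n > best_len:
                best_len, best_pos = n, sa[idx]
    return best_len, best_pos


def find_seeds(query, db, sa, min_len=4):
    """Collect (query_pos, db_pos, length) for every seed of at least min_len."""
    seeds = []
    for q in range(len(query)):
        n, pos = longest_seed(query[q:], db, sa)
        if n >= min_len:
            seeds.append((q, pos, n))
    return seeds


# Two made-up peptides that differ in the 20-letter alphabet
# but become identical after reduction.
db_reduced = reduce_seq("MKVLAAGIVGLS")
query_reduced = reduce_seq("MRILAAGLIGMT")
suffix_array = build_suffix_array(db_reduced)
seeds = find_seeds(query_reduced, db_reduced, suffix_array)
```

In this toy example the two peptides differ at several positions, yet every substitution falls within one group of the assumed alphabet, so the whole query matches as a single long seed. A real tool would extend and score such seeds in the original alphabet; that step is omitted here.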

Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

Citing Articles

Mating systems and recombination landscape strongly shape genetic diversity and selection in wheat relatives.

Burgarella C, Bremaud M, Von Hirschheydt G, Viader V, Ardisson M, Santoni S. Evol Lett. 2024; 8(6):866-880.

PMID: 39677571 PMC: 11637685. DOI: 10.1093/evlett/qrae039.


Microorganisms Involved in Methylmercury Demethylation and Mercury Reduction are Widely Distributed and Active in the Bathypelagic Deep Ocean Waters.

Sanz-Saez I, Bravo A, Ferri M, Carreras J, Sanchez O, Sebastian M. Environ Sci Technol. 2024; 58(31):13795-13807.

PMID: 39046290 PMC: 11308531. DOI: 10.1021/acs.est.4c00663.


Species-level characterization of saliva and dental plaque microbiota reveals putative bacterial and functional biomarkers of periodontal diseases in dogs.

Alessandri G, Fontana F, Mancabelli L, Tarracchini C, Lugli G, Argentini C. FEMS Microbiol Ecol. 2024; 100(6).

PMID: 38782729 PMC: 11165276. DOI: 10.1093/femsec/fiae082.


Fast, parallel, and cache-friendly suffix array construction.

Khan J, Rubel T, Molloy E, Dhulipala L, Patro R. Algorithms Mol Biol. 2024; 19(1):16.

PMID: 38679714 PMC: 11056320. DOI: 10.1186/s13015-024-00263-5.


Lambda3: homology search for protein, nucleotide, and bisulfite-converted sequences.

Hauswedell H, Hetzel S, Gottlieb S, Kretzmer H, Meissner A, Reinert K. Bioinformatics. 2024; 40(3).

PMID: 38485699 PMC: 10955267. DOI: 10.1093/bioinformatics/btae097.

