
Indexing Strategies for Rapid Searches of Short Words in Genome Sequences

Overview
Journal: PLoS One
Date: 2007 Jun 28
PMID: 17593978
Citations: 50
Abstract

Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
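One of the two strategies named in the abstract, indexing the query set rather than the database, can be illustrated with a minimal sketch. This is not the fetchGWI or tagger implementation: the function names (`index_queries`, `scan`, `reverse_complement`) are hypothetical, only exact matches are reported, and the whole sequence is assumed to fit in memory. Probes are stored in a hash table keyed by word, grouped by length, and the sequence is scanned once per probe length with a sliding window on both strands.

```python
from collections import defaultdict

# Translation table for complementing unambiguous nucleotides.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_queries(probes):
    """Index the query set: probe length -> word -> list of probe names.

    `probes` is a dict mapping a probe name to its sequence (hypothetical
    input format chosen for this sketch).
    """
    index = defaultdict(lambda: defaultdict(list))
    for name, word in probes.items():
        word = word.upper()
        index[len(word)][word].append(name)
    return index


def scan(sequence, index):
    """Slide a window over `sequence` and look each window up in the index.

    Returns a sorted list of (position, strand, probe name) for exact
    matches; positions are 0-based offsets on the forward strand.
    """
    sequence = sequence.upper()
    hits = []
    for k, words in index.items():
        for pos in range(len(sequence) - k + 1):
            window = sequence[pos:pos + k]
            # Forward-strand match: the window itself is a probe.
            for name in words.get(window, ()):
                hits.append((pos, "+", name))
            # Reverse-strand match: the window's reverse complement is a probe.
            for name in words.get(reverse_complement(window), ()):
                hits.append((pos, "-", name))
    return sorted(hits)
```

In this sketch the cost of a search is dominated by the single pass over the sequence per probe length, so adding more probes of an existing length adds hash-table entries rather than extra passes. The complementary strategy, indexing the database, would instead precompute word positions for the genome once and look each probe up directly.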

Citing Articles

Click editing enables programmable genome writing using DNA polymerases and HUH endonucleases.

Ferreira da Silva J, Tou C, King E, Eller M, Rufino-Ramos D, Ma L. Nat Biotechnol. 2024.

PMID: 39039307 PMC: 11751136. DOI: 10.1038/s41587-024-02324-x.


Effect of C-type lectin 16 on dengue virus infection in salivary glands.

Chang Y, Liu W, Fang P, Li J, Liu K, Huang J. PNAS Nexus. 2024; 3(5):pgae188.

PMID: 38813522 PMC: 11134184. DOI: 10.1093/pnasnexus/pgae188.


Click editing enables programmable genome writing using DNA polymerases and HUH endonucleases.

Ferreira da Silva J, Tou C, King E, Eller M, Ma L, Rufino-Ramos D. bioRxiv. 2023.

PMID: 37745481 PMC: 10515857. DOI: 10.1101/2023.09.12.557440.


A cis-regulatory sequence of the selector gene vestigial drives the evolution of wing scaling in Drosophila species.

Farfan-Pira K, Martinez-Cuevas T, Evans T, Nahmad M. J Exp Biol. 2023; 226(10).

PMID: 37078652 PMC: 10234621. DOI: 10.1242/jeb.244692.


CRISPR mediated transactivation in the human disease vector Aedes aegypti.

Bui M, Dalla Benetta E, Dong Y, Zhao Y, Yang T, Li M. PLoS Pathog. 2023; 19(1):e1010842.

PMID: 36656895 PMC: 9888728. DOI: 10.1371/journal.ppat.1010842.

