SeqHound: Biological Sequence and Structure Database as a Platform for Bioinformatics Research

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2002 Oct 29

PMID 12401134

Citations 18

Authors

Katerina Michalickova

Gary D Bader

Michel Dumontier

Hao Lieu

Doron Betel

Ruth Isserlin

Christopher W V Hogue

Abstract

Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.

Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries.

Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

Citing Articles

Splicosomal and serine and arginine-rich splicing factors as targets for TGF-β.

Hallgren O, Malmstrom J, Malmstrom L, Andersson-Sjoland A, Wildt M, Tufvesson E. Fibrogenesis Tissue Repair. 2012; 5(1):6.

PMID: 22541002 PMC: 3472233. DOI: 10.1186/1755-1536-5-6.

High-throughput discovery and characterization of fetal protein trafficking in the blood of pregnant women.

Maron J, Alterovitz G, Ramoni M, Johnson K, Bianchi D. Proteomics Clin Appl. 2010; 3(12):1389-96.

PMID: 20186258 PMC: 2825712. DOI: 10.1002/prca.200900109.

SNAD: Sequence Name Annotation-based Designer.

Sidorov I, Reshetov D, Gorbalenya A. BMC Bioinformatics. 2009; 10:251.

PMID: 19682364 PMC: 2739203. DOI: 10.1186/1471-2105-10-251.

ArrayPlex: distributed, interactive and programmatic access to genome sequence, annotation, ontology, and analytical toolsets.

Killion P, Iyer V. Genome Biol. 2008; 9(11):R159.

PMID: 19014503 PMC: 2614491. DOI: 10.1186/gb-2008-9-11-r159.

BIRCH: a user-oriented, locally-customizable, bioinformatics system.

Fristensky B. BMC Bioinformatics. 2007; 8:54.

PMID: 17291351 PMC: 1800872. DOI: 10.1186/1471-2105-8-54.
