
SeqAn: An Efficient, Generic C++ Library for Sequence Analysis

Overview
Publisher BioMed Central
Specialty Biology
Date 2008 Jan 11
PMID 18184432
Citations 118
Abstract

Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results: To counter this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. The results indicate that our implementation is efficient and versatile.
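To make the first example more concrete, the following is a minimal sketch of two exact string matching algorithms that such a comparison might include: a naive scan and the Horspool bad-character heuristic. It uses only the C++ standard library and does not reproduce SeqAn's interface; the function names `naiveSearch` and `horspoolSearch` are hypothetical and chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Naive exact matching: test every alignment of the pattern against the text.
// Returns the start positions of all (possibly overlapping) occurrences.
std::vector<std::size_t> naiveSearch(const std::string& text,
                                     const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0 || m > n) return hits;
    for (std::size_t i = 0; i + m <= n; ++i) {
        if (text.compare(i, m, pattern) == 0) hits.push_back(i);
    }
    return hits;
}

// Horspool matching: after each window comparison, shift by a precomputed
// distance that depends on the text character under the window's last position.
std::vector<std::size_t> horspoolSearch(const std::string& text,
                                        const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0 || m > n) return hits;

    // Default shift is the full pattern length; characters occurring in
    // pattern[0 .. m-2] get the distance from their last occurrence to the end.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t j = 0; j + 1 < m; ++j) {
        shift[static_cast<unsigned char>(pattern[j])] = m - 1 - j;
    }

    std::size_t i = 0;
    while (i + m <= n) {
        if (text.compare(i, m, pattern) == 0) hits.push_back(i);
        i += shift[static_cast<unsigned char>(text[i + m - 1])];
    }
    return hits;
}
```

Both functions return the same positions on any input, so a benchmark harness could swap one for the other and time them on identical data. How much faster the Horspool variant runs depends on alphabet size and pattern length, which is the kind of question an experimental platform is meant to answer.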

Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing ones.

Citing Articles

Meiosis-specific distal cohesion site decoupled from the kinetochore.

Pan B, Bruno M, Macfarlan T, Akera T Nat Commun. 2025; 16(1):2116.

PMID: 40032846 PMC: 11876576. DOI: 10.1038/s41467-025-57438-w.


AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data.

Silva J, Pinho A, Pratas D Gigascience. 2024; 13.

PMID: 39589438 PMC: 11590114. DOI: 10.1093/gigascience/giae086.


Mapping the IscR regulon sheds light on the regulation of iron homeostasis in .

Dos Santos N, Picinato B, Santos L, de Araujo H, Balan A, Koide T Front Microbiol. 2024; 15:1463854.

PMID: 39411446 PMC: 11475020. DOI: 10.3389/fmicb.2024.1463854.


Nuclear dualism without extensive DNA elimination in the ciliate .

Seah B, Singh A, Vetter D, Emmerich C, Peters M, Soltys V Proc Natl Acad Sci U S A. 2024; 121(39):e2400503121.

PMID: 39298487 PMC: 11441545. DOI: 10.1073/pnas.2400503121.


Genetic links between ovarian ageing, cancer risk and de novo mutation rates.

Stankovic S, Shekari S, Huang Q, Gardner E, Ivarsdottir E, Owens N Nature. 2024; 633(8030):608-614.

PMID: 39261734 PMC: 11410666. DOI: 10.1038/s41586-024-07931-x.


References
1. Myers E, Sutton G, Delcher A, Dew I, Fasulo D, Flanigan M. A whole-genome assembly of Drosophila. Science. 2000; 287(5461):2196-204. DOI: 10.1126/science.287.5461.2196.

2. Dutheil J, Gaillard S, Bazin E, Glemin S, Ranwez V, Galtier N. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics. 2006; 7:188. PMC: 1501049. DOI: 10.1186/1471-2105-7-188.

3. Pitt W, Williams M, Steven M, Sweeney B, Bleasby A, Moss D. The Bioinformatics Template Library--generic components for biocomputing. Bioinformatics. 2001; 17(8):729-37. DOI: 10.1093/bioinformatics/17.8.729.

4. Mural R, Adams M, Myers E, Smith H, Gabor Miklos G, Wides R. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science. 2002; 296(5573):1661-71. DOI: 10.1126/science.1069193.

5. Hohl M, Kurtz S, Ohlebusch E. Efficient multiple genome alignment. Bioinformatics. 2002; 18 Suppl 1:S312-20. DOI: 10.1093/bioinformatics/18.suppl_1.s312.