» Articles » PMID: 24040893

Space-efficient and Exact De Bruijn Graph Representation Based on a Bloom Filter

Overview
Publisher Biomed Central
Date 2013 Sep 18
PMID 24040893
Citations 132
Authors
Affiliations
Soon will be listed here.
Abstract

Background: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB).

Results: We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives.

Conclusions: An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.

Citing Articles

Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3.

Khan J, Dhulipala L, Patro R bioRxiv. 2025; .

PMID: 39975203 PMC: 11838517. DOI: 10.1101/2025.02.02.636161.


-mer approaches for biodiversity genomics.

Jenike K, Campos-Dominguez L, Bodde M, Cerca J, Hodson C, Schatz M Genome Res. 2025; 35(2):219-230.

PMID: 39890468 PMC: 11874746. DOI: 10.1101/gr.279452.124.


Phylogeographic and genetic insights into Sinonychia martensi: an endemic cave-dwelling harvestman in Beijing.

Xiao R, Zhao J, Zhao L, Derkarabetian S, Zhang F, Zhang C BMC Ecol Evol. 2025; 25(1):5.

PMID: 39773162 PMC: 11708087. DOI: 10.1186/s12862-024-02341-z.


A Whole-Genome Survey and the Mitochondrial Genome of Provide Insights into Its Phylogenetic Relationships in Priacanthiformes.

Mao W, Xu Z, Liu Q, Li N, Liu L, Ren B Animals (Basel). 2024; 14(22).

PMID: 39595310 PMC: 11590942. DOI: 10.3390/ani14223257.


Structural variation in the pangenome of wild and domesticated barley.

Jayakodi M, Lu Q, Pidon H, Rabanus-Wallace M, Bayer M, Lux T Nature. 2024; 636(8043):654-662.

PMID: 39537924 PMC: 11655362. DOI: 10.1038/s41586-024-08187-1.


References
1.
Warren R, Holt R . Targeted assembly of short sequence reads. PLoS One. 2011; 6(5):e19816. PMC: 3092772. DOI: 10.1371/journal.pone.0019816. View

2.
Kingsford C, Schatz M, Pop M . Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics. 2010; 11:21. PMC: 2821320. DOI: 10.1186/1471-2105-11-21. View

3.
Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje J, Brown C . Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc Natl Acad Sci U S A. 2012; 109(33):13272-7. PMC: 3421212. DOI: 10.1073/pnas.1121464109. View

4.
Idury R, Waterman M . A new algorithm for DNA sequence assembly. J Comput Biol. 1995; 2(2):291-306. DOI: 10.1089/cmb.1995.2.291. View

5.
Peterlongo P, Chikhi R . Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics. 2012; 13:48. PMC: 3514201. DOI: 10.1186/1471-2105-13-48. View