
Simplitigs As an Efficient and Scalable Representation of De Bruijn Graphs

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2021 Apr 7
PMID: 33823902
Citations: 19
Abstract

De Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. Using assemblies of model organisms and two bacterial pan-genomes as examples, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs substantially reduce both the cumulative sequence length and the number of sequences. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory usage as well as index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
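To make the idea of representing a k-mer set by a small number of strings concrete, the following is a minimal Python sketch of one greedy way to build such strings. It is an illustration under stated assumptions, not the ProphAsm implementation: the function names `kmers` and `greedy_simplitigs` are hypothetical, the seed k-mer is chosen with `min` only to make the result deterministic, and reverse complements are ignored for simplicity (k-mers are treated as plain directed strings).

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_simplitigs(kmer_set, k):
    """Greedily cover kmer_set with strings so that every k-mer of the
    set occurs exactly once across all returned strings.

    Simplifying assumption: reverse complements are not considered.
    """
    remaining = set(kmer_set)
    result = []
    while remaining:
        # Pick a seed k-mer (min() is used only for determinism).
        s = min(remaining)
        remaining.remove(s)

        # Extend to the right while some unused k-mer overlaps by k-1.
        extended = True
        while extended:
            extended = False
            suffix = s[len(s) - (k - 1):]
            for c in "ACGT":
                cand = suffix + c
                if cand in remaining:
                    remaining.remove(cand)
                    s += c
                    extended = True
                    break

        # Extend to the left in the same way.
        extended = True
        while extended:
            extended = False
            prefix = s[:k - 1]
            for c in "ACGT":
                cand = c + prefix
                if cand in remaining:
                    remaining.remove(cand)
                    s = c + s
                    extended = True
                    break

        result.append(s)
    return result
```

Because each k-mer is used exactly once, the cumulative length of the output equals the number of k-mers plus (k - 1) times the number of strings, so fewer strings directly means a shorter total length. For example, `greedy_simplitigs(kmers("ACGTTGCA", 3), 3)` covers all six 3-mers with the single string `"ACGTTGCA"`.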

Citing Articles

Fractional hitting sets for efficient multiset sketching.

Rouze T, Martayan I, Marchet C, Limasset A. Algorithms Mol Biol. 2025; 20(1):1.

PMID: 39923117 PMC: 11807336. DOI: 10.1186/s13015-024-00268-0.


Applications of de Bruijn graphs in microbiome research.

Dufault-Thompson K, Jiang X. Imeta. 2024; 1(1):e4.

PMID: 38867733 PMC: 10989854. DOI: 10.1002/imt2.4.


Compression algorithm for colored de Bruijn graphs.

Rahman A, Dufresne Y, Medvedev P. Algorithms Mol Biol. 2024; 19(1):20.

PMID: 38797858 PMC: 11129398. DOI: 10.1186/s13015-024-00254-6.


Compression Algorithm for Colored de Bruijn Graphs.

Rahman A, Dufresne Y, Medvedev P. Leibniz Int Proc Inform. 2024; 273.

PMID: 38712341 PMC: 11071130. DOI: 10.4230/LIPIcs.WABI.2023.17.


Space-efficient computation of k-mer dictionaries for large values of k.

Diaz-Dominguez D, Leinonen M, Salmela L. Algorithms Mol Biol. 2024; 19(1):14.

PMID: 38581000 PMC: 10996146. DOI: 10.1186/s13015-024-00259-1.

