
Gerbil: a Fast and Memory-efficient k-mer Counter with GPU-support

Overview
Publisher BioMed Central
Date 2017 Apr 5
PMID 28373894
Citations 14
Abstract

Background: A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important.

Results: We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k.
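
The two-step approach described above can be illustrated with a minimal Python sketch. This is not Gerbil's implementation: the function names, the CRC32-based partitioning rule, and the in-memory lists standing in for temporary files are all assumptions made for illustration, and the abstract does not say how reads are actually redistributed.

```python
import zlib
from collections import Counter


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def distribute(reads, k, num_buckets):
    """Step 1 (assumed scheme): spread k-mers over buckets.

    Each bucket is an in-memory list standing in for a temporary file.
    """
    buckets = [[] for _ in range(num_buckets)]
    for read in reads:
        for kmer in kmers(read, k):
            # A deterministic hash sends equal k-mers to the same bucket.
            index = zlib.crc32(kmer.encode()) % num_buckets
            buckets[index].append(kmer)
    return buckets


def count_buckets(buckets):
    """Step 2: count each bucket independently with a hash table."""
    totals = {}
    for bucket in buckets:
        # Buckets hold disjoint k-mer sets, so update() never overwrites a key.
        totals.update(Counter(bucket))
    return totals


def count_kmers(reads, k, num_buckets=4):
    """Run both steps and return a dict mapping k-mer to count."""
    return count_buckets(distribute(reads, k, num_buckets))
```

Because identical k-mers always land in the same bucket, each bucket can be counted on its own and only one bucket's hash table needs to be in memory at a time, which is the general motivation for partitioning before counting.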

Conclusions: While Gerbil's performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k.

Citing Articles

MAFcounter: An efficient tool for counting the occurrences of k-mers in MAF files.

Patsakis M, Provatas K, Mouratidis I, Georgakopoulos-Soares I ArXiv. 2024; .

PMID: 39650609 PMC: 11623707.


When less is more: sketching with minimizers in genomics.

Ndiaye M, Prieto-Banos S, Fitzgerald L, Yazdizadeh Kharrazi A, Oreshkov S, Dessimoz C Genome Biol. 2024; 25(1):270.

PMID: 39402664 PMC: 11472564. DOI: 10.1186/s13059-024-03414-4.


A survey of k-mer methods and applications in bioinformatics.

Moeckel C, Mareboina M, Konnaris M, Chan C, Mouratidis I, Montgomery A Comput Struct Biotechnol J. 2024; 23:2289-2303.

PMID: 38840832 PMC: 11152613. DOI: 10.1016/j.csbj.2024.05.025.


Space-efficient computation of k-mer dictionaries for large values of k.

Diaz-Dominguez D, Leinonen M, Salmela L Algorithms Mol Biol. 2024; 19(1):14.

PMID: 38581000 PMC: 10996146. DOI: 10.1186/s13015-024-00259-1.


Density and Conservation Optimization of the Generalized Masked-Minimizer Sketching Scheme.

Hoang M, Marcais G, Kingsford C J Comput Biol. 2023; 31(1):2-20.

PMID: 37975802 PMC: 10794853. DOI: 10.1089/cmb.2023.0212.

