Computational Performance Assessment of K-mer Counting Algorithms
Overview
Molecular Biology
This article assesses several k-mer counting tools in order to give bioinformatics researchers a reference framework for identifying the computational requirements, degree of parallelization, advantages, disadvantages, and bottlenecks of the algorithm behind each tool. The k-mer counters evaluated were BFCounter, DSK, Jellyfish, KAnalyze, KHMer, KMC2, MSPKmerCounter, Tallymer, and Turtle. The measured parameters were RAM usage, processing time, parallelization, and disk read and write access. The dataset consisted of 36,504,800 reads from human chromosome 14. The assessment was performed for two k-mer lengths: 31 and 55. The results were as follows. Tools based purely on Bloom filters and tools using disk-partitioning techniques showed the lowest RAM usage. The tools with the shortest execution times were those using disk-partitioning techniques. The highest degree of parallelization was achieved by tools using disk partitioning, hash tables with a lock-free approach, or multiple hash tables.
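For readers unfamiliar with the underlying task, the following is a minimal sketch of in-memory k-mer counting with a single hash table. It is not the method of any of the tools listed above. The function name `count_kmers` and the choice to skip windows containing non-ACGT characters are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

VALID_BASES = set("ACGT")


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across the given reads.

    Hypothetical helper for illustration: a single in-memory hash table,
    with no Bloom filter, disk partitioning, or parallelism.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        # A read of length n yields n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: skip windows with ambiguous bases such as N.
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    example = count_kmers(["ACGTACGT", "ACGNACG"], k=3)
    for kmer, n in sorted(example.items()):
        print(kmer, n)
```

Because every distinct k-mer is held in memory at once, a table like this grows with the number of distinct k-mers in the dataset. This is the cost that the Bloom filter and disk-partitioning approaches described in the abstract aim to reduce.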
PanKA: Leveraging population pangenome to predict antibiotic resistance.
Do V, Nguyen V, Nguyen S, Le D, Nguyen T, Nguyen C iScience. 2024; 27(9):110623.
PMID: 39228791 PMC: 11369404. DOI: 10.1016/j.isci.2024.110623.
A survey of k-mer methods and applications in bioinformatics.
Moeckel C, Mareboina M, Konnaris M, Chan C, Mouratidis I, Montgomery A Comput Struct Biotechnol J. 2024; 23:2289-2303.
PMID: 38840832 PMC: 11152613. DOI: 10.1016/j.csbj.2024.05.025.
Chromosome-scale assembly and high-density genetic map of the yellow drum, Nibea albiflora.
Xu D, Zhang W, Chen R, Song H, Tian L, Tan P Sci Data. 2021; 8(1):268.
PMID: 34654820 PMC: 8521588. DOI: 10.1038/s41597-021-01045-z.
Manekar S, Sathe S Curr Genomics. 2019; 20(1):2-15.
PMID: 31015787 PMC: 6446480. DOI: 10.2174/1389202919666181026101326.
A benchmark study of k-mer counting methods for high-throughput sequencing.
Manekar S, Sathe S Gigascience. 2018; 7(12).
PMID: 30346548 PMC: 6280066. DOI: 10.1093/gigascience/giy125.