A Benchmark Study of K-mer Counting Methods for High-throughput Sequencing
The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k (k-mers) in DNA/RNA sequencing reads for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, each with a trade-off between the time and memory required. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters such as disk usage, accuracy, parallelism, the impact of compressed input, performance when counting large values of k, and scalability to larger datasets. We make specific recommendations for the setup of a current state-of-the-art program and suggestions for further development.
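To make the task concrete, the following is a minimal sketch of naive in-memory k-mer counting. It is not the method of any program assessed in the study; the function name `count_kmers`, the example reads, and the choice to skip k-mers containing the ambiguous base `N` are assumptions made purely for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across an iterable of read strings.

    Hypothetical helper for illustration only, not taken from any
    benchmarked tool.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers containing an ambiguous base are skipped.
            if "N" in kmer:
                continue
            counts[kmer] += 1
    return counts


# Two toy reads; real inputs would be parsed from FASTA/FASTQ files.
reads = ["ACGTACGT", "CGTAN"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A hash table like this holds every distinct k-mer in memory at once, which becomes impractical at the hundreds-of-gigabytes scale described above. That is the time and memory trade-off that dedicated k-mer counters are designed to manage.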