A Fast, Lock-free Approach for Efficient Parallel Counting of Occurrences of K-mers
Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and too memory-intensive. At the same time, large multicore computers have become commonplace in research facilities, allowing for a new parallel computational paradigm.
Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Because of their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, however, which is important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
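The idea of counting k-mers of up to 31 bases in a single hash table shared by many threads, without locks, can be illustrated with a minimal C++17 sketch. This is not Jellyfish's implementation: the names `encode_kmer`, `LockFreeKmerTable` and `count_kmers` are hypothetical, and the table design (linear probing, an atomic compare-and-swap to claim an empty slot, an atomic increment to update a count) is an assumed, generic technique. Each base is packed into 2 bits, so a 31-base k-mer occupies 62 bits of a 64-bit word; the sketch stores `code + 1` so that 0 can mark an empty slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: pack s[pos, pos+k) into 2 bits per base (A=0, C=1, G=2, T=3).
// Assumes k <= 31 and pos + k <= s.size(); returns nullopt on any non-ACGT character.
std::optional<uint64_t> encode_kmer(const std::string& s, size_t pos, unsigned k) {
  uint64_t code = 0;
  for (unsigned i = 0; i < k; ++i) {
    uint64_t b;
    switch (s[pos + i]) {
      case 'A': b = 0; break;
      case 'C': b = 1; break;
      case 'G': b = 2; break;
      case 'T': b = 3; break;
      default: return std::nullopt;
    }
    code = (code << 2) | b;
  }
  return code;
}

// Hypothetical fixed-size open-addressing table shared by all threads.
// Capacity must be a power of two; keys are stored as code + 1 (0 = empty slot).
class LockFreeKmerTable {
 public:
  explicit LockFreeKmerTable(size_t capacity_pow2)
      : mask_(capacity_pow2 - 1), keys_(capacity_pow2), counts_(capacity_pow2) {}

  // Increments the count of `kmer`; returns false only if the table is full.
  bool add(uint64_t kmer) {
    const uint64_t key = kmer + 1;
    size_t i = mix(kmer) & mask_;
    for (size_t probes = 0; probes <= mask_; ++probes, i = (i + 1) & mask_) {
      uint64_t cur = keys_[i].load(std::memory_order_acquire);
      if (cur == 0) {
        uint64_t expected = 0;
        // Try to claim the empty slot; on failure `expected` holds the winner's key.
        cur = keys_[i].compare_exchange_strong(expected, key, std::memory_order_acq_rel)
                  ? key
                  : expected;
      }
      if (cur == key) {
        counts_[i].fetch_add(1, std::memory_order_relaxed);
        return true;
      }
      // Slot holds a different k-mer: probe the next slot.
    }
    return false;
  }

  // Returns the count of `kmer` (0 if absent); intended for use after all adds finish.
  uint64_t count(uint64_t kmer) const {
    const uint64_t key = kmer + 1;
    size_t i = mix(kmer) & mask_;
    for (size_t probes = 0; probes <= mask_; ++probes, i = (i + 1) & mask_) {
      const uint64_t cur = keys_[i].load(std::memory_order_acquire);
      if (cur == 0) return 0;
      if (cur == key) return counts_[i].load(std::memory_order_relaxed);
    }
    return 0;
  }

 private:
  // MurmurHash3 64-bit finalizer, used only to scatter codes across slots.
  static uint64_t mix(uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
  }

  size_t mask_;
  std::vector<std::atomic<uint64_t>> keys_;
  std::vector<std::atomic<uint64_t>> counts_;
};

// Counts every k-mer of `seq` with `n_threads` workers that all update one table.
// Thread t handles windows t, t + n_threads, t + 2*n_threads, ...
void count_kmers(const std::string& seq, unsigned k, unsigned n_threads,
                 LockFreeKmerTable& table) {
  if (k == 0 || k > 31 || seq.size() < k) return;
  if (n_threads == 0) n_threads = 1;
  const size_t n_windows = seq.size() - k + 1;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      for (size_t pos = t; pos < n_windows; pos += n_threads) {
        // Windows containing a non-ACGT base are skipped; a full table is ignored here.
        if (auto code = encode_kmer(seq, pos, k)) table.add(*code);
      }
    });
  }
  for (auto& w : workers) w.join();
}
```

Because no thread ever holds a lock, a stalled thread cannot prevent the others from inserting (assuming the platform's 64-bit atomics are themselves lock-free). For brevity the sketch re-encodes every window in O(k), uses a fixed capacity, and does not merge a k-mer with its reverse complement; a more complete counter could use a rolling 2-bit encoding and handle a full table.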
Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.