
A Fast, Lock-free Approach for Efficient Parallel Counting of Occurrences of K-mers

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2011 Jan 11
PMID: 21217122
Citations: 2019
Abstract

Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
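To make the problem statement concrete, here is a minimal single-threaded baseline in C++. It is only an illustration of what "counting every k-mer" means, not the method described in this abstract; the function name `count_kmers` is hypothetical, and storing each k-mer as a full string is exactly the kind of memory cost the motivation refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Count every length-k substring of `seq`.
// Illustrative baseline only: each distinct k-mer is stored as a full string.
std::map<std::string, std::size_t> count_kmers(const std::string& seq, std::size_t k) {
    assert(k > 0 && "k must be positive");
    std::map<std::string, std::size_t> counts;
    // A sequence of length n has n - k + 1 k-mers (none if n < k).
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[seq.substr(i, k)];
    }
    return counts;
}
```

For the input "ACGTACGT" with k = 3, the six windows are ACG, CGT, GTA, TAC, ACG, CGT, so ACG and CGT are each counted twice and GTA and TAC once.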

Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
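The general idea of a multithreaded, lock-free counting table can be sketched as follows. This is a simplified illustration under stated assumptions, not Jellyfish's actual implementation or memory layout: the names `encode_kmer`, `KmerTable` and `count_parallel` are hypothetical, the table has a fixed capacity with linear probing, and k-mers are packed at 2 bits per base. With that packing, 31 bases occupy 62 bits, so a k-mer of up to 31 bases fits in one 64-bit word and the all-ones word can serve as an "empty slot" marker.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Pack the k-mer starting at `pos` into 64 bits, 2 bits per base (k <= 31).
// Returns false if the window contains a character other than A, C, G, T.
bool encode_kmer(const std::string& s, std::size_t pos, std::size_t k, std::uint64_t& out) {
    assert(k >= 1 && k <= 31);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < k; ++i) {
        std::uint64_t code;
        switch (s[pos + i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return false;
        }
        v = (v << 2) | code;
    }
    out = v;
    return true;
}

// Fixed-capacity open-addressing table updated only with atomic operations.
// Hypothetical design for illustration; no resizing, no locks.
class KmerTable {
public:
    explicit KmerTable(std::size_t capacity) : keys_(capacity), counts_(capacity) {
        for (auto& key : keys_) key.store(kEmpty);
        for (auto& count : counts_) count.store(0);
    }

    // Increment the count for `key`. Returns false if the table is full.
    bool add(std::uint64_t key) {
        const std::size_t n = keys_.size();
        std::size_t slot = hash(key) % n;
        for (std::size_t probe = 0; probe < n; ++probe) {
            std::uint64_t seen = keys_[slot].load();
            if (seen == kEmpty) {
                // Try to claim the empty slot with compare-and-swap.
                // On failure, `seen` is updated to the key another thread stored.
                if (keys_[slot].compare_exchange_strong(seen, key)) seen = key;
            }
            if (seen == key) {
                counts_[slot].fetch_add(1);
                return true;
            }
            slot = (slot + 1) % n;  // linear probing
        }
        return false;
    }

    // Current count for `key`, or 0 if it was never inserted.
    std::uint64_t get(std::uint64_t key) const {
        const std::size_t n = keys_.size();
        std::size_t slot = hash(key) % n;
        for (std::size_t probe = 0; probe < n; ++probe) {
            const std::uint64_t seen = keys_[slot].load();
            if (seen == kEmpty) return 0;
            if (seen == key) return counts_[slot].load();
            slot = (slot + 1) % n;
        }
        return 0;
    }

private:
    // All-ones is never a valid key because packed k-mers use at most 62 bits.
    static constexpr std::uint64_t kEmpty = ~std::uint64_t{0};
    static std::uint64_t hash(std::uint64_t key) {
        return (key * 0x9E3779B97F4A7C15ULL) >> 32;  // simple multiplicative hash
    }
    std::vector<std::atomic<std::uint64_t>> keys_;
    std::vector<std::atomic<std::uint64_t>> counts_;
};

// Split the k-mer start positions across threads; all threads share one table.
// Windows with non-ACGT characters are skipped; a full table is silently ignored here.
void count_parallel(const std::string& seq, std::size_t k, unsigned n_threads, KmerTable& table) {
    if (seq.size() < k || n_threads == 0) return;
    const std::size_t total = seq.size() - k + 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = total * t / n_threads;
            const std::size_t end = total * (t + 1) / n_threads;
            for (std::size_t i = begin; i < end; ++i) {
                std::uint64_t key;
                if (encode_kmer(seq, i, k, key)) table.add(key);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A caller would construct a `KmerTable` with a capacity comfortably above the expected number of distinct k-mers, run `count_parallel` over the sequence, and then query counts with `get` on an encoded k-mer. The point of the sketch is that threads never block each other: a slot is claimed with a single compare-and-swap and counts are bumped with an atomic add, so contention costs a retry or a probe step rather than a wait on a lock.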

Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.

Citing Articles

Chromosome-level genome assembly of a critically endangered species Leuciscus chuanchicus.

Wang Q, Zhou Q, Liu H, Li J, Jiang Y Sci Data. 2025; 12(1):441.

PMID: 40089515 DOI: 10.1038/s41597-025-04787-2.


A near-complete genome assembly of Fragaria iinumae.

Du H, He Y, Chen M, Zheng X, Gui D, Tang J BMC Genomics. 2025; 26(1):253.

PMID: 40087556 DOI: 10.1186/s12864-025-11440-0.


Contributions of interspecific hybrids to genetic variability in Glycyrrhiza uralensis and G. glabra.

Kim J, Lee J, Kang J, Shim H, Kang D, Lee S Sci Rep. 2025; 15(1):8764.

PMID: 40082484 PMC: 11906797. DOI: 10.1038/s41598-025-92115-4.


Revealing Genomic Traits and Evolutionary Insights of Oryza officinalis from Southern China Through Genome Assembly and Transcriptome Analysis.

Chen C, Hu H, Guo H, Xia X, Zhang Z, Nong B Rice (N Y). 2025; 18(1):15.

PMID: 40082317 PMC: 11906960. DOI: 10.1186/s12284-025-00769-5.


Evaluating long-read assemblers to assemble several aphididae genomes.

Burger N, Nicolis V, Botha A Brief Bioinform. 2025; 26(2).

PMID: 40079265 PMC: 11904405. DOI: 10.1093/bib/bbaf105.

