
Klumpy: A Tool to Evaluate the Integrity of Long-read Genome Assemblies and Illusive Sequence Motifs

Abstract

The improvement and decreasing costs of third-generation sequencing technologies have widened the scope of biological questions researchers can address with de novo genome assemblies. With the increasing number of reference genomes, validating their integrity with minimal overhead is vital for establishing confident results in their applications. Here, we present Klumpy, a tool for detecting and visualizing both misassembled regions in a genome assembly and genetic elements (e.g., genes) of interest in a set of sequences. Using the initial raw reads in combination with their respective genome assemblies, we illustrate Klumpy's utility by investigating antifreeze glycoprotein (afgp) loci across two icefishes, by searching for a reportedly absent gene in the northern snakehead fish, and by scanning the reference genomes of a mudskipper and a bumblebee for misassembled regions. In the first two cases, we were able to provide support for the noncanonical placement of an afgp locus in the icefishes and to locate the missing snakehead gene. Furthermore, our genome scans were able to identify an unmappable locus in the mudskipper reference genome and a putative repetitive element shared among several species of bees.

Citing Articles

Common Ancestry of the Id Locus: Chromosomal Rearrangement and Polygenic Possibilities.

Sharma A, Vijay N J Mol Evol. 2025; 93(1):163-180.

PMID: 39821315 DOI: 10.1007/s00239-025-10233-z.


The genome of the cryopelagic Antarctic bald notothen, Trematomus borchgrevinki.

Rayamajhi N, Rivera-Colon A, Minhas B, Cheng C, Catchen J G3 (Bethesda). 2024; 15(1.

PMID: 39549265 PMC: 11708224. DOI: 10.1093/g3journal/jkae267.


Klumpy: A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs.

Madrigal G, Minhas B, Catchen J Mol Ecol Resour. 2024; 25(1):e13982.

PMID: 38800997 PMC: 11646305. DOI: 10.1111/1755-0998.13982.
