TGS-GapCloser: A Fast and Accurate Gap Closer for Large Genomes with Low Coverage of Error-prone Long Reads
Background: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years, single-molecule sequencing techniques that generate long reads have become available and have enabled substantial improvements in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these applications are still limited.
Findings: We developed TGS-GapCloser, a software tool that closes sequence gaps in genome assemblies using low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge the gap between 2 contigs within a scaffold, error-corrects only these candidate reads, and assigns the best sequence data to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 of 3 human genome assemblies by an average of 24-fold using only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, covering up to 94.8% of gaps with sequence data at a positive predictive value of 97.7%. Despite the high raw error rate of single-molecule reads, the improved assemblies achieve 99.998% (Q46) single-base accuracy, and the inserted sequences themselves reach 99.97% (Q35) accuracy. This enables high-quality downstream analyses, including up to a 31-fold increase in scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as that of ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.
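The core idea of selecting reads that bridge a gap between two adjacent contigs, then picking one read to supply the gap sequence, can be sketched as follows. This is a minimal illustration, not TGS-GapCloser's implementation: the Hit record (loosely modeled on PAF-style alignment fields), the functions bridging_reads and best_filler, the end_window parameter, and the "mean identity" scoring rule are all hypothetical. Strand handling and the error-correction step applied to candidate reads are deliberately omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-contig alignment (hypothetical PAF-like fields)."""
    read: str
    contig: str
    read_start: int
    read_end: int
    contig_start: int
    contig_end: int
    contig_len: int
    identity: float  # fraction of matching bases in the alignment


def bridging_reads(hits, left, right, end_window=500):
    """Return (read, left_hit, right_hit) for reads spanning the gap left -> right.

    Assumes both contigs are forward-oriented in the scaffold and reads align
    on the forward strand (strand handling omitted for brevity).
    """
    by_read = defaultdict(list)
    for h in hits:
        by_read[h.read].append(h)
    candidates = []
    for read, read_hits in by_read.items():
        # Alignments reaching the 3' end of the left contig.
        left_hits = [h for h in read_hits
                     if h.contig == left and h.contig_len - h.contig_end <= end_window]
        # Alignments starting near the 5' end of the right contig.
        right_hits = [h for h in read_hits
                      if h.contig == right and h.contig_start <= end_window]
        for a in left_hits:
            for b in right_hits:
                if a.read_end <= b.read_start:  # left hit precedes right hit on the read
                    candidates.append((read, a, b))
    return candidates


def best_filler(candidates):
    """Pick the candidate whose two flanking alignments have the highest mean identity.

    Returns (read, start, end): the read interval between the flanking alignments,
    i.e. the sequence that would be inserted into the gap (ignoring any unaligned
    contig overhang).
    """
    if not candidates:
        return None
    read, a, b = max(candidates, key=lambda c: (c[1].identity + c[2].identity) / 2)
    return read, a.read_end, b.read_start


# Usage with toy alignments: r1 and r2 bridge ctgA -> ctgB, r3 touches only ctgA.
hits = [
    Hit("r1", "ctgA", 0, 900, 9100, 10000, 10000, 0.88),
    Hit("r1", "ctgB", 1500, 2400, 0, 900, 8000, 0.86),
    Hit("r2", "ctgA", 0, 800, 9200, 10000, 10000, 0.92),
    Hit("r2", "ctgB", 1300, 2200, 50, 950, 8000, 0.90),
    Hit("r3", "ctgA", 0, 800, 9200, 10000, 10000, 0.95),
]
cands = bridging_reads(hits, "ctgA", "ctgB")
print(best_filler(cands))  # ('r2', 800, 1300)
```

In a real pipeline the candidate reads would be error-corrected before scoring, and a scoring rule would typically weigh alignment quality and consistency of the implied gap length; the mean-identity rule here only stands in for "assigns the best sequence data to each gap".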
Conclusions: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.
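The contiguity and accuracy figures quoted in the Findings rest on standard definitions: scaftigs are the gap-free pieces obtained by splitting scaffolds at runs of N; NG50 is the length L such that sequences of length at least L together cover at least half of the estimated genome size; and a Phred quality Q relates to accuracy through Q = -10 log10(1 - accuracy). The sketch below illustrates these definitions only. The helper names are hypothetical and are not part of TGS-GapCloser.

```python
import math


def scaftigs(scaffold, gap_char="N"):
    """Split a scaffold at runs of N (case-insensitive) into gap-free scaftigs."""
    return [s for s in scaffold.upper().split(gap_char) if s]


def ng50(lengths, genome_size):
    """Length L such that sequences of length >= L sum to at least half the genome size."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= genome_size / 2:
            return length
    return 0  # the sequences cover less than half of the genome


def phred(accuracy):
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10 * math.log10(1 - accuracy)


# Toy scaffold with two gaps, assuming a genome size of 20 bp.
pieces = scaftigs("ACGTNNNNACGTACGTNNAC")   # ['ACGT', 'ACGTACGT', 'AC']
print(ng50([len(s) for s in pieces], 20))   # 4
print(round(phred(0.9997), 1))              # 35.2, i.e. roughly Q35
```

Closing gaps merges neighboring scaftigs, so the same assembly length is carried by fewer, longer pieces, which is why gap filling raises scaftig NG50 even when little sequence is added.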