
TGS-GapCloser: A Fast and Accurate Gap Closer for Large Genomes with Low Coverage of Error-prone Long Reads

Overview
Journal: Gigascience
Specialties: Biology, Genetics
Date: 2020 Sep 7
PMID: 32893860
Citations: 141
Abstract

Background: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years, single-molecule sequencing techniques that generate long reads have become available and have enabled substantial improvements in contig length and genome completeness, especially for large genomes (>100 Mb). Bioinformatic tools for these applications, however, are still limited.

Findings: We developed TGS-GapCloser, a software tool that closes sequence gaps in genome assemblies using low-depth (∼10×) long single-molecule reads. The algorithm extracts reads that bridge the gap region between 2 contigs within a scaffold, error-corrects only those candidate reads, and assigns the best sequence to each gap. As a demonstration, we used TGS-GapCloser to improve the scaftig NG50 of 3 human genome assemblies by 24-fold on average with only ∼10× coverage of Oxford Nanopore or Pacific Biosciences reads, filling up to 94.8% of gaps with sequence data at a positive predictive value of 97.7%. Despite the high raw error rate of single-molecule reads, the improved assemblies achieve 99.998% (Q46) single-base accuracy, and the final inserted sequences have 99.97% (Q35) accuracy. This enables high-quality downstream analyses, including up to a 31-fold increase in the scaftig NGA50 and up to 13.1% more complete BUSCO genes. Additionally, we show that even in ultra-large genome assemblies, such as that of the ginkgo (∼12 Gb), TGS-GapCloser can cover 71.6% of gaps with sequence data.
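
The abstract reports scaftig NG50 and Phred-style Q values without defining them. The sketch below illustrates the conventional definitions, not the TGS-GapCloser implementation: scaftigs are taken to be the gap-free pieces obtained by splitting scaffolds at runs of N, NG50 is the piece length at which the longest pieces together reach half of an assumed genome size, and Q = -10 log10(1 - accuracy). The function names, the toy sequences, and the genome size of 30 bp are hypothetical and chosen only for illustration.

```python
import math
import re


def split_scaftigs(scaffold: str) -> list[str]:
    """Split one scaffold at runs of N (gaps) into gap-free scaftigs."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def ng50(lengths: list[int], genome_size: int) -> int:
    """Length L such that pieces of length >= L cover at least half of
    genome_size. Returns 0 if the pieces never reach that half."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


def phred_quality(accuracy: float) -> float:
    """Conventional Phred-scaled quality for a per-base accuracy in [0, 1)."""
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    genome_size = 30  # assumed (hypothetical) genome size in bp

    # Toy assembly: two scaffolds, each containing one gap.
    before = ["ACGTACGTNNNNACGT", "ACGTACNNAC"]
    # Same assembly after the first gap has been filled with sequence.
    after = ["ACGTACGTACGTACGT", "ACGTACNNAC"]

    for label, scaffolds in (("before", before), ("after", after)):
        lengths = [len(s) for sc in scaffolds for s in split_scaftigs(sc)]
        print(label, "scaftig NG50:", ng50(lengths, genome_size))

    # Q values computed from the rounded percentages quoted in the abstract.
    for accuracy in (0.99998, 0.9997):
        print(accuracy, "-> Q", round(phred_quality(accuracy), 2))
```

In this toy case, filling a single gap merges two scaftigs into one and raises the scaftig NG50 from 4 to 16, which is the kind of contiguity gain the metric is meant to capture. The Q values computed here start from the rounded percentages quoted above, so they may differ slightly from the figures the authors derived from unrounded accuracies.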

Conclusions: TGS-GapCloser can close gaps in large genome assemblies using raw long reads quickly and cost-effectively. The final assemblies generated by TGS-GapCloser have improved contiguity and completeness while maintaining high accuracy. The software is available at https://github.com/BGI-Qingdao/TGS-GapCloser.

Citing Articles

Chromosome-level genome assembly of a specialist walnut pest Atrijuglans aristata.

Feng D, Sun C, Li Y, Gao Q, Wang G, Li H Sci Data. 2025; 12(1):434.

PMID: 40075062 PMC: 11904212. DOI: 10.1038/s41597-025-04754-x.


A chromosomal-level genome assembly of Begonia fimbristipula (Begoniaceae).

Xiao T, Wang Z, Yan H Sci Data. 2025; 12(1):429.

PMID: 40074751 PMC: 11904028. DOI: 10.1038/s41597-025-04768-5.


Origin and de novo domestication of sweet orange.

Liu S, Xu Y, Yang K, Huang Y, Lu Z, Chen S Nat Genet. 2025; 57(3):754-762.

PMID: 40045092 PMC: 11906365. DOI: 10.1038/s41588-025-02122-4.


Super pangenome of Vitis empowers identification of downy mildew resistance genes for grapevine improvement.

Guo L, Wang X, Ayhan D, Rhaman M, Yan M, Jiang J Nat Genet. 2025; 57(3):741-753.

PMID: 40011682 DOI: 10.1038/s41588-025-02111-7.


An Integrative Phylogenetic Analysis of the Genus Spinola (Hymenoptera: Vespidae: Eumeninae) from China Based on Morphology, Genomic Data and Geographical Distribution.

Peng Y, He S, Chen B, Li T Insects. 2025; 16(2).

PMID: 40003846 PMC: 11856612. DOI: 10.3390/insects16020217.

