DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Overview
Journal Sci Rep
Specialty Science
Date 2016 Aug 31
PMID 27573208
Citations 167
Abstract

The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost.
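The first two design principles can be illustrated with a toy sketch. It is not the DBG2OLC implementation: the function names (`index_contigs`, `compress_read`, `suffix_prefix_overlap`), the k-mer matching scheme, the `min_hits` threshold, and the tiny sequences are all assumptions made for illustration. The idea shown is that each long read is rewritten as the ordered list of NGS contig identifiers it touches, so overlaps between reads can be searched on short identifier lists rather than on error-prone bases.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def index_contigs(contigs, k):
    """Map each k-mer found in exactly one contig to that contig's id."""
    owner = {}
    ambiguous = set()
    for cid, seq in contigs.items():
        for km in set(kmers(seq, k)):
            if km in owner and owner[km] != cid:
                ambiguous.add(km)  # shared between contigs, so not informative
            owner[km] = cid
    for km in ambiguous:
        del owner[km]
    return owner


def compress_read(read, index, k, min_hits=2):
    """Rewrite a long read as the ordered list of contig ids it touches.

    K-mers damaged by base-level errors simply fail to match and are skipped.
    A contig is kept only if at least min_hits k-mers of the read support it
    (a hypothetical noise filter).
    """
    hits = [index[km] for km in kmers(read, k) if km in index]
    counts = Counter(hits)
    compact = []
    for cid in hits:
        if counts[cid] >= min_hits and (not compact or compact[-1] != cid):
            compact.append(cid)
    return compact


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of list a that equals a prefix of list b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# Toy data (made up): three short "contigs" and two "long reads".
contigs = {
    "c1": "ACACCAACCC",
    "c2": "TGTTGGTGGT",
    "c3": "AGAAGGAGGG",
}
index = index_contigs(contigs, k=4)

# read_a spans c1 then c2 and carries one substitution error inside c1.
read_a = "ACTCCAACCC" + "TGTTGGTGGT"
# read_b spans c2 then c3 without errors.
read_b = "TGTTGGTGGT" + "AGAAGGAGGG"

compact_a = compress_read(read_a, index, k=4)
compact_b = compress_read(read_b, index, k=4)
print(compact_a, compact_b, suffix_prefix_overlap(compact_a, compact_b))
```

In this sketch the substitution in `read_a` only removes a few matching k-mers; the read still compresses to the same contig list, which is the sense in which base-level errors can be skipped. Detecting structural errors, building the overlap graph, and polishing the consensus are outside what this example covers.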

Citing Articles

Whole genome sequencing, assembly and annotation of the Southern Ground Hornbill - Bucorvus leadbeateri.

Patel J, Botes A, Mollett J, De Maayer P Sci Data. 2025; 12(1):58.

PMID: 39799121 PMC: 11724890. DOI: 10.1038/s41597-025-04412-2.


Chromosome-level genome assemblies and genetic maps reveal heterochiasmy and macrosynteny in endangered Atlantic Acropora.

Locatelli N, Kitchen S, Stankiewicz K, Osborne C, Dellaert Z, Elder H BMC Genomics. 2024; 25(1):1119.

PMID: 39567907 PMC: 11577847. DOI: 10.1186/s12864-024-11025-3.


Chromosome level assemblies of Nakaseomyces (Candida) bracarensis uncover two distinct clades and define its adhesin repertoire.

Marcet-Houben M, Ksiezopolska E, Gabaldon T BMC Genomics. 2024; 25(1):1053.

PMID: 39511470 PMC: 11542307. DOI: 10.1186/s12864-024-10979-8.


Fundamental Patterns of Structural Evolution Revealed by Chromosome-Length Genomes of Cactophilic Drosophila.

Benowitz K, Allan C, Jaworski C, Sanderson M, Diaz F, Chen X Genome Biol Evol. 2024; 16(9).

PMID: 39228294 PMC: 11411373. DOI: 10.1093/gbe/evae191.


Chromosome-level genome assembly of the glass catfish ( ) reveals molecular clues to its transparent phenotype.

Bian C, Li R, Ruan Z, Chen W, Huang Y, Liu L Zool Res. 2024; 45(5):1027-1036.

PMID: 39147717 PMC: 11491783. DOI: 10.24272/j.issn.2095-8137.2023.396.

