
A Graph-based Approach to Diploid Genome Assembly

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Jun 29
PMID 29949989
Citations 28
Abstract

Motivation: Constructing high-quality, haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two haplotype sequences into a single haploid consensus sequence and therefore fail to capture the diploid nature of the organism under study. Building an assembler that produces accurate and complete diploid assemblies while keeping sequencing costs low is therefore a key challenge for the bioinformatics community.

Results: We present a novel graph-based approach to diploid assembly that combines accurate Illumina data with long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that as little as 50× Illumina coverage and 10× PacBio coverage suffice to generate accurate and complete assemblies. Additionally, we show that our approach can detect and phase structural variants.
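The abstract does not describe the phasing step in detail. The sketch below is a hypothetical toy illustration of how long reads can phase heterozygous "bubbles" in an assembly graph. It assumes that each bubble has exactly two alternative paths, that every bubble is heterozygous, and that each PacBio read has already been aligned to the graph and reduced to the path (allele 0 or 1) it follows through each bubble it crosses. It then searches for the pair of complementary haplotypes that minimizes the number of read entries that must be flipped, i.e. the minimum error correction (MEC) objective used by read-based phasing tools such as WhatsHap. The names `read_cost` and `phase_bubbles`, the read encoding, and the brute-force search are illustrative assumptions, not the authors' implementation.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical read encoding: {bubble_index: allele}, where allele 0 or 1 names
# which of the two paths through that bubble the read's alignment follows.
Read = Dict[int, int]


def read_cost(read: Read, hap: Tuple[int, ...]) -> int:
    """Entries of `read` that must be flipped to match its better-fitting haplotype."""
    mismatches = sum(1 for bubble, allele in read.items() if hap[bubble] != allele)
    # The other haplotype is assumed to be the complement of `hap` (all bubbles
    # heterozygous), so a read mismatches it exactly where it matches `hap`.
    return min(mismatches, len(read) - mismatches)


def phase_bubbles(reads: List[Read], n_bubbles: int) -> Tuple[Tuple[int, ...], int]:
    """Brute-force MEC phasing over n_bubbles >= 1 two-path bubbles.

    Bubble 0 is fixed to allele 0 on the first haplotype, because a haplotype
    and its complement describe the same phasing. Runtime is exponential in
    n_bubbles, so this is only meant for toy inputs.
    """
    best_hap: Optional[Tuple[int, ...]] = None
    best_cost: Optional[int] = None
    for rest in product((0, 1), repeat=n_bubbles - 1):
        hap = (0,) + rest
        cost = sum(read_cost(read, hap) for read in reads)
        if best_cost is None or cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, best_cost


if __name__ == "__main__":
    toy_reads = [
        {0: 0, 1: 0, 2: 1},
        {1: 0, 2: 1, 3: 0},
        {0: 1, 1: 1, 2: 0},
        {2: 0, 3: 1},
        {1: 1, 3: 0},  # disagrees with the second read, so one flip is unavoidable
    ]
    hap, cost = phase_bubbles(toy_reads, n_bubbles=4)
    print(hap, cost)  # → (0, 0, 1, 0) 1
```

Brute force is used here only to keep the example short. WhatsHap, for example, solves MEC with a dynamic program whose running time is exponential in read coverage rather than in the number of phased sites.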

Availability and implementation: https://github.com/whatshap/whatshap.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing.

Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler E Genome Biol. 2024; 25(1):265.

PMID: 39390579 PMC: 11466045. DOI: 10.1186/s13059-024-03409-1.


Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing.

Henglin M, Ghareghani M, Harvey W, Porubsky D, Koren S, Eichler E bioRxiv. 2024.

PMID: 38529499 PMC: 10962706. DOI: 10.1101/2024.02.15.580432.


STAR+WASP reduces reference bias in the allele-specific mapping of RNA-seq reads.

Asiimwe R, Alexander D bioRxiv. 2024.

PMID: 38370773 PMC: 10871176. DOI: 10.1101/2024.01.21.576391.


Co-linear chaining on pangenome graphs.

Rajput J, Chandra G, Jain C Algorithms Mol Biol. 2024; 19(1):4.

PMID: 38279113 PMC: 11288099. DOI: 10.1186/s13015-024-00250-w.


Decoding the fibromelanosis locus complex chromosomal rearrangement of black-bone chicken: genetic differentiation, selective sweeps and protein-coding changes in Kadaknath chicken.

Shinde S, Sharma A, Vijay N Front Genet. 2023; 14:1180658.

PMID: 37424723 PMC: 10325862. DOI: 10.3389/fgene.2023.1180658.

