
Targeted De Novo Phasing and Long-range Assembly by Template Mutagenesis

Overview
Specialty: Biochemistry
Date: 2022 Jul 13
PMID: 35822882
Abstract

Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
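
To illustrate why a random mutation pattern can act as a per-template label, the sketch below simulates the imprinting step described in the abstract. It is a toy model, not the authors' pipeline: it treats a single strand, reads each converted cytosine (uracil) as T, and assumes an independent 50% conversion probability per cytosine. The function names `mutate_template`, `hamming`, and `pattern_collision_probability`, the random test sequence, and the seed are all hypothetical choices made for illustration.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Return a copy of seq where each C is independently read as T with probability `rate`.

    Assumption: a uracil produced by conversion is sequenced as T, and only
    the cytosines of the given strand are modeled.
    """
    rng = rng or random.Random()
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pattern_collision_probability(n_cytosines, rate=0.5):
    """Probability that two independently mutated copies agree at all n cytosines.

    At one cytosine, both copies agree if both are converted or both are not:
    rate**2 + (1 - rate)**2, which equals 0.5 when rate is 0.5.
    """
    per_site = rate ** 2 + (1 - rate) ** 2
    return per_site ** n_cytosines


# Hypothetical 10 kb target region with uniform random base composition.
rng = random.Random(0)
region = "".join(rng.choice("ACGT") for _ in range(10_000))

# Two templates copied from the same region receive independent patterns.
template_1 = mutate_template(region, rng=rng)
template_2 = mutate_template(region, rng=rng)

n_c = region.count("C")
print("cytosines in region:", n_c)
print("converted in template 1:", hamming(region, template_1))
print("differences between templates:", hamming(template_1, template_2))
print("chance of identical patterns over 20 cytosines:",
      pattern_collision_probability(20))
```

Under these assumptions, two templates agree at any given cytosine with probability 0.5, so even a short stretch containing a few dozen cytosines is very unlikely to carry the same pattern on two different templates. That is the property that would let overlapping short reads be attributed to the same template.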

Citing Articles

Accurate measurement of microsatellite length by disrupting its tandem repeat structure.

Wang Z, Moffitt A, Andrews P, Wigler M, Levy D. Nucleic Acids Res. 2022; 50(20):e116.

PMID: 36095132 PMC: 9723644. DOI: 10.1093/nar/gkac723.
