PMID: 34725481

Haplotype-aware Variant Calling with PEPPER-Margin-DeepVariant Enables High Accuracy in Nanopore Long-reads

Abstract

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
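The assembly accuracies quoted above (Q35+ and Q40+) can be read as per-base error rates, assuming they are Phred-scaled quality values, which is the usual convention for assembly consensus quality but is not stated explicitly in the abstract. The sketch below shows that conversion; the function names are illustrative and are not part of PEPPER-Margin-DeepVariant.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value Q to an error probability.

    Assumes the standard definition Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a Phred-scaled quality."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Thresholds quoted in the abstract, interpreted as Phred-scaled values.
    for q in (35, 40):
        p = phred_to_error_rate(q)
        print(f"Q{q}: error rate {p:.2e}, about 1 error per {round(1 / p):,} bases")
```

Under this reading, each 10-point increase in Q corresponds to a tenfold reduction in the per-base error rate.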

Citing Articles

Diagnostic utility of single-locus DNA methylation mark in Sotos syndrome developed by nanopore sequencing-based episignature.

Mizuguchi T, Okamoto N, Hara T, Nishimura N, Sakamoto M, Fu L. Clin Epigenetics. 2025; 17(1):27.

PMID: 39966947 PMC: 11837588. DOI: 10.1186/s13148-025-01832-0.


NAVIP: Unraveling the influence of neighboring small sequence variants on functional impact prediction.

Baasner J, Rempel A, Howard D, Pucker B. PLoS Comput Biol. 2025; 21(2):e1012732.

PMID: 39964984 PMC: 11849982. DOI: 10.1371/journal.pcbi.1012732.


Establishment of a high-risk pediatric AML-derived cell line YCU-AML2 with genetic and metabolic vulnerabilities.

Ikeda J, Shiba N, Kato S, Kunimoto H, Saito Y, Sagisaka M. Int J Hematol. 2025.

PMID: 39891826 DOI: 10.1007/s12185-025-03929-x.


Evaluation of long-read sequencing for Ostreid herpesvirus type 1 genome characterization from infected tissues.

Dotto-Maurel A, Pelletier C, Degremont L, Heurtebise S, Arzul I, Morga B. Microbiol Spectr. 2025; 13(3):e0208224.

PMID: 39846760 PMC: 11878034. DOI: 10.1128/spectrum.02082-24.


Long-read sequencing reveals novel genetic polymorphisms in the major histocompatibility complex region and their impacts on the Han Chinese population.

Zhou C, Gong T, Li S, Jin L, Fan S. Sci China Life Sci. 2025.

PMID: 39821835 DOI: 10.1007/s11427-024-2742-y.

