
Family Reunion Via Error Correction: an Efficient Analysis of Duplex Sequencing Data

Overview
Publisher Biomed Central
Specialty Biology
Date 2020 Mar 6
PMID 32131723
Citations 7
Abstract

Background: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which makes it possible to distinguish true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.
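The pooling idea described in the Background can be illustrated with a minimal Python sketch. It is not taken from Du Novo: the function names, the `min_reads` and `min_fraction` thresholds, and the assumption that reads from both strands are already aligned, equal in length, and in the same orientation are all hypothetical simplifications chosen for illustration.

```python
from collections import Counter


def strand_consensus(reads, min_reads=3, min_fraction=0.6):
    """Majority-vote consensus for one strand family.

    Hypothetical helper: assumes equal-length, pre-aligned reads.
    Returns None when the family is too small (e.g. a singleton).
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Emit 'N' when no base is supported by enough reads.
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(plus_reads, minus_reads, **thresholds):
    """Keep a base only when both strand consensuses agree on it.

    A change seen on one strand only is treated as a likely PCR or
    sequencing artifact and masked with 'N'.
    """
    plus = strand_consensus(plus_reads, **thresholds)
    minus = strand_consensus(minus_reads, **thresholds)
    if plus is None or minus is None:
        return None
    return "".join(p if p == m else "N" for p, m in zip(plus, minus))
```

In this sketch a family with a single read yields no consensus at all, which mirrors why singleton reads are discarded in a standard duplex analysis.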

Results: In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within duplex tags. Correcting such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost-effective.
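The abstract does not describe how tags are corrected, so the following is only a rough Python sketch of the general idea, not Du Novo's implementation. It assumes a simple Hamming-distance rule: a singleton tag is reassigned to a larger family when exactly one such family lies within `max_mismatches`. The function names and the threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tag_counts, max_mismatches=1):
    """Map singleton tags to a nearby multi-read family tag.

    tag_counts: dict of tag -> number of reads carrying that tag.
    Assumption (illustrative only): a singleton is corrected when
    exactly one family tag is within max_mismatches; ambiguous or
    distant singletons are left uncorrected.
    """
    families = [tag for tag, n in tag_counts.items() if n > 1]
    corrections = {}
    for tag, n in tag_counts.items():
        if n != 1:
            continue
        nearby = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(tag, fam) <= max_mismatches
        ]
        if len(nearby) == 1:
            corrections[tag] = nearby[0]
    return corrections
```

For example, with counts `{"AAAACCCC": 4, "AAAACCCG": 1, "GGGGTTTT": 3, "TTTTTTTT": 1}`, only `AAAACCCG` is reassigned (to `AAAACCCC`); `TTTTTTTT` differs from every family at more than one position and stays a singleton.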

Conclusions: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.

Citing Articles

Mitochondrial DNA mutations in human oocytes undergo frequency-dependent selection but do not increase with age.

Arbeithuber B, Anthony K, Higgins B, Oppelt P, Shebl O, Tiemann-Boege I bioRxiv. 2024; .

PMID: 39713397 PMC: 11661235. DOI: 10.1101/2024.12.09.627454.


Estimating somatic mutation rates by bottlenecked duplex sequencing in non-model organisms: as a case study.

Sobel E, Coate J, Schaack S J Biol Methods. 2023; 9(3):e165.

PMID: 36992917 PMC: 10040303. DOI: 10.14440/jbm.2022.391.


Advanced age increases frequencies of de novo mitochondrial mutations in macaque oocytes and somatic tissues.

Arbeithuber B, Cremona M, Hester J, Barrett A, Higgins B, Anthony K Proc Natl Acad Sci U S A. 2022; 119(15):e2118740119.

PMID: 35394879 PMC: 9169796. DOI: 10.1073/pnas.2118740119.


Discovery of an unusually high number of de novo mutations in sperm of older men using duplex sequencing.

Salazar R, Arbeithuber B, Ivankovic M, Heinzl M, Moura S, Hartl I Genome Res. 2022; 32(3):499-511.

PMID: 35210354 PMC: 8896467. DOI: 10.1101/gr.275695.121.


Physiological magnesium concentrations increase fidelity of diverse reverse transcriptases from HIV-1, HIV-2, and foamy virus, but not MuLV or AMV.

Wang R, Belew A, Achuthan V, Sayed N, DeStefano J J Gen Virol. 2021; 102(12).

PMID: 34904939 PMC: 10019084. DOI: 10.1099/jgv.0.001708.

