De Novo Assembly of Viral Quasispecies Using Overlap Graphs

Overview
Journal Genome Res
Specialty Genetics
Date 2017 Apr 12
PMID 28396522
Citations 46
Abstract

A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index-based data structures or an ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges indicate that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run with an ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE to two deep-coverage samples from patients infected with the Zika virus and the hepatitis C virus, respectively, shedding light on the genetic structures of the respective viral quasispecies.
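
To make the overlap-graph idea in the abstract concrete, the following is a minimal illustrative sketch, not SAVAGE's implementation. It assumes a brute-force pairwise comparison in place of the FM-index-based construction, and a simple mismatch-rate threshold in place of the statistical criteria the abstract refers to. The function names `suffix_prefix_overlap` and `build_overlap_graph` and the parameters `min_len` and `max_mismatch_rate` are hypothetical.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len, max_mismatch_rate=0.0):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    A candidate overlap is accepted when its number of mismatches does not
    exceed max_mismatch_rate * overlap length (an assumed stand-in for a
    proper statistical test). Returns 0 if no overlap of at least `min_len`
    bases is accepted.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix = a[len(a) - length:]
        prefix = b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def build_overlap_graph(reads, min_len=3, max_mismatch_rate=0.0):
    """Build a directed overlap graph over the reads.

    Nodes are read indices; an edge (i, j) with weight w means that a suffix
    of reads[i] of length w overlaps a prefix of reads[j]. This is quadratic
    in the number of reads, so it only suits toy inputs.
    """
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        length = suffix_prefix_overlap(reads[i], reads[j], min_len, max_mismatch_rate)
        if length:
            edges[(i, j)] = length
    return edges


reads = ["ACGTAC", "GTACGG", "TTTTTT"]
graph = build_overlap_graph(reads, min_len=3)
```

In this toy input, only the first two reads share a sufficiently long suffix-prefix overlap, so the graph contains a single edge between them; the third read stays isolated, as a read from an unrelated region would.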

Citing Articles

Accurate assembly of full-length consensus for viral quasispecies.

Tian J, Gao Z, Li M, Bao E, Zhao J BMC Bioinformatics. 2025; 26(1):36.

PMID: 39893441 PMC: 11787740. DOI: 10.1186/s12859-025-06045-z.


Telomere-to-telomere assembly by preserving contained reads.

Kamath S, Bindra M, Pal D, Jain C Genome Res. 2024; 34(11):1908-1918.

PMID: 39406502 PMC: 11610600. DOI: 10.1101/gr.279311.124.


HyLight: Strain aware assembly of low coverage metagenomes.

Kang X, Zhang W, Li Y, Luo X, Schonhuth A Nat Commun. 2024; 15(1):8665.

PMID: 39375348 PMC: 11458758. DOI: 10.1038/s41467-024-52907-0.


Strain-resolved de-novo metagenomic assembly of viral genomes and microbial 16S rRNAs.

Jochheim A, Jochheim F, Kolodyazhnaya A, Morice E, Steinegger M, Soding J Microbiome. 2024; 12(1):187.

PMID: 39354646 PMC: 11443906. DOI: 10.1186/s40168-024-01904-y.


Unlocking plant genetics with telomere-to-telomere genome assemblies.

Garg V, Bohra A, Mascher M, Spannagl M, Xu X, Bevan M Nat Genet. 2024; 56(9):1788-1799.

PMID: 39048791 DOI: 10.1038/s41588-024-01830-7.

