
Accurate Viral Population Assembly from Ultra-deep Sequencing Data

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2014 Jun 17
PMID: 24932001
Citations: 20
Abstract

Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors.

Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.

Availability: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/
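
The abstract states that VGA estimates variant abundances with an expectation-maximization algorithm but gives no details. The sketch below is a generic EM for mixture proportions, not VGA's actual implementation. It assumes a hypothetical input `compat`, where `compat[i][j]` is the likelihood that read `i` originated from assembled variant `j` (for example 1.0 if the read matches the variant and 0.0 otherwise). The function name `estimate_abundances` and its parameters are illustrative only.

```python
from typing import List, Sequence


def estimate_abundances(
    compat: Sequence[Sequence[float]],
    n_iter: int = 200,
    tol: float = 1e-9,
) -> List[float]:
    """Estimate relative variant abundances with a plain EM loop.

    compat[i][j] is an assumed per-read likelihood of read i under
    variant j; this input format is hypothetical, not taken from VGA.
    """
    n_variants = len(compat[0])
    # Start from a uniform guess over the assembled variants.
    theta = [1.0 / n_variants] * n_variants

    for _ in range(n_iter):
        # E-step: split each read fractionally across the variants
        # in proportion to abundance times read likelihood.
        counts = [0.0] * n_variants
        for row in compat:
            weights = [theta[j] * row[j] for j in range(n_variants)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is compatible with no variant; skip it
            for j in range(n_variants):
                counts[j] += weights[j] / total

        assigned = sum(counts)
        if assigned == 0.0:
            return theta  # no read could be assigned; keep current guess

        # M-step: new abundances are the normalised fractional counts.
        new_theta = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break

    return theta
```

As a usage example, three reads that match only the first variant and one read that matches only the second, `estimate_abundances([[1, 0], [1, 0], [1, 0], [0, 1]])`, give abundance estimates of 0.75 and 0.25. Reads compatible with both variants are shared between them in proportion to the current estimates, which is what lets EM handle reads from regions where variants are identical.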

Citing Articles

Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction.

Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T Nucleic Acids Res. 2021; 49(17):e102.

PMID: 34214168 PMC: 8464054. DOI: 10.1093/nar/gkab576.


Alphaherpesvirus Genomics: Past, Present and Future.

Kuny C, Szpara M Curr Issues Mol Biol. 2020; 42:41-80.

PMID: 33159012 PMC: 7946737. DOI: 10.21775/cimb.042.041.


Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing.

Hora B, Gulzar N, Chen Y, Karagiannis K, Cai F, Su C mSphere. 2020; 5(5).

PMID: 33055255 PMC: 7565892. DOI: 10.1128/mSphere.00551-20.


High-Quality Resolution of the Outbreak-Related Zika Virus Genome and Discovery of New Viruses Using Ion Torrent-Based Metatranscriptomics.

Sardi S, H Carvalho R, Pacheco L, P D Almeida J, M D A Belitardo E, Pinheiro C Viruses. 2020; 12(7).

PMID: 32708079 PMC: 7411838. DOI: 10.3390/v12070782.


Epidemiological data analysis of viral quasispecies in the next-generation sequencing era.

Knyazev S, Hughes L, Skums P, Zelikovsky A Brief Bioinform. 2020; 22(1):96-108.

PMID: 32568371 PMC: 8485218. DOI: 10.1093/bib/bbaa101.

