Viral Quasispecies Reconstruction Via Tensor Factorization with Successive Read Removal

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Jun 29
PMID 29949976
Citations 10
Abstract

Motivation: As RNA viruses mutate and adapt to environmental changes, often developing resistance to antiviral vaccines and drugs, they form an ensemble of viral strains known as a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths make it challenging to reconstruct the strains and estimate their spectrum. Inference of viral quasispecies is difficult because strain frequencies are generally non-uniform, and it becomes harder still when the genetic distances between the strains are small.

Results: This paper presents TenSQR, an algorithm that uses a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies whose component strains have highly uneven frequencies. Fundamentally, TenSQR performs clustering with successive data removal to infer the strains in a quasispecies in order from the most to the least abundant; each time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. This successive strain reconstruction and data removal enables discovery of rare strains in a population and facilitates detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains with widely different abundances, generally outperforming state-of-the-art methods at diversities of 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.
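
The "infer the most abundant strain, remove its reads, repeat" loop described above can be illustrated with a minimal Python sketch. This is not the TenSQR implementation: the tensor factorization clustering step is replaced here by a plain per-position majority vote, and reads are assumed to be error-free strings already aligned to a common coordinate system. The names `consensus`, `mismatches`, `successive_removal`, the `GAP` symbol, and the `max_mismatches` and `max_strains` parameters are all hypothetical and chosen only for illustration.

```python
from collections import Counter

GAP = "-"  # assumption of this sketch: marks positions a read does not cover


def consensus(reads):
    """Per-position majority vote over covered bases (stand-in for clustering)."""
    length = len(reads[0])
    out = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if r[pos] != GAP)
        out.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(out)


def mismatches(read, strain):
    """Count disagreements at positions covered by both read and strain."""
    return sum(
        1 for a, b in zip(read, strain) if a != GAP and b != GAP and a != b
    )


def successive_removal(reads, max_mismatches=0, max_strains=10):
    """Infer strains from most to least abundant, removing reads each round.

    Returns a list of (strain, estimated_frequency) pairs, where the
    frequency is the fraction of all input reads assigned to that strain.
    """
    remaining = list(reads)
    total = len(remaining)
    strains = []
    while remaining and len(strains) < max_strains:
        strain = consensus(remaining)
        assigned = [r for r in remaining if mismatches(r, strain) <= max_mismatches]
        if not assigned:
            break  # the consensus explains no read; stop rather than loop forever
        strains.append((strain, len(assigned) / total))
        # Successive data removal: drop the reads explained by this strain.
        remaining = [r for r in remaining if mismatches(r, strain) > max_mismatches]
    return strains
```

For instance, calling `successive_removal(["ACGT"] * 6 + ["ACTT"] * 3 + ["TCTA"])` recovers the three toy strains in order of abundance. The majority vote only identifies the dominant strain reliably when that strain accounts for more than half of the remaining reads at each position, which is one reason a real method needs a stronger clustering step.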

Availability and Implementation: TenSQR is available at https://github.com/SoYeonA/TenSQR.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Incipient functional SARS-CoV-2 diversification identified through neural network haplotype maps.

Delgado S, Somovilla P, Ferrer-Orta C, Martinez-Gonzalez B, Vazquez-Monteagudo S, Munoz-Flores J Proc Natl Acad Sci U S A. 2024; 121(10):e2317851121.

PMID: 38416684 PMC: 10927536. DOI: 10.1073/pnas.2317851121.


Quasispecies Fitness Partition to Characterize the Molecular Status of a Viral Population. Negative Effect of Early Ribavirin Discontinuation in a Chronically Infected HEV Patient.

Gregori J, Colomer-Castell S, Campos C, Ibanez-Lligona M, Garcia-Cehic D, Rando-Segura A Int J Mol Sci. 2022; 23(23).

PMID: 36498981 PMC: 9739305. DOI: 10.3390/ijms232314654.


HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization.

Cai D, Shang J, Sun Y Bioinformatics. 2022; 38(24):5360-5367.

PMID: 36308467 PMC: 9750122. DOI: 10.1093/bioinformatics/btac708.


SARS-CoV-2 Mutant Spectra at Different Depth Levels Reveal an Overwhelming Abundance of Low Frequency Mutations.

Martinez-Gonzalez B, Soria M, Vazquez-Sirvent L, Ferrer-Orta C, Lobo-Vega R, Minguez P Pathogens. 2022; 11(6).

PMID: 35745516 PMC: 9227345. DOI: 10.3390/pathogens11060662.


VirStrain: a strain identification tool for RNA viruses.

Liao H, Cai D, Sun Y Genome Biol. 2022; 23(1):38.

PMID: 35101081 PMC: 8801933. DOI: 10.1186/s13059-022-02609-x.

