Easing Genomic Surveillance: A Comprehensive Performance Evaluation of Long-read Assemblers Across Multi-strain Mixture Data of HIV-1 and Other Pathogenic Viruses for Constructing a User-friendly Bioinformatic Pipeline

Overview

Journal F1000Res

Specialties Biomedical Engineering, Science

Date 2024 Jul 10

PMID 38984017

Authors

Sara Wattanasombat

Siripong Tongjai

Abstract

Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.

Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, with QUAST and BLASTN used for quality assessment.
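Runtime and memory per assembler are the core measurements of such a benchmark. The abstract does not describe how the HIV-64148 pipeline collects them, so the following is only a minimal sketch, assuming a Unix system (Python's `resource` module is not available on Windows) and a hypothetical `benchmark` helper that wraps one assembler command. It records wall-clock time with `time.perf_counter` and peak resident memory of child processes with `resource.getrusage`.

```python
import resource
import subprocess
import sys
import time


def benchmark(cmd: list[str]) -> dict:
    """Run one command and record wall-clock time and peak child memory.

    Hypothetical helper for illustration; `cmd` is a placeholder, not the
    actual assembler invocation used by the HIV-64148 pipeline.
    """
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    # ru_maxrss for RUSAGE_CHILDREN is the peak resident set size of the
    # largest child waited for so far (Linux: kilobytes, macOS: bytes).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    peak_kb = peak // 1024 if sys.platform == "darwin" else peak
    return {
        "returncode": completed.returncode,
        "wall_seconds": elapsed,
        "peak_rss_kb": peak_kb,
    }


if __name__ == "__main__":
    # Placeholder child process; substitute a real assembler command line.
    print(benchmark([sys.executable, "-c", "print('assembler placeholder')"]))
```

Because `ru_maxrss` reports the maximum across all children reaped so far, a benchmark comparing several assemblers would more safely run each one in a separate parent process so that one tool's peak does not mask another's.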

Results: Our findings show that assembler choice significantly affects assembly time, whereas CPU and memory usage have minimal effect. Assembler selection also influences contig size, and a minimum read length of 2,000 nucleotides is required for high-quality assembly; a read length of 4,000 nucleotides improves quality further. Canu was efficient but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye being operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates; RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences; and HaploDMF, which uses machine learning, had fewer errors at the cost of a slightly longer runtime.
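The read-length finding translates directly into a pre-assembly filtering step. The abstract does not say how, or whether, HIV-64148 applies this threshold, so the following is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). The helper names `read_fastq` and `filter_by_length` are hypothetical, and the default `min_len` of 2,000 nucleotides mirrors the minimum reported above.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_by_length(handle: TextIO, out: TextIO, min_len: int = 2000) -> int:
    """Write only reads of at least min_len nucleotides; return the number kept."""
    kept = 0
    for header, seq, qual in read_fastq(handle):
        if len(seq) >= min_len:
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Toy input: one 1,500-nt read (dropped) and one 4,000-nt read (kept).
    reads = (
        "@short\n" + "A" * 1500 + "\n+\n" + "I" * 1500 + "\n"
        "@long\n" + "C" * 4000 + "\n+\n" + "I" * 4000 + "\n"
    )
    out = io.StringIO()
    print(filter_by_length(io.StringIO(reads), out))  # prints 1
```

Raising `min_len` to 4,000 would follow the further quality improvement noted above, at the cost of discarding more reads; for real runs, an established tool that handles gzip and wrapped FASTQ would be preferable to this sketch.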

Conclusions: The HIV-64148 pipeline, containerized using Docker, is easy to deploy and lets users choose from a range of assemblers to match their computational systems or study requirements. The tool aids genome assembly, provides valuable information on HIV-1 sequences, and supports the monitoring and understanding of viral evolution.

Citing Articles

Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline.

Wattanasombat S, Tongjai S F1000Res. 2024; 13:556.

PMID: 38984017 PMC: 11231628. DOI: 10.12688/f1000research.149577.1.
