
Easing Genomic Surveillance: A Comprehensive Performance Evaluation of Long-read Assemblers Across Multi-strain Mixture Data of HIV-1 and Other Pathogenic Viruses for Constructing a User-friendly Bioinformatic Pipeline

Overview
Journal F1000Res
Date 2024 Jul 10
PMID 38984017
Abstract

Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.

Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, with QUAST and BLASTN used for quality assessment.
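As a rough illustration of what benchmarking an assembler run involves, the sketch below records the wall-clock time and peak resident memory of an arbitrary command. It is not the authors' pipeline: the names `RunStats` and `benchmark_command` are hypothetical, the command shown is a placeholder rather than a real assembler invocation, and the `resource` module is available on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunStats:
    returncode: int
    wall_seconds: float
    # Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    peak_rss: int


def benchmark_command(cmd: Sequence[str]) -> RunStats:
    """Run a command and report its exit code, wall time, and peak memory.

    Assumption: ru_maxrss for RUSAGE_CHILDREN is the largest peak among all
    child processes waited for so far, so each benchmark should run in a
    fresh Python process to get a per-command figure.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    wall = time.perf_counter() - start
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return RunStats(proc.returncode, wall, usage.ru_maxrss)


if __name__ == "__main__":
    # Placeholder command; substitute the assembler invocation to be measured.
    placeholder = [sys.executable, "-c", "print('assembler run goes here')"]
    print(benchmark_command(placeholder))
```

Repeating such a measurement for each assembler on each machine would give the kind of runtime and memory comparison the Methods describe, although the published pipeline may collect these figures differently.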

Results: Our findings show that assembler choice significantly impacts assembly time, while CPU and memory usage have minimal effect. Assembler selection also influences contig size. A minimum read length of 2,000 nucleotides is required for quality assembly, and a 4,000-nucleotide read length improves quality further. Canu was efficient but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates; RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences; and HaploDMF, which uses machine learning, had fewer errors with a slightly longer runtime.
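The read-length thresholds reported above could be applied as a pre-assembly filter. The following is a minimal sketch, assuming uncompressed four-line FASTQ records; the function names `parse_fastq` and `filter_by_length` are hypothetical and are not taken from the HIV-64148 pipeline.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality); the '+' separator line is not kept.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per read."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # skip the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_length: int = 2000
) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence has at least min_length nucleotides."""
    for record in records:
        if len(record[1]) >= min_length:
            yield record


if __name__ == "__main__":
    # Three synthetic reads of 1,500, 2,000, and 4,500 nucleotides.
    demo = []
    for name, base, n in [("r1", "A", 1500), ("r2", "C", 2000), ("r3", "G", 4500)]:
        demo += [f"@{name}\n", base * n + "\n", "+\n", "I" * n + "\n"]
    for header, seq, _ in filter_by_length(parse_fastq(demo), min_length=2000):
        print(header, len(seq))
```

Raising `min_length` to 4000 would keep only reads at the length the abstract associates with further quality gains, at the cost of discarding more data.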

Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
