
Vulcan: Improved Long-read Mapping and Structural Variant Calling Via Dual-mode Alignment

Overview
Journal Gigascience
Specialties Biology, Genetics
Date 2021 Sep 25
PMID 34561697
Citations 11
Abstract

Background: Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that primarily focus on either speed or accuracy. Widely used read mappers (minimap2 and NGMLR) implement different heuristics and scoring schemes to optimize for speed or accuracy, and their performance varies across genomic regions and structural variant types. Our hypothesis is that constraining read mapping to a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection.
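
Why the gap penalty matters can be illustrated with a generic scoring sketch. It is not code from Vulcan, minimap2, or NGMLR. It compares a single affine gap cost (open + k·extend) with a two-piece affine cost of the form min(o1 + k·e1, o2 + k·e2), which is the form documented in minimap2's manual page. The parameter values (4, 2, 24, 1) are assumed to match minimap2's documented `-O 4,24 -E 2,1` defaults and are used here only for illustration.

```python
def affine_gap_cost(k: int, open_: int = 4, extend: int = 2) -> int:
    """Single affine gap model: one open cost plus a constant per-base extension cost."""
    return open_ + k * extend


def two_piece_gap_cost(k: int, o1: int = 4, e1: int = 2, o2: int = 24, e2: int = 1) -> int:
    """Two-piece affine model: the cheaper of a short-gap and a long-gap affine function."""
    return min(o1 + k * e1, o2 + k * e2)


if __name__ == "__main__":
    # Gap lengths ranging from a small indel up to structural-variant scale.
    for k in (1, 10, 50, 500):
        print(f"gap={k:>3}  affine={affine_gap_cost(k):>4}  two-piece={two_piece_gap_cost(k):>4}")
```

With these assumed parameters the two models agree for short gaps (they cross at k = 20, where both cost 44), but a 500-base gap costs 1004 under the single affine model versus 524 under the two-piece model. The choice of gap model therefore has its largest effect at structural-variant scale.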

Findings: We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. At a high level, Vulcan computes the normalized edit distance of reads mapped with minimap2 to identify poorly aligned reads, then realigns those reads with NGMLR, a more accurate but computationally more expensive long-read mapper. In support of our hypothesis, we show that Vulcan improves alignments of Oxford Nanopore Technologies long reads on both simulated and real datasets. These improvements, in turn, yield more accurate structural variant calls on human genome datasets than either read-mapping method alone.
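
The read-selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Vulcan's implementation. It reads SAM records as plain text, defines normalized edit distance as the `NM` tag divided by the number of alignment columns (CIGAR operations M, =, X, I, D), considers only primary alignments, treats unmapped reads as realignment candidates, and applies a hypothetical fixed cutoff (`threshold=0.15`); Vulcan may select poorly aligned reads differently. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# CIGAR ops counted as alignment columns: M/=/X (aligned bases), I/D (gaps).
# S/H (clips), N (reference skip) and P (padding) are excluded.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
COLUMN_OPS = {"M", "=", "X", "I", "D"}


def alignment_columns(cigar: str) -> int:
    """Count alignment columns in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in COLUMN_OPS)


def normalized_edit_distance(fields: List[str]) -> Optional[float]:
    """NM / alignment columns for one SAM record (assumed definition); None if unscorable."""
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read: no alignment to score
    nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
    columns = alignment_columns(cigar)
    if nm is None or columns == 0:
        return None
    return nm / columns


def split_for_realignment(
    sam_lines: Iterable[str], threshold: float = 0.15
) -> Tuple[List[str], List[str]]:
    """Return (keep, realign) read names; threshold is a hypothetical cutoff."""
    keep: List[str] = []
    realign: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        ned = normalized_edit_distance(fields)
        if ned is None or ned > threshold:
            realign.append(fields[0])  # poorly aligned or unmapped: send to NGMLR
        else:
            keep.append(fields[0])  # keep the minimap2 alignment
    return keep, realign


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:1",
        "r2\t0\tchr1\t200\t60\t6M4D4M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:5",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
    ]
    print(split_for_realignment(sam))  # (['r1'], ['r2', 'r3'])
```

In this toy input, `r1` has a normalized edit distance of 1/10 = 0.1 and is kept, `r2` has 5/14 ≈ 0.36 and is flagged, and the unmapped `r3` is flagged. In a full pipeline the flagged reads would be realigned with NGMLR and merged with the retained minimap2 alignments; that orchestration is omitted here.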

Conclusions: Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.

Citing Articles

Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology.
Liu L, Zhang J, Wood S, Newell F, Leonard C, Koufariotis L. BMC Genomics. 2024; 25(1):898.
PMID: 39350042. PMC: 11441263. DOI: 10.1186/s12864-024-10792-3.

Chromosome-Level Genome Assembly of the Viviparous Eelpout Zoarces viviparus.
Fuhrmann N, Brasseur M, Bakowski C, Podsiadlowski L, Prost S, Krehenwinkel H. Genome Biol Evol. 2024; 16(8).
PMID: 39018026. PMC: 11331339. DOI: 10.1093/gbe/evae155.

Analysis and benchmarking of small and large genomic variants across tandem repeats.
English A, Dolzhenko E, Ziaei Jam H, McKenzie S, Olson N, De Coster W. Nat Biotechnol. 2024.
PMID: 38671154. DOI: 10.1038/s41587-024-02225-z.

Genomic variant benchmark: if you cannot measure it, you cannot improve it.
Majidian S, Agustinho D, Chin C, Sedlazeck F, Mahmoud M. Genome Biol. 2023; 24(1):221.
PMID: 37798733. PMC: 10552390. DOI: 10.1186/s13059-023-03061-1.

A Non-Polar Mutant Confirms the Role of the Two-Component System BvrR/BvrS in Virulence and Membrane Integrity.
Rivas-Solano O, Nunez-Montero K, Altamirano-Silva P, Ruiz-Villalobos N, Barquero-Calvo E, Moreno E. Microorganisms. 2023; 11(8).
PMID: 37630574. PMC: 10459465. DOI: 10.3390/microorganisms11082014.

