
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

Overview
Journal PLoS One
Date 2016 May 17
PMID 27182962
Citations 24
Abstract

Next-generation sequencing (NGS) technologies have produced a huge amount of genomic data that needs to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires mapping billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and are complex to implement. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers, so that no modifications to the original BWA source code are required, which ensures compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, under the GPL3 license.
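The two-layer idea in the abstract can be illustrated with a minimal Python sketch. It is not SparkBWA's code or API: it assumes a thread pool in place of Spark, and a toy exact-match function (`toy_align`, hypothetical) in place of BWA. All function names here are invented for illustration. The point it shows is only that an outer layer can split the reads and merge the results while the aligner itself is passed in unchanged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

FastqRecord = Tuple[str, str, str]   # (read name, bases, qualities)
Hit = Tuple[str, int]                # (read name, 1-based position; 0 = unmapped)


def parse_fastq(lines: Iterable[str]) -> List[FastqRecord]:
    """Group a FASTQ stream into 4-line records."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per read")
    return [(rows[i][1:], rows[i + 1], rows[i + 3]) for i in range(0, len(rows), 4)]


def split_into_partitions(records: List[FastqRecord], n: int) -> List[List[FastqRecord]]:
    """Split reads into at most n contiguous partitions, preserving order."""
    if not records:
        return []
    n = max(1, min(n, len(records)))
    size = -(-len(records) // n)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def toy_align(reference: str, partition: List[FastqRecord]) -> List[Hit]:
    """Hypothetical stand-in for the aligner: exact substring search only."""
    hits = []
    for name, bases, _qualities in partition:
        pos = reference.find(bases)      # -1 when the read is not found
        hits.append((name, pos + 1))     # so an unmapped read is reported as 0
    return hits


def align_in_parallel(
    records: List[FastqRecord],
    reference: str,
    n_partitions: int,
    aligner: Callable[[str, List[FastqRecord]], List[Hit]] = toy_align,
) -> List[Hit]:
    """Outer layer: distribute partitions, call the untouched aligner, merge."""
    partitions = split_into_partitions(records, n_partitions)
    if not partitions:
        return []
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map yields results in partition order, so read order is kept.
        results = pool.map(lambda part: aligner(reference, part), partitions)
        return [hit for part in results for hit in part]
```

Because the aligner is just a callable argument, swapping in a different implementation requires no change to the distribution layer, which is the property the abstract attributes to the two-layer design.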

Citing Articles

SeQual-Stream: approaching stream processing to quality control of NGS datasets.

Castellanos-Rodriguez O, Exposito R, Tourino J BMC Bioinformatics. 2023; 24(1):403.

PMID: 37891497 PMC: 10612204. DOI: 10.1186/s12859-023-05530-7.


SparkEC: speeding up alignment-based DNA error correction tools.

Exposito R, Martinez-Sanchez M, Tourino J BMC Bioinformatics. 2022; 23(1):464.

PMID: 36344928 PMC: 9639292. DOI: 10.1186/s12859-022-05013-1.


Bracovirus Sneaks Into Apoptotic Bodies Transmitting Immunosuppressive Signaling Driven by Integration-Mediated eIF5A Hypusination.

Zhou G, Chen C, Cai Q, Yan X, Peng N, Li X Front Immunol. 2022; 13:901593.

PMID: 35664011 PMC: 9156803. DOI: 10.3389/fimmu.2022.901593.


QTL mapping and identification of genes associated with the resistance to Acanthoscelides obtectus in cultivated common bean using a high-density genetic linkage map.

Li X, Tang Y, Wang L, Chang Y, Wu J, Wang S BMC Plant Biol. 2022; 22(1):260.

PMID: 35610573 PMC: 9131570. DOI: 10.1186/s12870-022-03635-4.


SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array.

Wang Z, Tan J, Long Y, Liu Y, Lei W, Cai J Comput Struct Biotechnol J. 2022; 20:1487-1493.

PMID: 35422971 PMC: 8976100. DOI: 10.1016/j.csbj.2022.03.018.

