PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead
(1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM has become a prevalent single-node alignment tool because of its high speed and accuracy. However, the exponential growth of genome data calls for multi-node solutions that can handle large data volumes, and building such solutions remains a challenge. Spark, a ubiquitous big data platform, has been used to help genome alignment meet this challenge. Nonetheless, existing works that use Spark to accelerate BWA-MEM incur high overhead.
(2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with lower overhead by using Spark's pipe operation. We further accelerate PipeMEM with a pipeline structure and in-memory computation.
(3) Results: Our experiments showed that our framework had low overhead on paired-end alignment tasks. In a multi-node environment, our framework was on average 2.27× faster than BWASpark (an alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA.
(4) Conclusions: PipeMEM can accelerate BWA-MEM in the Spark environment with high performance and low overhead.
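To make the idea of a "pipe operation" concrete, the sketch below emulates, in plain Python, the per-partition semantics of Spark's RDD pipe: one external process is launched per partition, each element is written to its standard input followed by a newline, and each line of its standard output becomes an output element. This is a minimal illustration under stated assumptions, not PipeMEM's implementation. The helper names (`pipe_partition`, `interleave_pair`, `partition`) are hypothetical, and an echo process stands in for the aligner; a real pipeline would invoke BWA-MEM instead, with a command line not specified here. Packing both mates of a read pair into one element is also an assumption, chosen so that paired-end mates always land in the same partition.

```python
import subprocess
import sys
from typing import Iterable, List


def pipe_partition(records: Iterable[str], command: List[str]) -> List[str]:
    """Emulate Spark's pipe on one partition: write each record plus a newline
    to the command's stdin and return the command's stdout as a list of lines."""
    payload = "".join(record + "\n" for record in records)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def interleave_pair(mate1: List[str], mate2: List[str]) -> str:
    """Join the four FASTQ lines of each mate into a single element so that
    both mates stay together (hypothetical layout for paired-end input)."""
    return "\n".join(mate1 + mate2)


def partition(elements: List[str], n: int) -> List[List[str]]:
    """Split elements into at most n contiguous partitions."""
    size = max(1, -(-len(elements) // n))  # ceiling division, never zero
    return [elements[i:i + size] for i in range(0, len(elements), size)]


if __name__ == "__main__":
    pairs = [
        interleave_pair(
            [f"@read{i}/1", "ACGT", "+", "IIII"],
            [f"@read{i}/2", "TGCA", "+", "IIII"],
        )
        for i in range(4)
    ]
    # Stand-in for the aligner: a Python process that echoes stdin unchanged.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    for part in partition(pairs, 2):
        out = pipe_partition(part, echo)
        print(len(out))  # 2 pairs x 8 FASTQ lines = 16 lines per partition
```

Because the external process is started once per partition rather than once per read, process start-up cost is amortized over many records, which is one plausible source of the low overhead the abstract reports; the actual reasons are given in the full paper, not in this sketch.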