
Accuracy and Efficiency of Germline Variant Calling Pipelines for Human Genome Data

Overview
Journal: Sci Rep
Specialty: Science
Date: 2020 Nov 20
PMID: 33214604
Citations: 41
Abstract

Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used to identify causal variants across a spectrum of genetic disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, a systematic performance comparison of these pipelines is needed to guide the application of WGS in scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using Genome in a Bottle Consortium, "synthetic-diploid" and simulated WGS datasets. DRAGEN and DeepVariant showed better accuracy than GATK in SNP and indel calling, with no significant difference between their F1-scores. The DRAGEN platform offers accuracy, flexibility and highly efficient execution, and therefore superior performance in large-scale analysis of WGS data. Combining DRAGEN and DeepVariant also appears to offer a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
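The abstract compares pipelines by F1-score. As a minimal sketch of how that metric is usually derived when a call set is compared against a truth set, the Python below computes precision, recall and F1 from true-positive, false-positive and false-negative counts. The function name and the counts are hypothetical illustrations, not values or code from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-comparison counts.

    tp: called variants that match the truth set
    fp: called variants absent from the truth set
    fn: truth-set variants the caller missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not taken from the paper).
snp_precision, snp_recall, snp_f1 = precision_recall_f1(tp=990, fp=10, fn=10)
print(f"precision={snp_precision:.4f} recall={snp_recall:.4f} F1={snp_f1:.4f}")
```

Because F1 is a harmonic mean, a pipeline scores highly only when it makes few false calls and misses few true variants. SNPs and indels are typically scored separately, as the abstract does.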

Citing Articles

Neoadjuvant triplet immune checkpoint blockade in newly diagnosed glioblastoma.

Long G, Shklovskaya E, Satgunaseelan L, Mao Y, Pires da Silva I, Perry K Nat Med. 2025; .

PMID: 40016450 DOI: 10.1038/s41591-025-03512-1.


Whole exome sequencing reveals ABCD1 variant as a potential contributor to male infertility.

Redouane S, Harmak H, El Hamouchi A, Charoute H, Louanjli N, Malki A Mol Biol Rep. 2025; 52(1):148.

PMID: 39841288 DOI: 10.1007/s11033-025-10234-7.


Case report: A case study of variant calling pipeline selection effect on the molecular diagnostics outcome.

Skitchenko R, Smirnov S, Krapivin M, Smirnova A, Artomov M, Loboda A Front Oncol. 2024; 14:1422811.

PMID: 39544296 PMC: 11560904. DOI: 10.3389/fonc.2024.1422811.


Dissecting the Reduced Penetrance of Putative Loss-of-Function Variants in Population-Scale Biobanks.

Blair D, Risch N medRxiv. 2024; .

PMID: 39399029 PMC: 11469360. DOI: 10.1101/2024.09.23.24314008.


NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling.

Hanssen F, Gabernet G, Bauerle F, Stocker B, Wiegand F, Smith N F1000Res. 2024; 12:1125.

PMID: 39345270 PMC: 11428021. DOI: 10.12688/f1000research.140344.1.

