
From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

Overview
Journal Hum Mutat
Specialty Genetics
Date 2016 Sep 9
PMID 27604516
Citations 27
Abstract

As whole genome sequencing becomes cheaper and faster, it will progressively replace targeted next-generation sequencing as standard practice in research and diagnostics. However, the cost-performance ratio of computing is not improving at an equivalent rate. It is therefore essential to evaluate the robustness of the variant detection process while taking into account the computing resources required. We benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, and SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results were compared with one another and against the NIST Genome in a Bottle (GIAB) reference variant dataset. We report speed differences of up to 20-fold in some steps of the process, and we observe that SNV detection, and to a lesser extent InDel detection, is highly consistent in 70% of the genome. SNV detection, and especially InDel detection, is less reliable in 20% of the genome and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools given the objectives, workload, and computing infrastructure available.
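The comparison described in the abstract (call sets compared with one another and against a reference set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `(chrom, pos, ref, alt)` variant key, and the toy variants below are all assumptions made for illustration, and a real benchmark would also need variant normalization and restriction to the reference's confident regions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Concordance(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def compare_to_truth(calls: Set[Variant], truth: Set[Variant]) -> Concordance:
    """Compare one call set against a reference (truth) set by exact key match."""
    tp = len(calls & truth)   # called and present in the reference
    fp = len(calls - truth)   # called but absent from the reference
    fn = len(truth - calls)   # in the reference but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Concordance(tp, fp, fn, precision, recall)


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Pairwise concordance between two call sets: shared calls / all calls."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented for this example only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}
pipeline_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA")}

result_a = compare_to_truth(pipeline_a, truth)
pairwise_ab = jaccard(pipeline_a, pipeline_b)
print(result_a)
print(f"pipeline_a vs pipeline_b Jaccard: {pairwise_ab:.2f}")
```

Exact-key matching is the simplest choice and tends to understate InDel concordance, because the same insertion or deletion can be represented at different positions by different callers.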

Citing Articles

Genomic reanalysis of a pan-European rare-disease resource yields new diagnoses.

Laurie S, Steyaert W, de Boer E, Polavarapu K, Schuermans N, Sommer A. Nat Med. 2025; 31(2):478-489.

PMID: 39825153 PMC: 11835725. DOI: 10.1038/s41591-024-03420-w.


Phenotype-driven genomics enhance diagnosis in children with unresolved neuromuscular diseases.

Estevez-Arias B, Matalonga L, Yubero D, Polavarapu K, Codina A, Ortez C. Eur J Hum Genet. 2024; 33(2):239-247.

PMID: 39333429 PMC: 11840105. DOI: 10.1038/s41431-024-01699-4.


An interconnected data infrastructure to support large-scale rare disease research.

Johansson L, Laurie S, Spalding D, Gibson S, Ruvolo D, Thomas C. Gigascience. 2024; 13.

PMID: 39302238 PMC: 11413801. DOI: 10.1093/gigascience/giae058.


Identification and characterization of a new pathologic mutation in a large Leber hereditary optic neuropathy pedigree.

Emperador S, Habbane M, Lopez-Gallardo E, Del Rio A, Llobet L, Mateo J. Orphanet J Rare Dis. 2024; 19(1):148.

PMID: 38582886 PMC: 10999093. DOI: 10.1186/s13023-024-03165-2.


Variants in mitochondrial disease genes are common causes of inherited peripheral neuropathies.

Ferreira T, Polavarapu K, Olimpio C, Paramonov I, Lochmuller H, Horvath R. J Neurol. 2024; 271(6):3546-3553.

PMID: 38549004 PMC: 11136726. DOI: 10.1007/s00415-024-12319-y.

