
Geck: Trio-based Comparative Benchmarking of Variant Calls

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Jun 1
PMID 29850774
Citations 6
Abstract

Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations.
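The Mendelian constraint mentioned above can be illustrated with a minimal Python sketch. It assumes biallelic sites with genotypes encoded as alternate-allele counts (0, 1, 2) and ignores de novo mutations, sex chromosomes, and missing calls. The function names are hypothetical and are not part of the geck library.

```python
from itertools import product

# Alleles a parent can transmit, keyed by genotype (alt-allele count).
# 0 = reference allele, 1 = alternate allele.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can arise from one allele
    transmitted by each parent (biallelic site, no de novo mutation)."""
    return any(
        a + b == child
        for a, b in product(GAMETES[father], GAMETES[mother])
    )


def violation_rate(trios):
    """Fraction of (father, mother, child) genotype triples that
    violate Mendelian inheritance."""
    trios = list(trios)
    violations = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in trios
    )
    return violations / len(trios)


# Illustrative genotype triples only, not real data.
example = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]
rate = violation_rate(example)
```

Of the 27 possible genotype triples at a biallelic site, 15 are consistent with Mendelian inheritance and 12 are violations, so a raw violation count only detects a subset of genotyping errors. That is one reason a statistical model is needed on top of such counts.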

Results: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty.
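For orientation, the truth-set-based quantities that the model's estimates are validated against can be sketched as follows. This is not the mixture model itself, only the conventional definition of differential precision and recall, assuming each pipeline's calls and the truth set are represented as Python sets of variant identifiers. All names here are hypothetical.

```python
def precision_recall(calls, truth):
    """Precision and recall of a call set against a truth set.
    Both arguments are sets of hashable variant identifiers."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def differential_precision_recall(calls_a, calls_b, truth):
    """Difference (pipeline A minus pipeline B) in precision and recall."""
    precision_a, recall_a = precision_recall(calls_a, truth)
    precision_b, recall_b = precision_recall(calls_b, truth)
    return precision_a - precision_b, recall_a - recall_b


# Illustrative identifiers only, not real variants.
truth = {1, 2, 3, 4}
pipeline_a = {1, 2, 3, 5}   # 3 true positives, 1 false positive
pipeline_b = {1, 2}         # 2 true positives, 0 false positives
d_precision, d_recall = differential_precision_recall(
    pipeline_a, pipeline_b, truth
)
```

In this toy case pipeline A has lower precision but higher recall than pipeline B, so the two differences have opposite signs.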

Availability and Implementation: The Python library geck and usage examples are available at https://github.com/sbg/geck under the GNU General Public License v3.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Understanding Mendelian errors in SNP arrays data using a Gochu Asturcelta pig pedigree: genomic alterations, family size and calling errors.

Arias K, Alvarez I, Gutierrez J, Fernandez I, Menendez J, Menendez-Arias N Sci Rep. 2022; 12(1):19686.

PMID: 36385499 PMC: 9668983. DOI: 10.1038/s41598-022-24340-0.


Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance.

Kanzi A, San J, Chimukangara B, Wilkinson E, Fish M, Ramsuran V Front Genet. 2020; 11:544162.

PMID: 33193618 PMC: 7649788. DOI: 10.3389/fgene.2020.544162.


PedMiner: a tool for linkage analysis-based identification of disease-associated variants using family based whole-exome sequencing data.

Zhou J, Gao J, Zhang H, Zhao D, Li A, Iqbal F Brief Bioinform. 2020; 22(3).

PMID: 32393981 PMC: 8138824. DOI: 10.1093/bib/bbaa077.


Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines.

Bush S, Foster D, Eyre D, Clark E, De Maio N, Shaw L Gigascience. 2020; 9(2).

PMID: 32025702 PMC: 7002876. DOI: 10.1093/gigascience/giaa007.


Mendelian Inconsistent Signatures from 1314 Ancestrally Diverse Family Trios Distinguish Biological Variation from Sequencing Error.

Kothiyal P, Wong W, Bodian D, Niederhuber J J Comput Biol. 2019; 26(5):405-419.

PMID: 30942611 PMC: 6533806. DOI: 10.1089/cmb.2018.0253.

