Integrating Human Sequence Data Sets Provides a Resource of Benchmark SNP and Indel Genotype Calls

Overview
Journal: Nat Biotechnol
Specialty: Biotechnology
Date: 2014 Feb 18
PMID: 24531798
Citations: 398
Abstract

Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
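The benchmarking use case described in the abstract can be illustrated with a minimal sketch. It assumes a simplified in-memory representation rather than the actual file formats or the Genome Comparison and Analytic Testing interface: genotypes are strings such as "0/1", sites are (chromosome, position) pairs, and the names `benchmark`, `in_confident_region`, `truth`, `query` and `regions` are hypothetical. Query calls outside the confident regions are skipped rather than scored, and any confident site absent from the benchmark variant set is treated as homozygous reference. A genotype mismatch is counted as both a false positive and a false negative, which is one common convention and not necessarily the one used by the authors.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]         # (chromosome, 1-based position)
Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def in_confident_region(site: Site, regions: List[Region]) -> bool:
    """Return True if the site lies inside any high-confidence region."""
    chrom, pos = site
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def benchmark(query: Dict[Site, str],
              truth: Dict[Site, str],
              regions: List[Region]) -> Dict[str, int]:
    """Count matches and mismatches between query and benchmark genotypes.

    Assumption: `truth` holds only variant genotypes; a confident site that
    is missing from `truth` is taken to be homozygous reference.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "skipped": 0}
    for site, genotype in query.items():
        if not in_confident_region(site, regions):
            counts["skipped"] += 1   # uncertain region: not scored
        elif truth.get(site) == genotype:
            counts["tp"] += 1        # same variant genotype in both sets
        else:
            counts["fp"] += 1        # extra call or wrong genotype
    for site, genotype in truth.items():
        if in_confident_region(site, regions) and query.get(site) != genotype:
            counts["fn"] += 1        # benchmark variant missed or mis-genotyped
    return counts


# Toy data, invented purely for illustration.
truth = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr1", 900): "0/1"}
regions = [("chr1", 50, 500)]
query = {("chr1", 100): "0/1", ("chr1", 200): "0/1",
         ("chr1", 300): "0/1", ("chr1", 900): "0/1"}

counts = benchmark(query, truth, regions)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(counts, precision, recall)
```

Restricting the comparison to confident regions is what lets an unmatched query call be scored as an error: inside those regions the benchmark asserts a homozygous reference genotype wherever it lists no variant, whereas outside them no such claim is made.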

Citing Articles

Nivolumab plus chemotherapy or ipilimumab in gastroesophageal cancer: exploratory biomarker analyses of a randomized phase 3 trial.

Shitara K, Janjigian Y, Ajani J, Moehler M, Yao J, Wang X Nat Med. 2025; .

PMID: 40055521 DOI: 10.1038/s41591-025-03575-0.


Reference Materials for Improving Reliability of Multiomics Profiling.

Ren L, Shi L, Zheng Y Phenomics. 2024; 4(5):487-521.

PMID: 39723231 PMC: 11666855. DOI: 10.1007/s43657-023-00153-7.


A robust benchmark for detecting low-frequency variants in the HG002 Genome In A Bottle NIST reference material.

Daniels C, Abdulkadir A, Cleveland M, McDaniel J, Jaspez D, Rubio-Rodriguez L bioRxiv. 2024; .

PMID: 39677813 PMC: 11642750. DOI: 10.1101/2024.12.02.625685.


Benchmarking nanopore sequencing and rapid genomics feasibility: validation at a quaternary hospital in New Zealand.

Nyaga D, Tsai P, Gebbie C, Phua H, Yap P, Le Quesne Stabej P NPJ Genom Med. 2024; 9(1):57.

PMID: 39516456 PMC: 11549486. DOI: 10.1038/s41525-024-00445-5.


The GIAB genomic stratifications resource for human reference genomes.

Dwarshuis N, Kalra D, McDaniel J, Sanio P, Jerez P, Jadhav B Nat Commun. 2024; 15(1):9029.

PMID: 39424793 PMC: 11489684. DOI: 10.1038/s41467-024-53260-y.

