PMID: 32963235

A Diploid Assembly-based Benchmark for Variants in the Major Histocompatibility Complex

Abstract

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now make it possible to construct accurate, phased de novo assemblies. We focus on the Major Histocompatibility Complex (MHC), a medically important, highly variable region of 5 million base pairs (bp) where diploid assembly is particularly useful. Here, we develop a human genome benchmark derived from a diploid assembly for the openly consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align the contigs to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC. The benchmark covers 94% of the MHC and contains 22,368 variants smaller than 50 bp, 49% more than a mapping-based benchmark. It reliably identifies errors in mapping-based callsets and enables performance assessment in regions with much denser, more complex variation than the regions covered by previous benchmarks.
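The abstract separates small variants (smaller than 50 bp) from structural variants. As a minimal sketch of that size split, the Python below classifies REF/ALT allele pairs. The size convention (allele length for substitutions, length difference for insertions and deletions), the function names, and the toy alleles are illustrative assumptions, not the authors' pipeline or data.

```python
# Illustrative sketch only: toy alleles, not records from the benchmark.
SMALL_VARIANT_MAX_BP = 50  # cutoff stated in the abstract: small variants are < 50 bp


def variant_size(ref: str, alt: str) -> int:
    """Return an assumed variant size in bp.

    Substitutions (equal-length alleles) count as the allele length;
    insertions and deletions count as the length difference.
    """
    if len(ref) == len(alt):
        return len(ref)
    return abs(len(ref) - len(alt))


def classify(ref: str, alt: str) -> str:
    """Label a variant 'small' (< 50 bp) or 'structural' (>= 50 bp)."""
    if variant_size(ref, alt) < SMALL_VARIANT_MAX_BP:
        return "small"
    return "structural"


# Hypothetical REF/ALT pairs for demonstration.
toy_variants = [
    ("A", "G"),             # single-nucleotide substitution
    ("A", "A" + "T" * 10),  # 10 bp insertion
    ("A" + "C" * 60, "A"),  # 60 bp deletion
]
labels = [classify(ref, alt) for ref, alt in toy_variants]
print(labels)
```

A production callset would normally be read with a VCF parser and would have to handle multi-allelic and symbolic alleles, which this sketch ignores.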

Citing Articles

Diversity and consequences of structural variation in the human genome.

Collins R, Talkowski M. Nat Rev Genet. 2025.

PMID: 39838028 DOI: 10.1038/s41576-024-00808-9.


Long-read sequencing reveals novel genetic polymorphisms in the major histocompatibility complex region and their impacts on the Han Chinese population.

Zhou C, Gong T, Li S, Jin L, Fan S. Sci China Life Sci. 2025.

PMID: 39821835 DOI: 10.1007/s11427-024-2742-y.


Pangenome graphs and their applications in biodiversity genomics.

Secomandi S, Gallo G, Rossi R, Rodriguez Fernandes C, Jarvis E, Bonisoli-Alquati A. Nat Genet. 2025; 57(1):13-26.

PMID: 39779953 DOI: 10.1038/s41588-024-02029-6.


Small variant benchmark from a complete assembly of X and Y chromosomes.

Wagner J, Olson N, McDaniel J, Harris L, Pinto B, Jaspez D. Nat Commun. 2025; 16(1):497.

PMID: 39779690 PMC: 11711550. DOI: 10.1038/s41467-024-55710-z.


Reference Materials for Improving Reliability of Multiomics Profiling.

Ren L, Shi L, Zheng Y. Phenomics. 2024; 4(5):487-521.

PMID: 39723231 PMC: 11666855. DOI: 10.1007/s43657-023-00153-7.

