Rapid and Accurate Haplotype Phasing and Missing-data Inference for Whole-genome Association Studies by Use of Localized Haplotype Clustering

Overview
Journal: Am J Hum Genet
Publisher: Cell Press
Specialty: Genetics
Date: 2007 Oct 10
PMID: 17924348
Citations: 1718
Abstract

Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.

Citing Articles

Quantifying the effects of computational filter criteria on the accurate identification of de novo mutations at varying levels of sequencing coverage.

Milhaven M, Garg A, Versoza C, Pfeifer S. Heredity (Edinb). 2025.

PMID: 40082647 DOI: 10.1038/s41437-025-00754-0.


Investigation of selection signatures of dairy goats using whole-genome sequencing data.

Peng W, Zhang Y, Gao L, Wang S, Liu M, Sun E. BMC Genomics. 2025; 26(1):234.

PMID: 40069586 PMC: 11899394. DOI: 10.1186/s12864-025-11437-9.


Genome-wide association analysis of Septoria tritici blotch for adult plant resistance in elite bread wheat (Triticum aestivum L) genotypes.

Kassie M, Abebe T, Desta E, Tadesse W. PLoS One. 2025; 20(3):e0317603.

PMID: 40063614 PMC: 11892845. DOI: 10.1371/journal.pone.0317603.


Telomere-to-telomere, gap-free genome of mung bean (Vigna radiata) provides insights into domestication under structural variation.

Jia K, Li G, Wang L, Liu M, Wang Z, Li R. Hortic Res. 2025; 12(3):uhae337.

PMID: 40061812 PMC: 11886820. DOI: 10.1093/hr/uhae337.


Integrative multi-environmental genomic prediction in apple.

Jung M, Quesada-Traver C, Roth M, Aranzana M, Muranty H, Rymenants M. Hortic Res. 2025; 12(2):uhae319.

PMID: 40041603 PMC: 11879405. DOI: 10.1093/hr/uhae319.

