PMID: 36198314

MagicalRsq: Machine-learning-based Genotype Imputation Quality Calibration

Abstract

Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics to provide a better-calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP) and whole-exome sequence data from UK Biobank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R² than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained on 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but with TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where the net gain is the number of additional variants correctly distinguished by MagicalRsq relative to standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from poorly imputed ones. MagicalRsq is freely available on GitHub.
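The two quantities the abstract relies on, the true imputation quality and the net gain, can be illustrated with a minimal sketch. It is not the MagicalRsq implementation. It assumes that true quality is the squared Pearson correlation between imputed dosages and sequenced genotypes, and that a variant counts as "correctly distinguished" when the quality metric and the true value fall on the same side of a cutoff. The cutoff of 0.8 and all function names are hypothetical choices made for illustration; the abstract does not state them.

```python
from statistics import mean


def true_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and sequenced genotypes."""
    mx, my = mean(dosages), mean(genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, genotypes))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in genotypes)
    if sxx == 0 or syy == 0:
        # Correlation is undefined without variation; returning 0 is an assumption.
        return 0.0
    return sxy * sxy / (sxx * syy)


def correctly_distinguished(metric, truth, threshold=0.8):
    """Count variants where the metric and the truth agree on well vs poorly imputed.

    The 0.8 threshold is a hypothetical cutoff, not one taken from the abstract.
    """
    return sum((m >= threshold) == (t >= threshold) for m, t in zip(metric, truth))


def net_gain(magical, standard, truth, threshold=0.8):
    """Variants correctly distinguished by one metric minus those by the other."""
    return (correctly_distinguished(magical, truth, threshold)
            - correctly_distinguished(standard, truth, threshold))


if __name__ == "__main__":
    # Toy per-variant values, invented purely to exercise the functions.
    truth = [0.9, 0.3, 0.85, 0.1]
    standard = [0.95, 0.85, 0.6, 0.2]
    magical = [0.9, 0.4, 0.82, 0.3]
    print(net_gain(magical, standard, truth))
```

In this toy input the standard metric agrees with the truth for two of four variants and the calibrated metric for all four, so the net gain is two variants. Real analyses would stratify the counts by allele frequency (rare, low-frequency, common), as the abstract reports.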

Citing Articles

Polygenic Scores of Cardiometabolic Risk Factors in American Indian Adults.

Sun Q, Du J, Tang Y, Best L, Haack K, Zhang Y JAMA Netw Open. 2025; 8(3):e250535.

PMID: 40072435 PMC: 11904716. DOI: 10.1001/jamanetworkopen.2025.0535.


Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation.

Nguyen T, Bolormaa S, Reich C, Chamberlain A, Vander Jagt C, Daetwyler H Genet Sel Evol. 2024; 56(1):72.

PMID: 39548370 PMC: 11566673. DOI: 10.1186/s12711-024-00942-2.


A genome-wide association study of alloimmunization in the TOPMed OMG-SCD cohort identifies a locus on chromosome 12.

Sun Q, Karafin M, Garrett M, Li Y, Ashley-Koch A, Telen M Transfusion. 2024; 64(9):1772-1783.

PMID: 38966903 PMC: 11499043. DOI: 10.1111/trf.17944.


MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric.

Sun Q, Yang Y, Rosen J, Chen J, Li X, Guan W Am J Hum Genet. 2024; 111(5):990-995.

PMID: 38636510 PMC: 11080605. DOI: 10.1016/j.ajhg.2024.04.001.


Imputation accuracy across global human populations.

Cahoon J, Rui X, Tang E, Simons C, Langie J, Chen M Am J Hum Genet. 2024; 111(5):979-989.

PMID: 38604166 PMC: 11080279. DOI: 10.1016/j.ajhg.2024.03.011.

