
Synthetic Surrogates Improve Power for Genome-wide Association Studies of Partially Missing Phenotypes in Population Biobanks

Overview
Journal Nat Genet
Specialty Genetics
Date 2024 Jun 13
PMID 38872030
Abstract

Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing genome-wide association studies (GWAS) on imputed traits can introduce spurious associations, identifying genetic variants that are not associated with the original trait. Here we introduce a new method, synthetic surrogate (SynSurr) analysis, which makes GWAS on imputed phenotypes robust to imputation errors. Rather than replacing missing values, SynSurr jointly analyzes the original and imputed traits. We show that SynSurr estimates the same genetic effect as standard GWAS and improves power in proportion to the quality of the imputations. SynSurr requires a commonly made missing-at-random assumption but relaxes the requirements of existing imputation methods by not requiring correct model specification. We present extensive simulations and ablation analyses to validate SynSurr and apply it to empower the GWAS of dual-energy X-ray absorptiometry traits within the UK Biobank.
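The contrast between replacing missing values and analyzing the original and imputed traits jointly can be illustrated with a small simulation. The sketch below is not the authors' implementation, and the abstract does not specify the estimator. It assumes a bivariate normal regression model for the target and the surrogate, with the target missing completely at random (a special case of missing at random). The helper names (`ols`, `joint_estimate`) and all simulation parameters are hypothetical. The surrogate is deliberately given a spurious genetic effect to mimic a misspecified imputation model.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def joint_estimate(g, y, s):
    """Genotype effect on target y (NaN where missing), using surrogate s.

    Hypothetical helper: the maximum-likelihood estimator for a bivariate
    normal model in which the surrogate is observed for everyone and the
    target only for a subset. It adjusts the complete-case estimate by the
    difference between the complete-case and full-sample surrogate fits.
    """
    X = np.column_stack([np.ones_like(g), g])
    obs = ~np.isnan(y)
    b_t_obs = ols(X[obs], y[obs])   # target effect, complete cases
    b_s_obs = ols(X[obs], s[obs])   # surrogate effect, complete cases
    b_s_all = ols(X, s)             # surrogate effect, full sample
    r_t = y[obs] - X[obs] @ b_t_obs
    r_s = s[obs] - X[obs] @ b_s_obs
    delta = (r_t @ r_s) / (r_s @ r_s)  # residual regression of target on surrogate
    return (b_t_obs - delta * (b_s_obs - b_s_all))[1]

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, beta_true = 100_000, 0.1
g = rng.binomial(2, 0.3, n).astype(float)          # genotype dosage
y = beta_true * g + rng.normal(size=n)             # target phenotype
s = 0.7 * y + 0.2 * g + 0.5 * rng.normal(size=n)   # imperfect imputation
observed = rng.random(n) < 0.2                     # 20% of targets measured
y_obs = np.where(observed, y, np.nan)

# Naive approach: fill in missing targets with the imputed values.
X = np.column_stack([np.ones_like(g), g])
naive = ols(X, np.where(observed, y, s))[1]

# Joint approach: keep target and surrogate as separate outcomes.
joint = joint_estimate(g, y_obs, s)

print(f"true effect: {beta_true}, naive: {naive:.3f}, joint: {joint:.3f}")
```

In this setup the fill-in estimate absorbs the surrogate's spurious genetic effect, whereas the joint estimate targets the same effect as a complete-case analysis of the original trait while borrowing precision from the fully observed surrogate.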

Citing Articles

Valid inference for machine learning-assisted genome-wide association studies.

Miao J, Wu Y, Sun Z, Miao X, Lu T, Zhao J. Nat Genet. 2024; 56(11):2361-2369.

PMID: 39349818 DOI: 10.1038/s41588-024-01934-0.


A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies.

Li X, Chen H, Selvaraj M, Van Buren E, Zhou H, Wang Y. bioRxiv. 2023.

PMID: 37961350 PMC: 10634938. DOI: 10.1101/2023.10.30.564764.
