
A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis

Overview
Journal medRxiv
Date 2024 Jan 23
PMID 38260403
Abstract

In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,534 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.
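The abstract does not give mixWAS's estimator, so the following is only a hedged illustration of what "one-shot" and "lossless" mean in the simplest setting where they hold exactly: ordinary least squares. It assumes a hypothetical model y = alpha_k + gamma_k * z + beta * g + noise, where the intercept alpha_k and covariate effect gamma_k are cohort-specific (mirroring "preserves cohort-specific covariate associations") and the genotype effect beta is shared. Each cohort sends its local X^T X and X^T y once. The coordinator assembles the pooled normal equations from them and recovers the same estimates as a fit on pooled individual-level data. This is not mixWAS itself, which also handles mixed (for example binary and continuous) outcomes. All function names (`site_summary`, `combine`, `pooled_fit`, `simulate`) are hypothetical.

```python
"""Sketch: lossless one-shot integration of cohorts via summary statistics (OLS).

Hypothetical model, for illustration only:
    y = alpha_k + gamma_k * z + beta * g + noise   for a subject in cohort k
alpha_k and gamma_k are cohort-specific; beta (genotype effect) is shared.
"""
import random


def gram(rows, y):
    """Return (X^T X, X^T y) for a design matrix given as a list of rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def site_summary(z, g, y):
    """Run locally at each cohort; only these aggregate sums leave the site."""
    rows = [[1.0, zi, gi] for zi, gi in zip(z, g)]
    return gram(rows, y)


def combine(summaries):
    """Build the pooled normal equations from per-cohort summaries and solve.

    Parameter order: (alpha_1, gamma_1, ..., alpha_K, gamma_K, beta).
    """
    p = 2 * len(summaries) + 1
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for k, (xtx, xty) in enumerate(summaries):
        idx = [2 * k, 2 * k + 1, p - 1]  # local columns [1, z, g] -> pooled columns
        for i in range(3):
            b[idx[i]] += xty[i]
            for j in range(3):
                A[idx[i]][idx[j]] += xtx[i][j]
    return solve(A, b)


def pooled_fit(cohorts):
    """Reference fit on pooled individual-level data (what sharing avoids)."""
    K = len(cohorts)
    rows, ys = [], []
    for k, (z, g, y) in enumerate(cohorts):
        for zi, gi, yi in zip(z, g, y):
            row = [0.0] * (2 * K + 1)
            row[2 * k], row[2 * k + 1], row[-1] = 1.0, zi, gi
            rows.append(row)
            ys.append(yi)
    return solve(*gram(rows, ys))


def simulate(n, alpha, gamma, beta, rng):
    """Draw one synthetic cohort: covariate z, genotype g in {0, 1, 2}, outcome y."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    g = [float(rng.choice([0, 1, 2])) for _ in range(n)]
    y = [alpha + gamma * zi + beta * gi + rng.gauss(0, 1) for zi, gi in zip(z, g)]
    return z, g, y


if __name__ == "__main__":
    rng = random.Random(0)
    cohorts = [simulate(200, 1.0, 0.5, 0.3, rng), simulate(150, -0.5, 1.2, 0.3, rng)]
    one_shot = combine([site_summary(*c) for c in cohorts])
    pooled = pooled_fit(cohorts)
    print("shared beta (one-shot):", round(one_shot[-1], 6))
    print("max |one-shot - pooled|:", max(abs(a - b) for a, b in zip(one_shot, pooled)))
```

The printed difference should be at floating-point rounding level. The cohort-specific columns never interact across cohorts, so each cohort's 3x3 Gram matrix drops into its own block of the pooled system, and only the shared genotype column accumulates contributions from every cohort.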
