Multi-Task Learning with Summary Statistics

Overview
Date 2024 Oct 1
PMID 39351341
Abstract

Multi-task learning has emerged as a powerful machine learning paradigm for integrating data from multiple sources, leveraging similarities between tasks to improve overall model performance. However, its application in real-world settings, especially in healthcare, is hindered by data-sharing constraints. To address this challenge, we propose a flexible multi-task learning framework that uses summary statistics from various sources. We also present an adaptive parameter selection approach based on a variant of Lepski's method, which allows data-driven selection of tuning parameters when only summary statistics are available. Our systematic non-asymptotic analysis characterizes the performance of the proposed methods under various regimes of sample complexity and overlap. We demonstrate our theoretical findings and the performance of the method through extensive simulations. This work offers a more flexible tool for training related models across various domains, with practical implications for genetic risk prediction and many other fields.
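
To illustrate why summary statistics can be enough to train related models, here is a minimal Python sketch. It is not the estimator proposed in the paper, which the abstract does not specify. It assumes a linear model per task, assumes each source shares only the Gram matrix, the cross-product vector, and the sample size, and uses a simple ridge penalty that shrinks each task toward a pooled estimate. All function names (`summary_stats`, `ridge_from_summary`, `multitask_fit`) are hypothetical.

```python
import numpy as np


def summary_stats(X, y):
    """What a single source might share (an assumption): X'X, X'y and n."""
    return X.T @ X, X.T @ y, X.shape[0]


def ridge_from_summary(gram, xty, n, lam, center=None):
    """Minimize (1/n)||y - X b||^2 + lam * ||b - center||^2.

    The solution depends on the raw data only through gram = X'X and
    xty = X'y, so individual-level records never need to be shared.
    """
    p = gram.shape[0]
    if center is None:
        center = np.zeros(p)
    # First-order condition: (X'X/n + lam*I) b = X'y/n + lam*center
    return np.linalg.solve(gram / n + lam * np.eye(p), xty / n + lam * center)


def multitask_fit(stats, lam):
    """Illustrative multi-task step: shrink each task toward a pooled fit.

    `stats` is a list of (gram, xty, n) tuples, one per task. The pooled
    center is ordinary least squares on the summed summary statistics.
    """
    gram_sum = sum(g for g, _, _ in stats)
    xty_sum = sum(c for _, c, _ in stats)
    center = np.linalg.solve(gram_sum, xty_sum)
    return [ridge_from_summary(g, c, n, lam, center) for g, c, n in stats]
```

In this sketch `lam` plays the role of a tuning parameter: `lam = 0` recovers each task's own least-squares fit, while a large `lam` pulls every task toward the pooled estimate. A data-driven rule such as the Lepski-type selection mentioned in the abstract would choose this value; that step is not reproduced here.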
