
Federated Learning of Predictive Models from Federated Electronic Health Records

Overview
Date 2018 Mar 4
PMID 29500022
Citations 101
Abstract

Background: In an era of "big data," computationally efficient and privacy-aware solutions for large-scale machine learning problems become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has focused on centralized algorithms, which assume the existence of a central data repository (database) that stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located, does not scale well to very large datasets, and introduces single-point-of-failure risks that could compromise the integrity and privacy of the data. Given the large volumes of data spread across hospitals and individuals, a decentralized, computationally scalable methodology is needed.

Objective: We aim to solve a binary supervised classification problem, predicting hospitalizations for cardiac events, using a distributed algorithm. We seek to develop a general decentralized optimization framework that enables multiple data holders to collaborate and converge to a common predictive model without explicitly exchanging raw data.

Methods: We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate while keeping every participant's data private.
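
To make the setting concrete, the following is a minimal sketch of an l1-regularized soft-margin SVM objective and of a generic decentralized update in which agents share only model parameters. It is not the cPDS algorithm. The objective form (summed hinge loss plus `lam` times the l1 norm of the weights), the plain parameter averaging over a fully connected network, and all function and parameter names (`ssvm_objective`, `local_subgradient`, `consensus_round`, `lam`, `step`) are assumptions made for illustration.

```python
import numpy as np

def ssvm_objective(w, b, X, y, lam):
    """Summed hinge loss plus an l1 penalty on w (assumed objective form)."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return hinge.sum() + lam * np.abs(w).sum()

def local_subgradient(w, b, X, y, lam):
    """A subgradient of the objective above, computed from one agent's data only."""
    margins = y * (X @ w + b)
    active = margins < 1.0  # samples that violate the margin
    grad_w = -(X[active] * y[active][:, None]).sum(axis=0) + lam * np.sign(w)
    grad_b = -y[active].sum()
    return grad_w, grad_b

def consensus_round(params, datasets, lam, step):
    """Each agent takes a local subgradient step, then all agents average models.

    Only (w, b) pairs are exchanged; the raw (X, y) data never leave an agent.
    Hypothetical averaging scheme for illustration, not the cPDS update.
    """
    updated = []
    for (w, b), (X, y) in zip(params, datasets):
        gw, gb = local_subgradient(w, b, X, y, lam)
        updated.append((w - step * gw, b - step * gb))
    w_avg = np.mean([w for w, _ in updated], axis=0)
    b_avg = float(np.mean([b for _, b in updated]))
    return [(w_avg.copy(), b_avg) for _ in updated]

# Two agents, each holding a single labeled sample (labels are +1 or -1).
datasets = [
    (np.array([[1.0, 0.0]]), np.array([1.0])),
    (np.array([[0.0, 1.0]]), np.array([-1.0])),
]
params = [(np.zeros(2), 0.0) for _ in datasets]
params = consensus_round(params, datasets, lam=0.1, step=0.5)
```

The l1 term is what drives many weights toward zero, which is why the classifier is called sparse; the averaging step stands in for whatever communication rule a real decentralized method would use.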

Results: We test cPDS on the problem of predicting hospitalizations due to heart diseases within a calendar year, based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. It also converges faster and with less communication overhead than an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy, measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.
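
For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below illustrates that definition only; the function name `auc_score` and the toy labels and scores are hypothetical and are not data from the study.

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]   # label 1 marks the positive class
    neg = scores[labels != 1]   # any other label (0 or -1) is treated as negative
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form costs time proportional to the number of positives times the number of negatives, which is fine for illustration; a rank-based computation would be preferable on large cohorts.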

Citing Articles

Addressing contemporary threats in anonymised healthcare data using privacy engineering.

Narayan S, Kohli N, Martin M. NPJ Digit Med. 2025; 8(1):145.

PMID: 40050672 PMC: 11885643. DOI: 10.1038/s41746-025-01520-6.


Convergence of nanotechnology and artificial intelligence in the fight against liver cancer: a comprehensive review.

Bhange M, Telange D. Discov Oncol. 2025; 16(1):77.

PMID: 39841330 PMC: 11754566. DOI: 10.1007/s12672-025-01821-y.


The role of artificial intelligence in pandemic responses: from epidemiological modeling to vaccine development.

Gawande M, Zade N, Kumar P, Gundewar S, Weerarathna I, Verma P. Mol Biomed. 2025; 6(1):1.

PMID: 39747786 PMC: 11695538. DOI: 10.1186/s43556-024-00238-3.


Decades in the Making: The Evolution of Digital Health Research Infrastructure Through Synthetic Data, Common Data Models, and Federated Learning.

Austin J, Lobo E, Samadbeik M, Engstrom T, Philip R, Pole J. J Med Internet Res. 2024; 26:e58637.

PMID: 39705072 PMC: 11699496. DOI: 10.2196/58637.


Integrating deep learning for visual question answering in Agricultural Disease Diagnostics: Case Study of Wheat Rust.

Nanavaty A, Sharma R, Pandita B, Goyal O, Rallapalli S, Mandal M. Sci Rep. 2024; 14(1):28203.

PMID: 39548249 PMC: 11568177. DOI: 10.1038/s41598-024-79793-2.

