PMID: 38836701

Sharing Sensitive Data in Life Sciences: an Overview of Centralized and Federated Approaches

Abstract

Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. However, because such data are potentially sensitive, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale, real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership (OMOP)/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, whereas federated models facilitate scaling up and legal compliance: because the data typically remain on the data generator's premises, the generator retains better control over how they are shared. This comparative study thus offers guidance on selecting the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
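The difference between the two models can be illustrated with a toy computation of a mean across several data holders. This is a minimal sketch under stated assumptions, not the architecture of any of the five platforms discussed: the site names, the values and the `min_count` withholding rule are all hypothetical. In the centralized function, record-level values from every site are pooled in one place before analysis; in the federated function, each site computes a local (sum, count) on its own premises and only these aggregates reach the coordinator.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record-level values (e.g. patient ages), one list per data-holding site.
SiteData = List[int]


def centralized_mean(sites: Dict[str, SiteData]) -> float:
    """Centralized model: record-level data from every site are copied into one hub."""
    pooled: List[int] = []
    for records in sites.values():
        pooled.extend(records)  # raw records leave each site
    return sum(pooled) / len(pooled)


def local_summary(records: SiteData, min_count: int) -> Optional[Tuple[int, int]]:
    """Runs on the site's own premises and releases only an aggregate (sum, count).

    The site withholds its result when it holds fewer than `min_count` records
    (a hypothetical rule), a simple example of a data holder controlling what
    leaves its premises.
    """
    if len(records) < min_count:
        return None
    return sum(records), len(records)


def federated_mean(sites: Dict[str, SiteData], min_count: int = 1) -> float:
    """Federated model: the coordinator combines per-site aggregates, never raw records."""
    summaries = [local_summary(records, min_count) for records in sites.values()]
    released = [s for s in summaries if s is not None]
    if not released:
        raise ValueError("no site released a summary")
    total = sum(s for s, _ in released)
    count = sum(n for _, n in released)
    return total / count


if __name__ == "__main__":
    # Hypothetical example data for three sites.
    sites = {"site_a": [54, 61, 47], "site_b": [70, 55], "site_c": [62]}
    print(centralized_mean(sites))
    print(federated_mean(sites))
    print(federated_mean(sites, min_count=2))
```

When no site withholds its summary, both functions return the same mean. With `min_count=2`, the hypothetical `site_c` (one record) releases nothing and the federated estimate changes, which illustrates the trade-off between local control over disclosure and completeness of the pooled analysis.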
