
DataSHIELD: Resolving a Conflict in Contemporary Bioscience--performing a Pooled Analysis of Individual-level Data Without Sharing the Data

Overview
Journal: Int J Epidemiol
Specialty: Public Health
Date: 2010 Jul 16
PMID: 20630989
Citations: 83
Abstract

Background: Contemporary bioscience sometimes demands vast sample sizes, and there is then often no choice but to synthesize data across several studies and undertake an appropriate pooled analysis. The same need arises in health-services and socio-economic research. When a pooled analysis is required, analytic efficiency and flexibility are often best served by combining the individual-level data from all sources and analysing them as a single large data set. But ethico-legal constraints, including the wording of consent forms and privacy legislation, often prohibit or discourage the sharing of individual-level data, particularly across national or other jurisdictional boundaries. This leads to a fundamental conflict between competing public goods: individual-level analysis is desirable from a scientific perspective, but is prevented by ethico-legal considerations that are entirely valid.

Methods: Data aggregation through anonymous summary-statistics from harmonized individual-level databases (DataSHIELD) provides a simple approach to analysing pooled data that circumvents this conflict. This is achieved via parallelized analysis and modern distributed computing and, in one key setting, takes advantage of the properties of the updating algorithm for generalized linear models (GLMs).
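
The abstract gives no implementation, so the following is only a minimal sketch of how the GLM updating algorithm can be run on summary statistics alone. It assumes a logistic regression, simulated data standing in for three studies, and hypothetical function names (`site_summaries`, `fit_distributed_logistic`, `simulate_site`); none of these come from the article. At each iteration every study returns only its score vector and information matrix, and the coordinating centre sums them and updates the coefficients.

```python
import numpy as np


def site_summaries(X, y, beta):
    """Run inside one study: return only aggregate quantities, never rows of data."""
    eta = X @ beta                       # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities (logit link)
    w = mu * (1.0 - mu)                  # IRLS working weights
    score = X.T @ (y - mu)               # p-vector of score contributions
    info = X.T @ (X * w[:, None])        # p x p information matrix
    return score, info


def fit_distributed_logistic(sites, n_iter=25, tol=1e-10):
    """Run at the coordinating centre: sum per-study summaries, update beta."""
    p = sites[0][0].shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        parts = [site_summaries(X, y, beta) for X, y in sites]
        score = sum(s for s, _ in parts)
        info = sum(i for _, i in parts)
        step = np.linalg.solve(info, score)   # Newton / Fisher scoring step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_site(rng, n, true_beta):
    """Hypothetical study data: intercept plus one covariate, binary outcome."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
    y = rng.binomial(1, prob).astype(float)
    return X, y


rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0])
sites = [simulate_site(rng, n, true_beta) for n in (300, 500, 400)]

# Fit using per-study summaries only.
beta_distributed = fit_distributed_logistic(sites)

# Reference fit on the physically pooled data, for comparison.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = fit_distributed_logistic([(X_all, y_all)])

print(beta_distributed, beta_pooled)
```

Because the pooled score and information matrix are sums of the per-study contributions, the two fits should agree up to floating-point rounding. A real deployment would also need disclosure checks on what each study returns, which this sketch omits.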

Results: The conceptual use of DataSHIELD is illustrated in two different settings.

Conclusions: As the study of the aetiological architecture of chronic diseases advances to encompass more complex causal pathways (e.g. to include the joint effects of genes, lifestyle and environment), sample size requirements will increase further and the analysis of pooled individual-level data will become ever more important. An aim of this conceptual article is to encourage others to address the challenges and opportunities that DataSHIELD presents, and to explore potential extensions, for example to its use when different data sources hold different data on the same individuals.

Citing Articles

Advancements in Umbilical Cord Biobanking: A Comprehensive Review of Current Trends and Future Prospects.

AlOraibi S, Taurin S, Alshammary S. Stem Cells Cloning. 2024; 17:41-58.

PMID: 39655226 PMC: 11626973. DOI: 10.2147/SCCAA.S481072.


Privacy-friendly evaluation of patient data with secure multiparty computation in a European pilot study.

Ballhausen H, Corradini S, Belka C, Bogdanov D, Boldrini L, Bono F. NPJ Digit Med. 2024; 7(1):280.

PMID: 39397162 PMC: 11471812. DOI: 10.1038/s41746-024-01293-4.


A roadmap to advance exposomics through federation of data.

Schmitt C, Stingone J, Rajasekar A, Cui Y, Du X, Duncan C. Exposome. 2024; 3(1).

PMID: 39267798 PMC: 11391905. DOI: 10.1093/exposome/osad010.


Sharing sensitive data in life sciences: an overview of centralized and federated approaches.

Rujano M, Boiten J, Ohmann C, Canham S, Contrino S, David R. Brief Bioinform. 2024; 25(4).

PMID: 38836701 PMC: 11151787. DOI: 10.1093/bib/bbae262.


Online causal inference with application to near real-time post-market vaccine safety surveillance.

Luo L, Risk M, Shi X. Stat Med. 2024; 43(14):2734-2746.

PMID: 38693559 PMC: 11218898. DOI: 10.1002/sim.10095.

