Policy-aware Data Lakes: a Flexible Approach to Achieve Legal Interoperability for Global Research Collaborations
A popular model for global scientific repositories is the data commons, which pools or connects many datasets alongside supporting infrastructure. A data commons must establish legal interoperability between datasets to ensure researchers can aggregate and reuse them. This is usually achieved by establishing a shared governance structure. Unfortunately, governance often takes years to negotiate and involves a trade-off between data inclusion and data availability. It can also be difficult for repositories to modify governance structures in response to changing scientific priorities, data sharing practices, or legal frameworks. This problem has been laid bare by the sudden shock of the COVID-19 pandemic. This paper proposes a rapid and flexible strategy for scientific repositories to achieve legal interoperability: the policy-aware data lake. This strategy draws on the technical concepts of modularity, metadata, and data lakes. Datasets are treated as independent modules, each of which can be subject to distinct legal requirements. Each module must, however, be described using standard legal metadata. This allows legally compatible datasets to be rapidly combined and made available on a just-in-time basis to certain researchers for certain purposes. Global scientific repositories increasingly need such flexibility to manage scientific, organizational, and legal complexity, and to improve their responsiveness to global pandemics.
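The idea of describing each dataset module with standard legal metadata and then combining compatible modules on request can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the two metadata fields (permitted purposes and permitted researcher regions), and the purpose and region labels are hypothetical and are not taken from the paper or from any standard vocabulary. A real policy-aware data lake would need far richer legal metadata than this.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class LegalMetadata:
    """Hypothetical standard legal metadata attached to every module."""
    permitted_purposes: FrozenSet[str]
    permitted_regions: FrozenSet[str]


@dataclass(frozen=True)
class DatasetModule:
    """An independent dataset in the lake, with its own legal terms."""
    name: str
    legal: LegalMetadata


@dataclass(frozen=True)
class AccessRequest:
    """Who is asking (by region) and for what purpose."""
    purpose: str
    researcher_region: str


def compatible_modules(
    lake: List[DatasetModule], request: AccessRequest
) -> List[DatasetModule]:
    """Return the modules whose legal metadata permits this request."""
    return [
        module
        for module in lake
        if request.purpose in module.legal.permitted_purposes
        and request.researcher_region in module.legal.permitted_regions
    ]


# Illustrative lake: three modules with different (made-up) legal terms.
lake = [
    DatasetModule(
        "cohort_a",
        LegalMetadata(
            frozenset({"covid19_research", "general_research"}),
            frozenset({"EU", "CA"}),
        ),
    ),
    DatasetModule(
        "cohort_b",
        LegalMetadata(frozenset({"cancer_research"}), frozenset({"EU", "CA", "US"})),
    ),
    DatasetModule(
        "cohort_c",
        LegalMetadata(frozenset({"covid19_research"}), frozenset({"US"})),
    ),
]

# The combination is computed at request time for this researcher and purpose.
request = AccessRequest(purpose="covid19_research", researcher_region="CA")
selected = compatible_modules(lake, request)
print([module.name for module in selected])  # prints ['cohort_a']
```

In this sketch no shared governance structure is imposed on the modules. Each keeps its own terms, and the compatible subset is computed only when a request arrives, which is one way to read the "just-in-time" availability described above.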