PMID: 31509880

Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC

Abstract

Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks.

Objectives: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task.

Methods: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application.
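To make the matching idea concrete, here is a minimal sketch of a bag-of-words matcher with synonym normalization and fuzzy token comparison. It is not the ADOPT BBMRI-ERIC implementation: the synonym table, the thresholds, the function names, and the use of Python's `difflib` for fuzzy comparison are all assumptions chosen for illustration, and sentiment tagging is left out entirely.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table (illustrative only, not from the project).
SYNONYMS = {"sex": "gender", "tumour": "tumor"}


def tokenize(term):
    """Return the term as a bag (set) of lower-cased, synonym-normalized tokens."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return {SYNONYMS.get(t, t) for t in tokens}


def token_similarity(a, b):
    """Fuzzy similarity of two tokens in [0, 1], via difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def bag_score(local, central, threshold=0.75):
    """Share of tokens on both sides that have a fuzzy partner on the other side."""
    lt, ct = tokenize(local), tokenize(central)
    if not lt or not ct:
        return 0.0
    hit_local = sum(
        1 for t in lt if max(token_similarity(t, c) for c in ct) >= threshold
    )
    hit_central = sum(
        1 for c in ct if max(token_similarity(c, t) for t in lt) >= threshold
    )
    return (hit_local + hit_central) / (len(lt) + len(ct))


def best_match(local, central_terms, min_score=0.6):
    """Suggest the best central term, or None so a human expert can decide."""
    score, term = max((bag_score(local, c), c) for c in central_terms)
    return (term, score) if score >= min_score else (None, score)


# Hypothetical central terminology entries.
central = ["Age at diagnosis", "Gender", "Tumor location"]
print(best_match("Sex", central))
print(best_match("tumour localisation", central))
print(best_match("xyz", central))
```

Returning `None` below the cut-off mirrors the semiautomatic workflow described above: confident suggestions are proposed automatically, and everything else is left for expert curation.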

Results: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients.
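As a rough illustration of how curated mappings could drive record processing, the sketch below renames local fields to central terms and translates coded values. The field names, value codes, and the `transform_record` helper are hypothetical and do not reflect the project's actual data transformation application.

```python
def transform_record(record, field_map, value_maps):
    """Apply curated mappings to one record.

    field_map:  local field name -> central term (hypothetical names)
    value_maps: central term -> {local value: central value}
    Returns the harmonized record and the list of local fields with no mapping.
    """
    harmonized, unmapped = {}, []
    for local_field, value in record.items():
        central_term = field_map.get(local_field)
        if central_term is None:
            # No curated mapping: report the field instead of guessing.
            unmapped.append(local_field)
            continue
        # Translate coded values where a value map exists; otherwise keep as-is.
        harmonized[central_term] = value_maps.get(central_term, {}).get(value, value)
    return harmonized, unmapped


# Hypothetical local record and expert-curated mappings.
record = {"geschlecht": "m", "alter": 63, "freitext": "n/a"}
field_map = {"geschlecht": "sex", "alter": "age_at_diagnosis"}
value_maps = {"sex": {"m": "male", "w": "female"}}

harmonized, unmapped = transform_record(record, field_map, value_maps)
print(harmonized)
print(unmapped)
```

Collecting unmapped fields rather than dropping them silently keeps the gaps visible for later review.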

Conclusion: A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.

Citing Articles

Data Management in Biobanking: Strategies, Challenges, and Future Directions.

Alkhatib R, Gaede K. BioTech (Basel). 2024; 13(3).

PMID: 39311336 PMC: 11417763. DOI: 10.3390/biotech13030034.


Unlocking the potential of big data and AI in medicine: insights from biobanking.

Akyuz K, Abadia M, Goisauf M, Mayrhofer M. Front Med (Lausanne). 2024; 11:1336588.

PMID: 38357641 PMC: 10864616. DOI: 10.3389/fmed.2024.1336588.


The Future of Biobanking: What Is Next?

Caenazzo L, Tozzo P. BioTech (Basel). 2022; 9(4).

PMID: 35822826 PMC: 9258311. DOI: 10.3390/biotech9040023.


Understanding the Nature of Metadata: Systematic Review.

Ulrich H, Kock-Schoppenhauer A, Deppenwiese N, Gott R, Kern J, Lablans M. J Med Internet Res. 2022; 24(1):e25440.

PMID: 35014967 PMC: 8790684. DOI: 10.2196/25440.


Guidelines for Biobanking of Bone Marrow Adipose Tissue and Related Cell Types: Report of the Biobanking Working Group of the International Bone Marrow Adiposity Society.

Lucas S, Tencerova M, von der Weid B, Andersen T, Attane C, Behler-Janbeck F. Front Endocrinol (Lausanne). 2021; 12:744527.

PMID: 34646237 PMC: 8503265. DOI: 10.3389/fendo.2021.744527.

