PMID: 36227072

Transforming and Evaluating the UK Biobank to the OMOP Common Data Model for COVID-19 Research and Beyond

Abstract

Objective: The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500 000 participants with rich genetic and phenotypic data, to the Observational Medical Outcomes Partnership (OMOP) CDM.

Materials and Methods: We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data.

Results: We identified 502 505 participants (3086 with COVID-19) and transformed 690 fields (1 373 239 555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported noncancer illnesses 946 053 (83.91% of all source entries), cancers 37 802 (70.81%), medications 1 218 935 (88.25%), and prescriptions 864 788 (86.96%). In EHR, we transformed 13 028 182 (99.95%) hospital diagnoses, 6 465 399 (89.2%) procedures, 337 896 333 primary care diagnoses (CTV3, SNOMED-CT), 139 966 587 (98.74%) prescriptions (dm+d) and 77 127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data.
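The percentages above report how many source entries were transformed. As a rough illustration only, the following sketch shows one way such a coverage figure could be computed when source codes are looked up in a mapping table. The mapping values, source codes, and function names are hypothetical toy assumptions, not the authors' pipeline; the only conventions taken from OMOP itself are the CONDITION_OCCURRENCE column names and the use of concept_id 0 for unmapped codes.

```python
# Toy lookup from source code to OMOP standard concept_id.
# These codes and ids are invented for illustration; they are not real mappings.
TOY_MAPPING: dict[str, int] = {"SRC_A": 1001, "SRC_B": 1002}


def map_records(records: list[tuple[int, str]], mapping: dict[str, int]) -> list[dict]:
    """Turn (person_id, source_code) pairs into condition-style rows.

    Codes without a mapping keep their source value and get concept_id 0,
    following the OMOP convention for unmapped concepts.
    """
    rows = []
    for person_id, code in records:
        rows.append(
            {
                "person_id": person_id,
                "condition_concept_id": mapping.get(code, 0),
                "condition_source_value": code,
            }
        )
    return rows


def coverage_percent(n_mapped: int, n_source: int) -> float:
    """Share of source entries that received a non-zero concept_id, in percent."""
    return round(100.0 * n_mapped / n_source, 2) if n_source else 0.0


# Hypothetical source entries: three have a mapping, one does not.
records = [(1, "SRC_A"), (2, "SRC_B"), (3, "SRC_X"), (4, "SRC_A")]
rows = map_records(records, TOY_MAPPING)
n_mapped = sum(1 for r in rows if r["condition_concept_id"] != 0)
print(coverage_percent(n_mapped, len(records)))  # 75.0
```

Keeping unmapped rows with their source value, rather than dropping them, makes it possible to audit later which source codes lacked a mapping.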

Discussion and Conclusion: Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM, which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, such as COVID-19 studies.

Citing Articles

A machine learning approach to leveraging electronic health records for enhanced omics analysis.

Mataraso S, Espinosa C, Seong D, Reincke S, Berson E, Reiss J Nat Mach Intell. 2025; 7(2):293-306.

PMID: 40008295 PMC: 11847705. DOI: 10.1038/s42256-024-00974-9.


Informatics assessment of COVID-19 data collection: an analysis of UK Biobank questionnaire data.

Mayer C BMC Med Inform Decis Mak. 2024; 24(1):321.

PMID: 39482694 PMC: 11529153. DOI: 10.1186/s12911-024-02743-5.


Applying an ELSI lens to real-world data and novel genomic insights for personalized mental healthcare.

Hendricks-Sturrup R, Yankah S, Lu C Front Genet. 2024; 15:1444084.

PMID: 39205938 PMC: 11349570. DOI: 10.3389/fgene.2024.1444084.


Transforming Primary Care Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study.

Fruchart M, Quindroit P, Jacquemont C, Beuscart J, Calafiore M, Lamer A JMIR Med Inform. 2024; 12:e49542.

PMID: 39140273 PMC: 11337138. DOI: 10.2196/49542.


Artificial intelligence-enhanced patient evaluation: bridging art and science.

Oikonomou E, Khera R Eur Heart J. 2024; 45(35):3204-3218.

PMID: 38976371 PMC: 11400875. DOI: 10.1093/eurheartj/ehae415.

