PMID: 27580049

Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project

Abstract

Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record-linkage systems of administrative and/or registry data (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on a single data domain among diagnoses, drugs, diagnostic test utilization, and laboratory results. Diagnoses-based components were subclassified by healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted, and the proportion of the population identified by each was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, and AND NOT were tested, and local experts recommended their preferred data source-tailored combination. The proportion of the population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93-100%), while drug-based components were the main contributors in RLDs (81-100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
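
The idea of combining components with AND, OR, and AND NOT can be illustrated with a minimal sketch in which each component is the set of subject identifiers it flags. Everything below is an assumption made for illustration: the component names, the subject IDs, the population size, and the particular combination are hypothetical and are not taken from the EMIF study.

```python
from typing import Dict, Set

# Hypothetical component outputs: component name -> set of flagged subject IDs.
# Names and IDs are invented for illustration only.
components: Dict[str, Set[int]] = {
    "dx_primary_care": {1, 2, 3, 5},      # diagnosis recorded in primary care
    "dx_inpatient": {2, 8},               # diagnosis recorded at hospital discharge
    "drug_antidiabetic": {2, 3, 4, 8},    # antidiabetic drug dispensing/prescription
    "dx_type1": {4},                      # type 1 diabetes diagnosis (exclusion)
}


def combine(c: Dict[str, Set[int]]) -> Set[int]:
    """One possible combination:
    (primary-care dx OR inpatient dx OR drug) AND NOT type 1 dx."""
    candidates = c["dx_primary_care"] | c["dx_inpatient"] | c["drug_antidiabetic"]
    return candidates - c["dx_type1"]


def proportion_identified(cases: Set[int], population_size: int) -> float:
    """Share of the source population identified as cases."""
    return len(cases) / population_size


def component_contribution(c: Dict[str, Set[int]], name: str, cases: Set[int]) -> float:
    """Share of the final cases that the named component flags."""
    return len(c[name] & cases) / len(cases)


cases = combine(components)
population_size = 10  # hypothetical size of the source population

print(sorted(cases))
print(proportion_identified(cases, population_size))
for name in ("dx_primary_care", "drug_antidiabetic"):
    print(name, component_contribution(components, name, cases))
```

Under these assumptions, OR maps to set union, AND to set intersection, and AND NOT to set difference, so a data source-tailored algorithm is just a different expression over the same named components. This keeps each source's choice explicit and comparable with the others.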

Citing Articles

Validity of Italian administrative healthcare data in describing the real-world utilization of infusive antineoplastic drugs: the study case of rituximab use in patients treated at the University Hospital of Siena for onco-haematological indications.

Bartolini C, Roberto G, Girardi A, Moscatelli V, Spini A, Barchielli A. Front Oncol. 2023; 13:1059109.

PMID: 37324023 PMC: 10264685. DOI: 10.3389/fonc.2023.1059109.


Methodology of the brodalumab assessment of hazards: a multicentre observational safety (BRAHMS) study.

Reilev M, Jensen P, Ranch L, Egeberg A, Furu K, Gembert K. BMJ Open. 2023; 13(2):e066057.

PMID: 36725094 PMC: 9896233. DOI: 10.1136/bmjopen-2022-066057.


From Inception to ConcePTION: Genesis of a Network to Support Better Monitoring and Communication of Medication Safety During Pregnancy and Breastfeeding.

Thurin N, Pajouheshnia R, Roberto G, Dodd C, Hyeraci G, Bartolini C. Clin Pharmacol Ther. 2021; 111(1):321-331.

PMID: 34826340 PMC: 9299060. DOI: 10.1002/cpt.2476.


Real-World Utilization of Target- and Immunotherapies for Lung Cancer: A Scoping Review of Studies Based on Routinely Collected Electronic Healthcare Data.

Spini A, Hyeraci G, Bartolini C, Donnini S, Rosellini P, Gini R. Int J Environ Res Public Health. 2021; 18(14).

PMID: 34300130 PMC: 8305284. DOI: 10.3390/ijerph18147679.


Vascular and metabolic risk factor differences prior to dementia diagnosis: a multidatabase case-control study using European electronic health records.

Perera G, Rijnbeek P, Alexander M, Ansell D, Avillach P, Duarte-Salles T. BMJ Open. 2020; 10(11):e038753.

PMID: 33191253 PMC: 7668358. DOI: 10.1136/bmjopen-2020-038753.

