Analyzing Real-World Use of Research Common Data Elements
Common Data Elements (CDEs) are defined as "data elements that are common to multiple data sets across different studies" and provide structured, standardized definitions so that data can be collected and used consistently across datasets. CDE collections are traditionally developed prospectively by subject-matter and domain experts. However, there has been little systematic research demonstrating how CDEs are used in real-world datasets or how that use affects data discoverability. Our study builds upon previous mapping work to investigate how many CDEs can be identified at varying commonness thresholds (the minimum number of studies that collect a variable) in a real-world data repository, the database of Genotypes and Phenotypes (dbGaP). In the analyzed collection of mapped variables from 426 dbGaP studies, only 1,414 of the 24,938 defined PhenX variables (PhenX: PHENotypes and eXposures, a CDE initiative) were observed. We report the CDEs identified at each commonness threshold. After semantically grouping the 68 PhenX variables collected in at least 15 studies (n=15), we observed 32 truly "common" common data elements. We discuss the benefits of post-hoc mapping of study data to a CDE framework for findability and reuse, as well as the informatics challenges of pre-populating clinical research case report forms with data from Electronic Health Records, which are typically coded in terminologies aimed at routine healthcare needs.
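The commonness threshold described above can be illustrated with a minimal sketch. It assumes the mapped data is available as (study, variable) pairs; the function name `common_elements`, the identifiers, and the toy data are all hypothetical and are not taken from the study, which does not publish its code in this abstract.

```python
from collections import defaultdict


def common_elements(mappings, min_studies):
    """Return variable ids observed in at least `min_studies` distinct studies.

    `mappings` is an iterable of (study_id, variable_id) pairs. This layout
    is an assumption made for illustration only.
    """
    studies_per_variable = defaultdict(set)
    for study_id, variable_id in mappings:
        # A set ensures each study is counted once per variable,
        # even if the same mapping appears several times.
        studies_per_variable[variable_id].add(study_id)
    return sorted(
        variable_id
        for variable_id, studies in studies_per_variable.items()
        if len(studies) >= min_studies
    )


# Toy, made-up mappings (not real dbGaP or PhenX identifiers).
toy_mappings = [
    ("study_A", "var_height"),
    ("study_B", "var_height"),
    ("study_C", "var_height"),
    ("study_A", "var_weight"),
    ("study_B", "var_weight"),
    ("study_A", "var_rare"),
    ("study_A", "var_rare"),  # duplicate mapping within one study
]

# Raising the threshold shrinks the set of "common" elements.
for n in (1, 2, 3):
    print(n, common_elements(toy_mappings, n))
```

In this sketch, raising `min_studies` can only shrink the result, which mirrors the idea of sweeping the threshold to see how many elements remain "common". The semantic grouping step mentioned in the abstract is not modeled here.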