PMID: 28398525

A Longitudinal Analysis of Data Quality in a Large Pediatric Data Research Network

Abstract

Objective: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

Materials and Methods: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed the data quality issues identified programmatically and held discussions with the data partners' extract-transform-load analysts to determine the cause of each issue.

Results: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle include annotations for 850 issues, covering the most frequent issue types (missing data, >300; outliers, >100), the most complex domains (medications, >160; lab measurements, >140), and the primary causes (source data characteristics, 83%; extract-transform-load errors, 9%).
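The two most frequent issue types reported above, missing data and outliers, are the kind of check a programmatic workflow can flag before human review. The sketch below is illustrative only and is not the PEDSnet workflow: the field name `weight_kg`, the 10% missingness threshold, and the interquartile-range rule are all assumptions chosen for the example.

```python
from statistics import quantiles


def missing_rate(records, field):
    """Fraction of records in which `field` is absent or None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def flag_issues(records, field, max_missing=0.10):
    """Return (issue_type, field, detail) tuples for later human review.

    `max_missing` is a hypothetical threshold, not a published one.
    """
    issues = []
    rate = missing_rate(records, field)
    if rate > max_missing:
        issues.append(("missing data", field, round(rate, 2)))
    present = [r[field] for r in records if r.get(field) is not None]
    if len(present) >= 4:  # too few values make quartiles meaningless
        for value in iqr_outliers(present):
            issues.append(("outlier", field, value))
    return issues


# Hypothetical submission: seven weights, one implausible, two missing.
records = [{"weight_kg": w} for w in [12.0, 13.5, 11.8, 12.7, 14.1, 13.0, 120.0]]
records += [{"weight_kg": None}, {}]
issues = flag_issues(records, "weight_kg")
```

A flagged value is only a candidate issue. As the abstract notes, most issues traced back to characteristics of the source data rather than to extract-transform-load errors, so each flag still needs review with the submitting site.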

Discussion: The longitudinal findings demonstrate the network's evolution from identifying difficulties with aligning the data to a common data model to learning norms in clinical pediatrics and determining research capability.

Conclusion: While data quality is recognized as a critical aspect in establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.

Citing Articles

The challenges and opportunities of continuous data quality improvement for healthcare administration data.

Zhang Y, Callaghan-Koru J, Koru G. JAMIA Open. 2024; 7(3):ooae058.

PMID: 39091510 PMC: 11293638. DOI: 10.1093/jamiaopen/ooae058.


Electronic health records identify timely trends in childhood mental health conditions.

Elia J, Pajer K, Prasad R, Pumariega A, Maltenfort M, Utidjian L. Child Adolesc Psychiatry Ment Health. 2023; 17(1):107.

PMID: 37710303 PMC: 10503059. DOI: 10.1186/s13034-023-00650-7.


Electronic health record data quality assessment and tools: a systematic review.

Lewis A, Weiskopf N, Abrams Z, Foraker R, Lai A, Payne P. J Am Med Inform Assoc. 2023; 30(10):1730-1740.

PMID: 37390812 PMC: 10531113. DOI: 10.1093/jamia/ocad120.


Targeted Data Quality Analysis for a Clinical Decision Support System for SIRS Detection in Critically Ill Pediatric Patients.

Tute E, Mast M, Wulff A. Methods Inf Med. 2023; 62(S 01):e1-e9.

PMID: 36630987 PMC: 10306443. DOI: 10.1055/s-0042-1760238.


Using set visualisation to find and explain patterns of missing values: a case study with NHS hospital episode statistics data.

Ruddle R, Adnan M, Hall M. BMJ Open. 2022; 12(11):e064887.

PMID: 36410820 PMC: 9680176. DOI: 10.1136/bmjopen-2022-064887.

