Detecting Contaminated Birthdates Using Generalized Additive Models

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2014 Jun 14
PMID: 24923281
Citations: 1
Abstract

Background: Erroneous patient birthdates are common in health databases. Detecting these errors usually involves manual verification, which can be resource-intensive and impractical. By identifying a frequent manifestation of birthdate errors, this paper presents a principled, statistically driven procedure for identifying erroneous patient birthdates.

Results: Generalized additive models (GAMs) enabled explicit incorporation of known demographic trends and birth patterns. With false positive rates controlled, the method identified birthdate contamination with high accuracy. In the health data set used, of the 58 incorrect birthdates manually identified by the domain expert, the GAM-based method identified 51, with 8 false positives, giving a positive predictive value of 86.4% (51/59) and a false negative rate of 12.1% (7/58). These results outperformed linear time-series models.
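
As a quick check, the two rates can be recomputed directly from the counts stated in the Results. The sketch below is illustrative only: the variable names are ours rather than the authors', and sensitivity is derived here as the complement of the false negative rate, not quoted from the abstract.

```python
# Counts as stated in the Results paragraph.
true_errors = 58       # incorrect birthdates confirmed by the domain expert
true_positives = 51    # confirmed errors that the GAM-based method flagged
false_positives = 8    # flagged birthdates that were in fact correct

flagged = true_positives + false_positives       # 59 birthdates flagged in total
false_negatives = true_errors - true_positives   # 7 confirmed errors missed

ppv = true_positives / flagged        # positive predictive value, 51/59
fnr = false_negatives / true_errors   # false negative rate, 7/58
sensitivity = 1 - fnr                 # derived here, not reported in the abstract

print(f"PPV = {ppv:.3f}, FNR = {fnr:.3f}, sensitivity = {sensitivity:.3f}")
# prints: PPV = 0.864, FNR = 0.121, sensitivity = 0.879
```

The denominators differ because the two rates answer different questions: the positive predictive value is taken over everything the method flagged (59), while the false negative rate is taken over everything the expert confirmed as wrong (58).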

Conclusions: The GAM-based method is an effective approach to identify systemic birthdate errors, a common data quality issue in both clinical and administrative databases, with high accuracy.

Citing Articles

Data cleaning process for HIV-indicator data extracted from DHIS2 national reporting system: a case study of Kenya.

Gesicho M, Were M, Babic A. BMC Med Inform Decis Mak. 2020; 20(1):293.

PMID: 33187520. PMC: 7664027. DOI: 10.1186/s12911-020-01315-7.
