Accuracy of Medical Billing Data Against the Electronic Health Record in the Measurement of Colorectal Cancer Screening Rates
Objective: Medical billing data are an attractive source for secondary analysis because they are easy to use and can answer population-health questions with substantial statistical power. Although these datasets have known susceptibilities to bias, the degree to which that bias can distort quality measures such as colorectal cancer screening rates is not widely appreciated, nor are its causes and possible solutions.
Methods: Using a billing code database derived from our institution's electronic health records, we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 years seen in primary care or gastroenterology clinics in 2016-2017. To quantify the accuracy of the database against manual chart review, we sampled 200 records (150 unscreened and 50 screened according to the billing data).
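The accuracy figures reported in the Results are predictive values of the billing classification, with manual chart review as the reference standard. A minimal Python sketch of that calculation follows, assuming each sampled record has been tallied into true/false positives and negatives. The function names are hypothetical, the counts in the usage line are illustrative rather than the study's data, and the Wilson score interval is one common choice of confidence interval that may not be the method the authors used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% with z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Hypothetical helper: PPV and NPV of the billing flag versus manual review.

    tp: billing says screened, review confirms screened
    fp: billing says screened, review finds unscreened
    tn: billing says unscreened, review confirms unscreened
    fn: billing says unscreened, review finds screened
    """
    ppv = tp / (tp + fp)  # fraction of billing-"screened" records truly screened
    npv = tn / (tn + fn)  # fraction of billing-"unscreened" records truly unscreened
    return {
        "ppv": (ppv, wilson_interval(tp, tp + fp)),
        "npv": (npv, wilson_interval(tn, tn + fn)),
    }


# Illustrative counts only (not the study's data):
# 50 billing-screened records and 150 billing-unscreened records.
result = predictive_values(tp=48, fp=2, tn=30, fn=120)
print(result)
```

Because the two strata were sampled separately, PPV and NPV are each estimated within their own stratum. Pooling the 200 sampled records into one table to compute sensitivity or specificity would require reweighting each stratum by its size in the full cohort.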
Results: Among 4611 patients, analysis of the billing data suggested a screening rate of 61%, matching the estimate from the Centers for Disease Control. Manual review revealed a positive predictive value of 96% (86%-100%), a negative predictive value of 21% (15%-29%), and a corrected screening rate of 85% (81%-90%). Most false negatives arose from examinations performed outside the scope of the database, both within and outside our institution, but 21% of false negatives fell within the database's scope. False positives arose from incomplete examinations and inadequate bowel preparation. Reasons for screening failure included examinations that were ordered but not completed (48%), absent or incorrect documentation by primary care (29%), including incorrect screening intervals (13%), and patients declining screening (13%).
Conclusions: Billing databases are prone to substantial bias that may go undetected even in the presence of confirmatory external estimates. Caution is recommended when performing population-level inference from these data. We propose several solutions to improve the use of these data for the assessment of healthcare quality.