Can a Novel Natural Language Processing Model and Artificial Intelligence Automatically Generate Billing Codes From Spine Surgical Operative Notes?
Study Design: Retrospective cohort.
Objective: Billing and coding-related administrative tasks are a major source of healthcare expenditure in the United States. We aim to show that XLNet, a second-iteration natural language processing (NLP) machine learning algorithm, can automate the generation of Current Procedural Terminology (CPT) codes from operative notes for anterior cervical discectomy and fusion (ACDF), posterior cervical decompression and fusion (PCDF), and cervical disc arthroplasty (CDA) procedures.
Methods: We collected 922 operative notes from patients who underwent ACDF, PCDF, or CDA from 2015 to 2020, together with the CPT codes generated by the billing department. We trained XLNet, a generalized autoregressive pretraining method, on this dataset and tested its performance by calculating the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
Results: The performance of the model approached human accuracy. Trial 1 (ACDF) achieved an AUROC of .82 (range: .48-.93), an AUPRC of .81 (range: .45-.97), and class-by-class accuracy of 77% (range: 34%-91%); trial 2 (PCDF) achieved an AUROC of .83 (.44-.94), an AUPRC of .70 (.45-.96), and class-by-class accuracy of 71% (42%-93%); trial 3 (ACDF and CDA) achieved an AUROC of .95 (.68-.99), an AUPRC of .91 (.56-.98), and class-by-class accuracy of 87% (63%-99%); trial 4 (ACDF, PCDF, CDA) achieved an AUROC of .95 (.76-.99), an AUPRC of .84 (.49-.99), and class-by-class accuracy of 88% (70%-99%).
Conclusions: We show that the XLNet model can be successfully applied to orthopedic surgeons' operative notes to generate CPT billing codes. As NLP models continue to improve, billing can be greatly augmented with artificial intelligence-assisted generation of CPT billing codes, which will help minimize error and promote standardization in the process.
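The abstract reports per-class AUROC, AUPRC, and accuracy but does not say how they were computed. The sketch below shows one common way to obtain such figures for a multi-label code predictor, using only the Python standard library. Everything in it is an assumption made for illustration: the function names, the placeholder class labels `CODE_A` and `CODE_B`, the toy scores, the 0.5 decision threshold, and the use of average precision as the AUPRC estimate. It is not the authors' evaluation code.

```python
from typing import Dict, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive note is scored
    above a random negative note (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of AUPRC: the mean of the
    precision measured at the rank of each positive note.
    Tied scores are not grouped here; they keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive")
    return total / hits


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of notes classified correctly at an assumed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: one 0/1 label list and one score list per code,
# with one entry per operative note.
y_true: Dict[str, List[int]] = {
    "CODE_A": [1, 0, 1, 1, 0],
    "CODE_B": [0, 1, 0, 0, 1],
}
y_score: Dict[str, List[float]] = {
    "CODE_A": [0.9, 0.4, 0.35, 0.8, 0.1],
    "CODE_B": [0.2, 0.7, 0.6, 0.1, 0.3],
}

# Class-by-class metrics, one row per code.
per_class = {
    code: {
        "auroc": auroc(y_true[code], y_score[code]),
        "auprc": average_precision(y_true[code], y_score[code]),
        "accuracy": accuracy(y_true[code], y_score[code]),
    }
    for code in y_true
}

for code, metrics in per_class.items():
    print(code, {name: round(value, 3) for name, value in metrics.items()})
```

Computing the metrics per class rather than pooled makes it possible to report both a summary value and a range across codes, which is the shape of the figures given in the Results. AUPRC is usually the more informative of the two for rarely billed codes, because AUROC can stay high even when most predicted positives for a rare class are wrong.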