Comparison of Natural Language Processing and Manual Coding for the Identification of Cross-Sectional Imaging Reports Suspicious for Lung Cancer

Overview

Journal: JCO Clin Cancer Inform

Specialty: Medical Informatics

Date: 2019 Jan 18

PMID: 30652545

Citations: 13

Authors

Roxanne Wadia

Kathleen Akgun

Cynthia Brandt

Brenda T Fenton

Woody Levin

Andrew H Marple

Vijay Garla

Michal G Rose

Tamar Taddei

Caroline Taylor

Abstract

Purpose: To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and with the combination of the two methods, for the identification of patients whose computed tomography (CT) reports raised concern for lung cancer.

Methods: An NLP algorithm was developed using the Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to differentiate between language indicating benign lesions and language indicating lesions concerning for lung cancer. A random sample of 450 reports of chest CT scans performed at the Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected. A reference standard was created by manual review of the reports to determine whether the text stated that follow-up was needed because of concern for cancer. The NLP algorithm was applied to all reports, and its results were compared with case identification based on the radiologists' manual coding.

Results: A total of 450 reports representing 428 patients were analyzed. NLP had higher sensitivity and lower specificity than manual coding (77.3% v 51.5% and 72.5% v 82.5%, respectively). NLP and manual coding had similar positive predictive values (88.4% v 88.9%), and NLP had a higher negative predictive value than manual coding (54% v 38.5%). When NLP and manual coding were combined, sensitivity increased to 92.3%, with a decrease in specificity to 62.85%. Combined NLP and manual coding had a positive predictive value of 87.0% and a negative predictive value of 75.2%.
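Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) all come from a two-by-two comparison of each method's flags against the manually reviewed reference standard. The Python sketch below shows that computation. The function names and the ten-report toy data are hypothetical, and so is the combination rule: the abstract does not say how NLP and manual coding were combined, so the sketch assumes a report is flagged when either method flags it (an OR rule), which is consistent with sensitivity rising and specificity falling.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Metrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP), positive predictive value
    npv: float          # TN / (TN + FN), negative predictive value


def _ratio(num: int, den: int) -> float:
    # An empty denominator gives NaN instead of raising ZeroDivisionError.
    return num / den if den else float("nan")


def evaluate(predicted: Sequence[bool], reference: Sequence[bool]) -> Metrics:
    """Score per-report flags (True = flagged as suspicious for lung cancer)
    against a reference standard (True = follow-up needed for concern for cancer)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def combine_or(a: Sequence[bool], b: Sequence[bool]) -> List[bool]:
    """ASSUMED combination rule: a report is positive if either method flags it."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x or y for x, y in zip(a, b)]


# Hypothetical toy data for 10 reports (NOT the study data).
reference = [True] * 6 + [False] * 4
nlp = [True, True, True, True, False, False, True, False, False, False]
manual = [True, True, False, False, False, True, False, False, False, True]

if __name__ == "__main__":
    for name, flags in [("NLP", nlp), ("manual", manual),
                        ("combined", combine_or(nlp, manual))]:
        m = evaluate(flags, reference)
        print(f"{name:>8}: sens={m.sensitivity:.3f} spec={m.specificity:.3f} "
              f"ppv={m.ppv:.3f} npv={m.npv:.3f}")
```

Even on toy data the direction matches the abstract: the OR-combined flags catch more true positives than NLP alone (sensitivity 5/6 versus 4/6) at the cost of more false positives (specificity 2/4 versus 3/4).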

Conclusion: Our NLP algorithm was more sensitive than manual coding of chest CT reports for the identification of patients who required follow-up for suspicion of lung cancer. The combination of NLP and manual coding is a sensitive way to identify patients who need further workup for lung cancer.
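Why a combined method gains sensitivity but gives up specificity can be seen from set inclusion. This assumes (the abstract does not state the rule) that the combination flags a report when either NLP or manual coding flags it. Let A and B be the sets of reports flagged by NLP and by manual coding, D the reports the reference standard marks as needing follow-up, and \bar{D} the remaining reports.

```latex
\begin{aligned}
\mathrm{Sens}_{A \cup B} &= \frac{|(A \cup B) \cap D|}{|D|}
  \;\ge\; \max\!\left(\frac{|A \cap D|}{|D|},\, \frac{|B \cap D|}{|D|}\right)
  = \max(\mathrm{Sens}_A, \mathrm{Sens}_B), \\
\mathrm{Spec}_{A \cup B} &= \frac{|\bar{D} \setminus (A \cup B)|}{|\bar{D}|}
  \;\le\; \min\!\left(\frac{|\bar{D} \setminus A|}{|\bar{D}|},\, \frac{|\bar{D} \setminus B|}{|\bar{D}|}\right)
  = \min(\mathrm{Spec}_A, \mathrm{Spec}_B).
\end{aligned}
```

The reported figures fit these bounds: 92.3% ≥ max(77.3%, 51.5%) and 62.85% ≤ min(72.5%, 82.5%). PPV and NPV obey no such simple ordering under a union, which is why the combined PPV (87.0%) can fall slightly below both single-method values.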

Citing Articles

Developing and Validating an Automatic Support System for Tumor Coding in Pathology Reports in Spanish.

Villena F, Baez P, Penafiel S, Rojas M, Paredes I, Dunstan J. JCO Clin Cancer Inform. 2025; 9:e2400124.

PMID: 39993248 PMC: 11872266. DOI: 10.1200/CCI.24.00124.

Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study.

Wang H, Wu Y, Sun M, Cui X. Sci Rep. 2024; 14(1):8731.

PMID: 38627587 PMC: 11021445. DOI: 10.1038/s41598-024-59474-w.

Extracting cancer concepts from clinical notes using natural language processing: a systematic review.

Gholipour M, Khajouei R, Amiri P, Gohari S, Ahmadian L. BMC Bioinformatics. 2023; 24(1):405.

PMID: 37898795 PMC: 10613366. DOI: 10.1186/s12859-023-05480-0.

Natural Language Processing Applications for Computer-Aided Diagnosis in Oncology.

Li C, Zhang Y, Weng Y, Wang B, Li Z. Diagnostics (Basel). 2023; 13(2).

PMID: 36673096 PMC: 9857980. DOI: 10.3390/diagnostics13020286.

A framework for a consistent and reproducible evaluation of manual review for patient matching algorithms.

Gupta A, Kasthurirathne S, Xu H, Li X, Ruppert M, Harle C. J Am Med Inform Assoc. 2022; 29(12):2105-2109.

PMID: 36305781 PMC: 9667171. DOI: 10.1093/jamia/ocac175.
