
Identification of Colorectal Cancer Using Structured and Free Text Clinical Data

Overview
Publisher: Sage Publications
Date: 2022 Oct 27
PMID: 36300566
Abstract

Colorectal cancer incidence has fallen steadily among people aged 50 and over but has risen among those under 50. Even with recent guidelines recommending that screening begin at age 45, nearly half of all early-onset colorectal cancers will be missed. Methods are needed to identify high-risk individuals in this age group for targeted screening. Colorectal cancer studies, like other clinical studies, have relied on labor-intensive chart review to identify affected patients and their risk factors. Natural language processing and machine learning can automate this process and enable the screening of large numbers of patients. This study developed four machine learning and statistical models (logistic regression, support vector machine, random forest, and deep neural network) and compared their performance in classifying colorectal cancer patients. Excellent classification performance was achieved, with AUCs over 97%.
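For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an AUC (area under the ROC curve) can be computed from a classifier's scores. It is not the study's code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only to illustrate the calculation. The sketch uses the rank-sum identity, under which AUC is the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = cancer, 0 = no cancer.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.92, 0.81, 0.40, 0.55, 0.30, 0.12, 0.05]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; on a large cohort, a sort-based implementation from an established library would be the more practical choice. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value over 0.97 means nearly every affected patient is ranked above nearly every unaffected one.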

Citing Articles

A foundation systematic review of natural language processing applied to gastroenterology & hepatology.

Stammers M, Ramgopal B, Owusu Nimako A, Vyas A, Nouraei R, Metcalf C. BMC Gastroenterol. 2025; 25(1):58.

PMID: 39915703 PMC: 11800601. DOI: 10.1186/s12876-025-03608-5.


Development and validation of machine learning models for young-onset colorectal cancer risk stratification.

Zhen J, Li J, Liao F, Zhang J, Liu C, Xie H. NPJ Precis Oncol. 2024; 8(1):239.

PMID: 39438621 PMC: 11496529. DOI: 10.1038/s41698-024-00719-2.


Artificial intelligence approaches for phenotyping heart failure in U.S. Veterans Health Administration electronic health record.

Shao Y, Zhang S, Raman V, Patel S, Cheng Y, Parulkar A. ESC Heart Fail. 2024; 11(5):3155-3166.

PMID: 38873749 PMC: 11424308. DOI: 10.1002/ehf2.14787.


Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review.

Sim J, Huang X, Horan M, Baker J, Huang I. Expert Rev Pharmacoecon Outcomes Res. 2024; 24(4):467-475.

PMID: 38383308 PMC: 11001514. DOI: 10.1080/14737167.2024.2322664.