
Symbolic Rule-based Classification of Lung Cancer Stages from Free-text Pathology Reports

Overview
Date 2010 Jul 3
PMID 20595312
Citations 52
Abstract

Objective: To automatically classify lung cancer tumor-node-metastasis (TNM) stages from free-text pathology reports using symbolic rule-based classification.

Design: The approach exploits report substructure and the symbolic manipulation of Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concepts in reports. This allows statements in free text to be evaluated for relevance against the factors defined in the staging guidelines. Template-based post-coordinated SNOMED CT expressions were defined, populated with concepts from the reports, and tested for subsumption by the staging factors. The subsumption results were then combined in logic that follows the staging guidelines to calculate the TNM stage.
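The following minimal Python sketch illustrates the subsumption-then-rules idea. It rests on several assumptions, all hypothetical and made up for illustration:

- The concept names and the `IS_A` table form a toy hierarchy, not real SNOMED CT identifiers.
- The `Expression` class is a simplified stand-in for a post-coordinated expression, consisting of a focus concept plus attribute refinements.
- The `t_stage` rules are loosely modelled on T-stage criteria. They are not the staging logic used in the paper.

Subsumption here is a plain recursive is-a walk. This is a simplification of the description-logic subsumption that a SNOMED CT classifier would provide.

```python
from dataclasses import dataclass, field

# Hypothetical toy is-a hierarchy (child -> direct parents); NOT real SNOMED CT.
IS_A = {
    "visceral_pleura": {"pleura"},
    "parietal_pleura": {"pleura"},
    "pleura": {"thoracic_structure"},
    "chest_wall": {"thoracic_structure"},
    "tumor_invasion": {"finding"},
}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` is `ancestor` or reaches it by following is-a links."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in IS_A.get(concept, ()))


@dataclass
class Expression:
    """Simplified post-coordinated expression: focus concept + attribute refinements."""
    focus: str
    attributes: dict = field(default_factory=dict)


def subsumes(general: Expression, specific: Expression) -> bool:
    """`general` subsumes `specific` if its focus and every attribute value are
    ancestors-or-self of the corresponding parts of `specific`."""
    if not is_a(specific.focus, general.focus):
        return False
    for attr, value in general.attributes.items():
        if attr not in specific.attributes or not is_a(specific.attributes[attr], value):
            return False
    return True


# Hypothetical staging factors, expressed with the same template.
PLEURAL_INVASION = Expression("tumor_invasion", {"site": "visceral_pleura"})
CHEST_WALL_INVASION = Expression("tumor_invasion", {"site": "chest_wall"})


def t_stage(statements: list, tumor_size_cm: float) -> str:
    """Toy rule logic combining subsumption results with tumour size."""
    if any(subsumes(CHEST_WALL_INVASION, s) for s in statements):
        return "T3"
    if tumor_size_cm > 3 or any(subsumes(PLEURAL_INVASION, s) for s in statements):
        return "T2"
    return "T1"


# A report statement populated into the template.
report = [Expression("tumor_invasion", {"site": "visceral_pleura"})]
print(t_stage(report, 2.5))  # T2
```

Because staging factors are tested by subsumption rather than by string matching, a factor phrased with a more general concept would still match more specific statements. For example, a factor with `"site": "pleura"` would match a `"visceral_pleura"` statement. That property is what makes the symbolic approach tolerant of varied wording in reports.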

Measurements: Accuracy and confusion matrices were used to evaluate the TNM stages assigned by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions. It was also compared with a machine learning-based text classification system using support vector machines.
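As a rough illustration of these two measures, the sketch below computes a confusion matrix and overall accuracy. It assumes that the system's predicted stages and the reference (multidisciplinary team) stages are available as parallel lists of labels. The N-stage labels shown are made up for illustration and are not data from the study.

```python
from collections import Counter


def confusion_matrix(reference, predicted, labels):
    """Rows = reference stage, columns = system-predicted stage."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in labels] for r in labels]


def accuracy(reference, predicted):
    """Fraction of cases whose predicted stage matches the reference stage."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Illustrative labels only; not data from the study.
labels = ["N0", "N1", "N2"]
reference = ["N0", "N0", "N1", "N2", "N1"]
predicted = ["N0", "N1", "N1", "N2", "N0"]

for label, row in zip(labels, confusion_matrix(reference, predicted, labels)):
    print(label, row)  # N0 [1, 1, 0] / N1 [1, 1, 0] / N2 [0, 0, 1]
print(f"accuracy = {accuracy(reference, predicted):.2f}")  # accuracy = 0.60
```

Accuracy is the sum of the matrix diagonal divided by the number of cases. The off-diagonal cells show which stages are confused with one another, which a single accuracy figure hides.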

Results: On a corpus of pathology reports for 718 lung cancer patients, overall accuracy against a database of pathological TNM staging decisions was 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to that of support vector machine classification approaches.

Conclusion: A system to classify lung cancer TNM stages from free-text pathology reports was developed. It was verified that the symbolic rule-based approach using SNOMED CT can be used to extract key lung cancer characteristics from free-text reports. Future work will investigate applying the proposed methodology to extract other cancer characteristics and other cancer types.

Citing Articles

Leveraging Unlabeled Clinical Data to Boost Performance of Risk Stratification Models for Suspected Acute Coronary Syndrome.

Wu Y, Conlan D, Perez S, Nguyen A. AMIA Annu Symp Proc. 2024; 2023:744-753.

PMID: 38222439 PMC: 10785873.


Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish.

Ahumada R, Dunstan J, Rojas M, Penafiel S, Paredes I, Baez P. JCO Clin Cancer Inform. 2024; 8:e2300130.

PMID: 38194615 PMC: 10793975. DOI: 10.1200/CCI.23.00130.


Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning-Based Information Extraction: Development of a Natural Language Processing Algorithm.

Gendrin A, Souliotis L, Loudon-Griffiths J, Aggarwal R, Amoako D, Desouza G. JMIR Form Res. 2023; 7:e44876.

PMID: 37347514 PMC: 10337300. DOI: 10.2196/44876.


Determining cancer stage at diagnosis in population-based cancer registries: A rapid scoping review.

Pung L, Moorin R, Trevithick R, Taylor K, Chai K, Garcia Gewerc C. Front Health Serv. 2023; 3:1039266.

PMID: 36926511 PMC: 10012750. DOI: 10.3389/frhs.2023.1039266.


Expanding the Secondary Use of Prostate Cancer Real World Data: Automated Classifiers for Clinical and Pathological Stage.

Bozkurt S, Magnani C, Seneviratne M, Brooks J, Hernandez-Boussard T. Front Digit Health. 2022; 4:793316.

PMID: 35721793 PMC: 9201076. DOI: 10.3389/fdgth.2022.793316.

