Topic Modeling Based Classification of Clinical Reports
Overview
Electronic health records (EHRs) contain important clinical information about patients. Some of these data are free text and must be preprocessed before they can be used in automated systems. Efficient and effective use of these data could be vital to the speed and quality of health care. As a case study, we analyzed the classification of CT imaging reports into binary categories. In addition to conventional text classification, we applied topic modeling to the entire dataset in several ways. Topic modeling of the corpus provides interpretable themes that exist in these reports. Representing reports by their topic distributions is more compact than a bag-of-words representation and can be processed faster than raw text in subsequent automated processes. A binary topic model was also built as an unsupervised classification approach, under the assumption that each topic corresponds to a class. Finally, an aggregate topic classifier was built in which reports are classified based on a single discriminative topic that is determined from the training dataset. Our proposed topic-based classifier system is shown to be competitive with existing text classification techniques and provides a more efficient and interpretable representation.
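The idea of classifying reports from a single discriminative topic can be illustrated with a minimal sketch. The abstract does not say how that topic is selected, so the rule below is an assumption: it picks the topic, threshold, and polarity that maximize training accuracy. The function names (`fit_single_topic_rule`, `predict`) and the toy topic distributions are hypothetical; in practice the distributions would come from a topic model fitted to the reports.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[int, float, int]  # (topic index, threshold, polarity)


def _apply(rule: Rule, dist: Sequence[float]) -> int:
    """Label one report from the weight of a single topic."""
    topic, threshold, polarity = rule
    # polarity=1: a high topic weight means class 1; polarity=0: it means class 0.
    return polarity if dist[topic] >= threshold else 1 - polarity


def fit_single_topic_rule(
    topic_dists: Sequence[Sequence[float]], labels: Sequence[int]
) -> Rule:
    """Assumed selection rule: exhaustive search over topics, observed
    weights (as candidate thresholds), and both polarities, keeping the
    combination with the highest training accuracy."""
    best_rule: Rule = (0, 0.0, 1)
    best_acc = -1.0
    for topic in range(len(topic_dists[0])):
        for threshold in sorted({d[topic] for d in topic_dists}):
            for polarity in (1, 0):
                rule = (topic, threshold, polarity)
                correct = sum(_apply(rule, d) == y for d, y in zip(topic_dists, labels))
                acc = correct / len(labels)
                if acc > best_acc:
                    best_acc, best_rule = acc, rule
    return best_rule


def predict(rule: Rule, topic_dists: Sequence[Sequence[float]]) -> List[int]:
    """Classify each report using only the selected topic."""
    return [_apply(rule, d) for d in topic_dists]


# Toy document-topic distributions (illustrative values, not real data).
train_dists = [
    [0.70, 0.20, 0.10],
    [0.60, 0.10, 0.30],
    [0.15, 0.50, 0.35],
    [0.10, 0.30, 0.60],
    [0.55, 0.25, 0.20],
    [0.20, 0.45, 0.35],
]
train_labels = [1, 1, 0, 0, 1, 0]

rule = fit_single_topic_rule(train_dists, train_labels)
new_preds = predict(rule, [[0.65, 0.20, 0.15], [0.05, 0.50, 0.45]])
print("selected rule:", rule)
print("predictions for new reports:", new_preds)
```

Because the decision depends on one topic and one threshold, the resulting rule is easy to inspect, which is in keeping with the interpretability the abstract emphasizes.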