
Subgraph Augmented Non-negative Tensor Factorization (SANTF) for Modeling Clinical Narrative Text

Overview
Date 2015 Apr 12
PMID 25862765
Citations 26
Abstract

Objective: Extracting medical knowledge from electronic medical records requires automated approaches to combat scalability limitations and selection biases. However, clinicians often regard existing machine learning approaches as black boxes. Moreover, training data for these automated approaches are often sparsely annotated at best. The authors target unsupervised learning for modeling clinical narrative text, aiming to improve both accuracy and interpretability.

Methods: The authors introduce a novel framework named subgraph augmented non-negative tensor factorization (SANTF). In addition to relying on atomic features (e.g., words in clinical narrative text), SANTF automatically mines higher-order features (e.g., relations of lymphoid cells expressing antigens) from clinical narrative text by converting sentences into a graph representation and identifying important subgraphs. The authors compose a tensor whose three modes are patients, higher-order features, and atomic features. They then apply non-negative tensor factorization to cluster patients and, simultaneously, to identify latent groups of higher-order features that link to patient clusters. This mirrors clinical guidelines, in which a panel of immunophenotypic features and laboratory results specifies the diagnostic criteria.

Results and Conclusion: SANTF demonstrated an improvement of more than 10% in averaged F-measure on patient clustering compared with the widely used non-negative matrix factorization (NMF) and k-means clustering methods. Multiple baselines were established by modeling patient data as patient-by-feature matrices with different feature configurations and then applying NMF or k-means to cluster patients. Feature analysis identified latent groups of higher-order features that lead to medical insights. The authors also found that the latent groups of atomic features help to better correlate the latent groups of higher-order features.
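
The abstract does not define its averaged F-measure. One common clustering F-measure, assumed here purely for illustration, works in two steps. First, each gold-standard patient class is matched to the predicted cluster with the highest F1. Second, these best scores are averaged, weighted by class size. The function name `clustering_f_measure` and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(gold, pred):
    """Class-size-weighted clustering F-measure (assumed definition).

    For each gold class, take the best F1 over all predicted clusters,
    then average those best scores weighted by class size.
    """
    n = len(gold)
    class_sizes = Counter(gold)
    cluster_sizes = Counter(pred)
    overlap = Counter(zip(gold, pred))  # (class, cluster) -> shared patients
    total = 0.0
    for c, c_size in class_sizes.items():
        best = 0.0
        for k, k_size in cluster_sizes.items():
            m = overlap[(c, k)]
            if m == 0:
                continue
            precision = m / k_size
            recall = m / c_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += c_size / n * best
    return total


gold = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(round(clustering_f_measure(gold, pred), 4))  # 0.8286
```

Cluster IDs are arbitrary, so the score is matched per class rather than by label equality. A prediction of `[1, 1, 0, 0]` against gold `[0, 0, 1, 1]` scores 1.0.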

Citing Articles

Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis.

Li Y, Wu X, Yang P, Jiang G, Luo Y. Genomics Proteomics Bioinformatics. 2022; 20(5):850-866.

PMID: 36462630 PMC: 10025752. DOI: 10.1016/j.gpb.2022.11.003.


Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review.

Yang X, Mu D, Peng H, Li H, Wang Y, Wang P. JMIR Med Inform. 2022; 10(4):e33799.

PMID: 35442195 PMC: 9069295. DOI: 10.2196/33799.


Phenotyping Multiple Organ Dysfunction Syndrome Using Temporal Trends in Critically Ill Children.

Stroup E, Luo Y, Sanchez-Pinto L. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2021; 2019:968-972.

PMID: 33842023 PMC: 8030696. DOI: 10.1109/bibm47256.2019.8983126.


Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective.

Luo Y, Szolovits P. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020; 2018:461-466.

PMID: 33376623 PMC: 7769694. DOI: 10.1109/bibm.2018.8621521.


Identifying Breast Cancer Distant Recurrences from Electronic Health Records Using Machine Learning.

Zeng Z, Yao L, Roy A, Li X, Espino S, Clare S. J Healthc Inform Res. 2020; 3:283-299.

PMID: 33225204 PMC: 7678240. DOI: 10.1007/s41666-019-00046-3.

