
Ontology-driven Weak Supervision for Clinical Entity Classification in Electronic Health Records

Overview
Journal Nat Commun
Specialty Biology
Date 2021 Apr 2
PMID 33795682
Citations 19
Abstract

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g., the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time-consuming, and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.
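The general idea of labeling entities from ontologies and expert rules can be illustrated with a minimal sketch. This is not Trove's API: the toy ontologies, the suffix rule, the function names, and the use of a simple majority vote to combine votes are all assumptions made for illustration only.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion

# Hypothetical toy ontologies: term -> label (1 = disorder, 0 = not a disorder).
ONTOLOGY_A = {"fever": 1, "cough": 1, "aspirin": 0, "cold": 1}
ONTOLOGY_B = {"fever": 1, "shortness of breath": 1, "patient": 0, "cold": 0}


def make_ontology_lf(ontology):
    """Build a labeling function that looks a span up in one ontology."""
    def lf(span):
        return ontology.get(span.lower(), ABSTAIN)
    return lf


def lf_itis_suffix(span):
    """Hypothetical expert rule: terms ending in '-itis' are disorders."""
    return 1 if span.lower().endswith("itis") else ABSTAIN


LABELING_FUNCTIONS = [
    make_ontology_lf(ONTOLOGY_A),
    make_ontology_lf(ONTOLOGY_B),
    lf_itis_suffix,
]


def majority_vote(votes):
    """Combine votes, ignoring abstentions; ties and empty votes abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(span):
    """Assign a weak (noisy) label to a candidate entity span."""
    return majority_vote([lf(span) for lf in LABELING_FUNCTIONS])


if __name__ == "__main__":
    for term in ["Fever", "bronchitis", "aspirin", "cold", "table"]:
        print(term, weak_label(term))
```

In this sketch the sources of supervision are plain dictionaries and functions, so they can be shared and edited without exposing any patient notes. The weak labels would then serve as training data for a downstream classifier.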

Citing Articles

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing.

Hsu E, Roberts K Sci Rep. 2025; 15(1):8241.

PMID: 40064991 PMC: 11893743. DOI: 10.1038/s41598-024-68168-2.


Clinical entity augmented retrieval for clinical information extraction.

Lopez I, Swaminathan A, Vedula K, Narayanan S, Haredasht F, Ma S NPJ Digit Med. 2025; 8(1):45.

PMID: 39828800 PMC: 11743751. DOI: 10.1038/s41746-024-01377-1.


Automated real-world data integration improves cancer outcome prediction.

Jee J, Fong C, Pichotta K, Tran T, Luthra A, Waters M Nature. 2024; 636(8043):728-736.

PMID: 39506116 PMC: 11655358. DOI: 10.1038/s41586-024-08167-5.


Automated classification of angle-closure mechanisms based on anterior segment optical coherence tomography images via deep learning.

Zhang Y, Zhang X, Zhang Q, Lv B, Hu M, Lv C Heliyon. 2024; 10(15):e35236.

PMID: 39166052 PMC: 11334645. DOI: 10.1016/j.heliyon.2024.e35236.


Transformer models in biomedicine.

Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Frohlich H BMC Med Inform Decis Mak. 2024; 24(1):214.

PMID: 39075407 PMC: 11287876. DOI: 10.1186/s12911-024-02600-5.

