
Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

Overview
Journal JMIR Med Inform
Publisher JMIR Publications
Date 2017 May 11
PMID 28487265
Citations 12
Abstract

Background: Extracting structured data from narrated medical reports is difficult because of their heterogeneous structures and vocabularies, and it often requires significant manual effort. Traditional machine-based approaches cannot take user feedback to improve the extraction algorithm in real time.

Objective: Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables dynamic interaction between a human and a machine to produce highly accurate results.

Methods: A clinical information extraction system, IDEAL-X, was built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.
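The workflow described in Methods can be illustrated with a minimal Python sketch. It is not the IDEAL-X implementation: the class `OnlineContextExtractor`, the function `run_session`, the `ask_user` callback, the rolling-accuracy window, and the sample report snippets are all hypothetical, and the toy model only remembers which word tends to precede the correct value. The controlled vocabulary is left out for brevity.

```python
from collections import Counter, deque


class OnlineContextExtractor:
    """Toy online learner (hypothetical): counts the words that precede correct values."""

    def __init__(self):
        self.context_counts = Counter()

    def predict(self, text):
        # Pick the token whose preceding word has been seen most often before a value.
        tokens = text.split()
        best, best_score = None, 0
        for prev, tok in zip(tokens, tokens[1:]):
            score = self.context_counts[prev.lower()]
            if score > best_score:
                best, best_score = tok, score
        return best  # None until the model has learned something

    def update(self, text, correct_value):
        # Incremental update from a single document and the user's answer.
        tokens = text.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if tok == correct_value:
                self.context_counts[prev.lower()] += 1


def run_session(documents, ask_user, threshold=0.9, window=3):
    """Process documents one at a time, then switch to batch mode once the
    accuracy over the last `window` reviewed documents reaches `threshold`."""
    model = OnlineContextExtractor()
    recent = deque(maxlen=window)
    results = []
    batch_mode = False
    for text in documents:
        predicted = model.predict(text)
        if batch_mode:
            results.append((predicted, "batch"))
            continue
        corrected = ask_user(text, predicted)  # user confirms or fixes the value
        recent.append(predicted == corrected)
        model.update(text, corrected)          # feedback updates the model at once
        results.append((corrected, "interactive"))
        if len(recent) == window and sum(recent) / window >= threshold:
            batch_mode = True
    return results


# Invented report snippets; the "user" is simulated with known answers.
docs = [
    "LVEF: 55% by ventriculography",
    "Findings normal. LVEF: 40%",
    "LVEF: 60% estimated",
    "LVEF: 35% with anterior hypokinesis",
    "Mild disease. LVEF: 50%",
    "LVEF: 45% overall",
]
gold = ["55%", "40%", "60%", "35%", "50%", "45%"]
answers = dict(zip(docs, gold))
results = run_session(docs, lambda text, predicted: answers[text])
for value, mode in results:
    print(mode, value)
```

In this sketch the first prediction is empty because nothing has been learned yet. Once the last three reviewed documents are all predicted correctly, the remaining ones are processed without asking the user.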

Results: Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.
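For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short Python sketch below shows the standard computation. The function name `f1_score` and the counts in the example are illustrative assumptions and are not taken from the study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not results from the paper:
# 96 correctly extracted values, 4 spurious ones, 4 missed ones.
example = f1_score(96, 4, 4)
print(round(example, 4))
```

Because F1 penalizes both spurious and missed values, a score above 95% requires precision and recall to be high at the same time.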

Conclusions: IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve and is therefore highly adaptable.

Citing Articles

Clinical concept annotation with contextual word embedding in active transfer learning environment.

Abbas A, Lee M, Shanavas N, Kovatchev V Digit Health. 2024; 10:20552076241308987.

PMID: 39711738 PMC: 11660282. DOI: 10.1177/20552076241308987.


Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study.

Wendelboe A, Saber I, Dvorak J, Adamski A, Feland N, Reyes N JMIR Bioinform Biotechnol. 2023; 3(1):e36877.

PMID: 37206160 PMC: 10193259. DOI: 10.2196/36877.


A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction.

Lyu W, Dong X, Wong R, Zheng S, Abell-Hart K, Wang F AMIA Annu Symp Proc. 2023; 2022:719-728.

PMID: 37128451 PMC: 10148371.


Natural language processing in low back pain and spine diseases: A systematic review.

Bacco L, Russo F, Ambrosio L, D'Antoni F, Vollero L, Vadala G Front Surg. 2022; 9:957085.

PMID: 35910476 PMC: 9329654. DOI: 10.3389/fsurg.2022.957085.


Racial differences in venous thromboembolism: A surveillance program in Durham County, North Carolina.

Saber I, Adamski A, Kuchibhatla M, Abe K, Beckman M, Reyes N Res Pract Thromb Haemost. 2022; 6(5):e12769.

PMID: 35873215 PMC: 9301530. DOI: 10.1002/rth2.12769.

