Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

Overview

Journal JMIR Med Inform

Publisher JMIR Publications

Specialty Medical Informatics

Date 2017 May 11

PMID 28487265

Citations 12

Authors

Shuai Zheng

James J Lu

Nima Ghasemzadeh

Salim S Hayek

Arshed A Quyyumi

Fusheng Wang

Abstract

Background: Extracting structured data from narrative medical reports is complicated by heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches cannot incorporate user feedback to improve the extraction algorithm in real time.

Objective: Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables dynamic interaction between a human and a machine to produce highly accurate results.

Methods: IDEAL-X, a clinical information extraction system, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.
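
The workflow described in Methods can be illustrated with a minimal Python sketch. This is not the IDEAL-X implementation: the class `OnlineExtractor`, the function `run`, the preceding-word scoring rule, and the `threshold` and `window` parameters are all hypothetical and chosen only to show the shape of the loop (predict, collect feedback, update, then switch to batch mode once recent accuracy is high enough).

```python
from collections import deque


class OnlineExtractor:
    """Toy online learner (an assumption, not IDEAL-X's model): a token is a
    likely value if the word before it introduced confirmed values earlier."""

    def __init__(self, vocabulary=None):
        self.context_counts = {}  # preceding word -> confirmed-value count
        self.vocabulary = {term.lower() for term in (vocabulary or [])}

    def predict(self, tokens):
        best, best_score = None, 0
        for prev, tok in zip(tokens, tokens[1:]):
            score = self.context_counts.get(prev.lower(), 0)
            if tok.lower() in self.vocabulary:
                score += 1  # controlled-vocabulary hit
            if score > best_score:
                best, best_score = tok, score
        return best  # None when nothing has been learned yet

    def update(self, tokens, confirmed_value):
        # Learn from feedback: credit the word preceding the confirmed value.
        for prev, tok in zip(tokens, tokens[1:]):
            if tok == confirmed_value:
                key = prev.lower()
                self.context_counts[key] = self.context_counts.get(key, 0) + 1


def run(documents, review, model, threshold=0.9, window=3):
    """Process documents one at a time; stop asking for review once accuracy
    over the last `window` reviewed documents reaches `threshold`."""
    recent = deque(maxlen=window)
    results, reviewed, interactive = [], 0, True
    for tokens in documents:
        predicted = model.predict(tokens)
        if interactive:
            confirmed = review(tokens, predicted)  # user accepts or corrects
            reviewed += 1
            recent.append(predicted == confirmed)
            model.update(tokens, confirmed)        # model updated immediately
            results.append(confirmed)
            if len(recent) == window and sum(recent) / window >= threshold:
                interactive = False                # remaining docs are batched
        else:
            results.append(predicted)
    return results, reviewed


# Simulated reviewer (hypothetical): the correct value is always the last token.
docs = [["EF", "is", str(v)] for v in (55, 60, 45, 70, 35)]
results, reviewed = run(docs, lambda tokens, predicted: tokens[-1], OnlineExtractor())
```

With this simulated reviewer, the first four documents are reviewed and the fifth is batch processed without user input. A real system would use richer features and a proper learning algorithm; the sketch only shows how per-document feedback and an accuracy threshold fit together.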

Results: Three datasets, grouped by report style, were used for experiments: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of the two. The system delivers results with F1 scores greater than 95%.
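
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The short Python sketch below shows the calculation; the function name `f1_score` and the counts passed to it are illustrative assumptions, not figures from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 95 correct extractions, 5 spurious, 5 missed.
example = f1_score(95, 5, 5)
```

Because F1 penalizes both spurious and missed extractions, a score above 95% requires precision and recall to both be high.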

Conclusions: IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction from clinical reports. The system learns and improves quickly, which makes it highly adaptable.

Citing Articles

Clinical concept annotation with contextual word embedding in active transfer learning environment.

Abbas A, Lee M, Shanavas N, Kovatchev V Digit Health. 2024; 10:20552076241308987.

PMID: 39711738 PMC: 11660282. DOI: 10.1177/20552076241308987.

Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study.

Wendelboe A, Saber I, Dvorak J, Adamski A, Feland N, Reyes N JMIR Bioinform Biotechnol. 2023; 3(1):e36877.

PMID: 37206160 PMC: 10193259. DOI: 10.2196/36877.

A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction.

Lyu W, Dong X, Wong R, Zheng S, Abell-Hart K, Wang F AMIA Annu Symp Proc. 2023; 2022:719-728.

PMID: 37128451 PMC: 10148371.

Natural language processing in low back pain and spine diseases: A systematic review.

Bacco L, Russo F, Ambrosio L, D'Antoni F, Vollero L, Vadala G Front Surg. 2022; 9:957085.

PMID: 35910476 PMC: 9329654. DOI: 10.3389/fsurg.2022.957085.

Racial differences in venous thromboembolism: A surveillance program in Durham County, North Carolina.

Saber I, Adamski A, Kuchibhatla M, Abe K, Beckman M, Reyes N Res Pract Thromb Haemost. 2022; 6(5):e12769.

PMID: 35873215 PMC: 9301530. DOI: 10.1002/rth2.12769.
