
Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

Overview
Publisher Springer
Date 2017 Jul 28
PMID 28748227
Citations 5
Abstract

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM, which is derived from the International Classification of Diseases (ICD). ICD-9 codes are generally extracted by trained human coders, who read all artifacts available in a patient's medical record and follow specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with an average precision of 0.47; compared with a baseline of using only NER, we observe a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long-range, non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.
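The abstract reports example-based average recall and precision without defining them. The sketch below uses the common multi-label convention, which is an assumption here and not necessarily the paper's exact formula: precision and recall are computed per visit from the overlap between gold and predicted code sets, then averaged over visits. The function name `example_based_scores`, the zero score for empty sets, and the toy code sets are all hypothetical and are not taken from the paper's data.

```python
from typing import Sequence, Set, Tuple


def example_based_scores(
    gold: Sequence[Set[str]], predicted: Sequence[Set[str]]
) -> Tuple[float, float]:
    """Return (average precision, average recall) across visits.

    Each element of `gold` / `predicted` is the set of codes for one visit.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same visits")
    if not gold:
        raise ValueError("need at least one visit")

    precisions, recalls = [], []
    for g, p in zip(gold, predicted):
        hits = len(g & p)  # codes that are both predicted and correct
        # Assumed convention: an empty predicted or gold set scores 0.
        precisions.append(hits / len(p) if p else 0.0)
        recalls.append(hits / len(g) if g else 0.0)

    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Toy example with made-up code sets for two visits.
gold = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0", "584.9"}, {"250.00", "401.9"}]

avg_precision, avg_recall = example_based_scores(gold, predicted)
print(avg_precision, avg_recall)
```

Averaging per visit weights every visit equally, regardless of how many codes it carries, which differs from micro-averaging over all code assignments pooled together.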

Citing Articles

Automated ICD coding via unsupervised knowledge integration (UNITE).

Sonabend W A, Cai W, Ahuja Y, Ananthakrishnan A, Xia Z, Yu S. Int J Med Inform. 2020; 139:104135.

PMID: 32361145 PMC: 9410729. DOI: 10.1016/j.ijmedinf.2020.104135.


Constructing a knowledge-based heterogeneous information graph for medical health status classification.

Pham T, Tao X, Zhang J, Yong J. Health Inf Sci Syst. 2020; 8(1):10.

PMID: 32117570 PMC: 7021844. DOI: 10.1007/s13755-020-0100-6.


A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation.

Ning W, Yu M, Zhang R. BMC Med Inform Decis Mak. 2016; 16:30.

PMID: 26940992 PMC: 4778321. DOI: 10.1186/s12911-016-0269-4.


Application of clinical text data for phenome-wide association studies (PheWASs).

Hebbring S, Rastegar-Mojarad M, Ye Z, Mayer J, Jacobson C, Lin S. Bioinformatics. 2015; 31(12):1981-7.

PMID: 25657332 PMC: 4481696. DOI: 10.1093/bioinformatics/btv076.


Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

Hanauer D, Saeed M, Zheng K, Mei Q, Shedden K, Aronson A. J Am Med Inform Assoc. 2014; 21(5):925-37.

PMID: 24928177 PMC: 4147617. DOI: 10.1136/amiajnl-2014-002767.
