Comparison of a Semi-automatic Annotation Tool and a Natural Language Processing Application for the Generation of Clinical Statement Entries

Overview

Journal J Am Med Inform Assoc

Publisher Oxford University Press

Specialty Medical Informatics

Date 2014 Oct 22

PMID 25332357

Citations 3

Authors

Ching-Heng Lin

Nai-Yuan Wu

Wei-Shao Lai

Der-Ming Liou


Abstract

Background and Objective: Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode narrative concepts and to transform the coded concepts into standard entry-level documents. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents.

Methods: Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines.
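To make "entry-level" concrete, the following is a minimal sketch of what one coded CDA entry could look like when built programmatically. It is not the authors' pipeline: the helper names `build_observation_entry` and `to_xml` are hypothetical, the choice of SNOMED CT as the code system is an assumption (the abstract does not name one), and the structure is simplified. A conformant CDA Release 2 document places these elements in the `urn:hl7-org:v3` namespace and requires further content such as template identifiers and a status code.

```python
import xml.etree.ElementTree as ET

# OID that identifies SNOMED CT as a code system in HL7 documents.
# Assumption: the abstract does not say which terminology was used.
SNOMED_CT_OID = "2.16.840.1.113883.6.96"


def build_observation_entry(code, display_name,
                            code_system=SNOMED_CT_OID,
                            code_system_name="SNOMED CT"):
    """Build a simplified CDA <entry> wrapping one coded <observation>.

    Hypothetical helper for illustration only. Namespace handling
    (urn:hl7-org:v3) and other required CDA elements are omitted.
    """
    entry = ET.Element("entry")
    observation = ET.SubElement(
        entry, "observation", {"classCode": "OBS", "moodCode": "EVN"}
    )
    ET.SubElement(
        observation,
        "code",
        {
            "code": code,
            "codeSystem": code_system,
            "codeSystemName": code_system_name,
            "displayName": display_name,
        },
    )
    return entry


def to_xml(element):
    """Serialize an element tree to a Unicode XML string."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Illustrative concept only; not taken from the study data.
    print(to_xml(build_observation_entry("38341003", "Hypertensive disorder")))
```

In a sketch like this, either pipeline (manual annotation or NLP) would only need to supply the code and display name for each recognized term; the entry construction itself is the same.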

Results: The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines.
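For readers unfamiliar with the metric, the sketch below shows one common way an F-measure can be computed for term extraction. It assumes the balanced F-measure (the harmonic mean of precision and recall) and exact matching of annotations against a gold standard; the abstract does not state which variant or matching rule the study used. The function name `precision_recall_f1` and the toy annotations are hypothetical and are not the study data.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations.

    Assumption: an annotation counts as correct only if it matches a
    gold annotation exactly (same document, span, and term type).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy annotations: (document id, start offset, end offset, term type).
    gold = {
        ("doc1", 0, 12, "Observation"),
        ("doc1", 20, 31, "Observation"),
        ("doc1", 40, 52, "Procedure"),
        ("doc2", 5, 14, "Observation"),
    }
    predicted = {
        ("doc1", 0, 12, "Observation"),
        ("doc1", 20, 31, "Observation"),
        ("doc1", 40, 52, "Procedure"),
        ("doc2", 60, 70, "Observation"),  # spurious term, not in gold
    }
    print(precision_recall_f1(gold, predicted))
```

Filtering both sets by term type before calling the function gives per-category scores, which is how separate Observation and Procedure figures like those reported above could be obtained.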

Conclusions: The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents.

Citing Articles

Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.

Wulff A, Mast M, Hassler M, Montag S, Marschollek M, Jack T. Methods Inf Med. 2020; 59(S 02):e64-e78.

PMID: 33058101 PMC: 7725544. DOI: 10.1055/s-0040-1716403.

Words prediction based on N-gram model for free-text entry in electronic health records.

Yazdani A, Safdari R, Golkar A, Niakan Kalhori S. Health Inf Sci Syst. 2019; 7(1):6.

PMID: 30886701 PMC: 6395458. DOI: 10.1007/s13755-019-0065-5.

A computational framework for converting textual clinical diagnostic criteria into the quality data model.

Hong N, Li D, Yu Y, Xiu Q, Liu H, Jiang G. J Biomed Inform. 2016; 63:11-21.

PMID: 27444185 PMC: 5077690. DOI: 10.1016/j.jbi.2016.07.016.
