PMID: 36935011

Representing and Utilizing Clinical Textual Data for Real World Studies: An OHDSI Approach

Abstract

Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.

Citing Articles

Automated Integration of AI Results into Radiology Reports Using Common Data Elements.

Mehdiratta G, Duda J, Elahi A, Borthakur A, Chatterjee N, Gee J. J Imaging Inform Med. 2025.

PMID: 39871037 DOI: 10.1007/s10278-025-01414-9.


Generative Artificial Intelligence for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations: An ISPOR Working Group Report.

Fleurence R, Bian J, Wang X, Xu H, Dawoud D, Higashi M. Value Health. 2024; 28(2):175-183.

PMID: 39536966 PMC: 11786987. DOI: 10.1016/j.jval.2024.10.3846.


The Growing Impact of Natural Language Processing in Healthcare and Public Health.

Jerfy A, Selden O, Balkrishnan R. Inquiry. 2024; 61:469580241290095.

PMID: 39396164 PMC: 11475376. DOI: 10.1177/00469580241290095.


Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions.

Keloth V, Selek S, Chen Q, Gilman C, Fu S, Dang Y. medRxiv. 2024.

PMID: 38826441 PMC: 11142292. DOI: 10.1101/2024.05.21.24307726.


Making causal inferences from transactional data: A narrative review of opportunities and challenges when implementing the target trial framework.

Esteban S, Szmulewicz A. J Int Med Res. 2024; 52(3):3000605241241920.

PMID: 38548473 PMC: 10981242. DOI: 10.1177/03000605241241920.

