
Launching into Clinical Space with MedspaCy: a New Clinical Text Processing Toolkit in Python

Overview
Date 2022 Mar 21
PMID 35308962
Abstract

Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs, such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
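
To make the idea of rule-based context analysis concrete, the following is a minimal, dependency-free sketch in Python. It is not medspaCy's API: the rule tables, the `Entity` class, and the `extract_entities` function are hypothetical names introduced only for illustration, and the negation logic is a deliberately simplified stand-in for the kind of context rules such a toolkit would provide.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical rule tables for illustration; not medspaCy's rule format.
TARGET_RULES = {"pneumonia": "PROBLEM", "pleural effusion": "PROBLEM"}
NEGATION_TRIGGERS = ("no evidence of", "denies", "without")


@dataclass
class Entity:
    text: str
    label: str
    is_negated: bool


def extract_entities(sentence: str) -> List[Entity]:
    """Match target terms, then flag those preceded by a negation trigger."""
    lowered = sentence.lower()
    found = []
    for term, label in TARGET_RULES.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            # Simplification: any trigger earlier in the sentence negates the target.
            preceding = lowered[: match.start()]
            negated = any(trigger in preceding for trigger in NEGATION_TRIGGERS)
            span_text = sentence[match.start(): match.end()]
            found.append((match.start(), Entity(span_text, label, negated)))
    # Return entities in the order they appear in the sentence.
    return [entity for _, entity in sorted(found, key=lambda pair: pair[0])]


if __name__ == "__main__":
    for ent in extract_entities("There is no evidence of pneumonia."):
        print(ent.text, ent.label, ent.is_negated)
```

A real pipeline would bound the scope of each trigger (for example, stopping at conjunctions or sentence boundaries) and would attach these attributes to spaCy spans rather than to a standalone dataclass.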

Citing Articles

Medical ontology learning framework to investigate daytime impairment in insomnia disorder and treatment effects.

Busser A, Durrer R, Freidank M, Togninalli M, Olivieri A, Grandner M. Commun Med (Lond). 2025; 5(1):54.

PMID: 40021822 PMC: 11871003. DOI: 10.1038/s43856-024-00698-2.


Using Structured Codes and Free-Text Notes to Measure Information Complementarity in Electronic Health Records: Feasibility and Validation Study.

Seinen T, Kors J, van Mulligen E, Rijnbeek P. J Med Internet Res. 2025; 27:e66910.

PMID: 39946687 PMC: 11887999. DOI: 10.2196/66910.


A foundation systematic review of natural language processing applied to gastroenterology & hepatology.

Stammers M, Ramgopal B, Owusu Nimako A, Vyas A, Nouraei R, Metcalf C. BMC Gastroenterol. 2025; 25(1):58.

PMID: 39915703 PMC: 11800601. DOI: 10.1186/s12876-025-03608-5.


Large Language Models Outperform Traditional Natural Language Processing Methods in Extracting Patient-Reported Outcomes in Inflammatory Bowel Disease.

Patel P, Davis C, Ralbovsky A, Tinoco D, Williams C, Slatter S. Gastro Hep Adv. 2025; 4(2):100563.

PMID: 39877865 PMC: 11772946. DOI: 10.1016/j.gastha.2024.10.003.


pyDeid: an improved, fast, flexible, and generalizable rule-based approach for deidentification of free-text medical records.

Sundrelingam V, Parimoo S, Pogacar F, Koppula R, Shin S, Pou-Prom C. JAMIA Open. 2025; 8(1):ooae152.

PMID: 39845288 PMC: 11752853. DOI: 10.1093/jamiaopen/ooae152.

