
A Novel COVID-19 Data Set and an Effective Deep Learning Approach for the De-Identification of Italian Medical Records

Overview
Journal IEEE Access
Date 2021 Nov 17
PMID 34786303
Citations 6
Abstract

In recent years, de-identifying privacy-sensitive information in Electronic Health Records (EHRs) has become increasingly important for enabling the sharing and publication of their content in compliance with the restrictions imposed by national and supranational privacy authorities. In Natural Language Processing (NLP), several deep learning techniques for Named Entity Recognition (NER) have been applied to this problem, significantly improving the identification of sensitive information in EHRs written in English. However, the lack of data sets in other languages has strongly limited the applicability and performance evaluation of these techniques. To address this gap, this work presents a new Italian de-identification data set built from the 115 COVID-19 EHRs provided by the Italian Society of Radiology (SIRM): 65 were used for training and development, and the remaining 50 for testing. The data set was labelled following the guidelines of the i2b2 2014 de-identification track. As an additional contribution, a stacked word representation not previously tested in the Italian clinical de-identification scenario was combined with the best-performing Bi-LSTM + CRF sequence labeling architecture. The representation draws on a contextualized language model, to handle word polysemy and morpho-syntactic variation, and on sub-word embeddings, to better capture latent syntactic and semantic similarities. Finally, the proposed model was compared with other cutting-edge approaches and achieved the best performance, supporting the effectiveness of the proposed approach.
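The abstract frames de-identification as NER-style sequence labeling, so it may help to see how per-token labels could be turned into redacted text. The following is a minimal sketch, not the authors' pipeline: it assumes a BIO tagging scheme (the abstract does not state which scheme was used), the function names `bio_to_spans` and `redact` are hypothetical, and the labels `PATIENT` and `DATE` as well as the Italian sentence are illustrative only. The tags are written by hand here; in practice they would come from a trained tagger such as a Bi-LSTM + CRF model.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, label) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- opens a new span; a stray I- is also treated as a span start.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def redact(tokens: List[str], tags: List[str]) -> str:
    """Replace every tagged span with a single [LABEL] placeholder."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    pos = 0
    for start, end, label in bio_to_spans(tags):
        out.extend(tokens[pos:start])  # copy untagged tokens before the span
        out.append(f"[{label}]")       # one placeholder per sensitive span
        pos = end
    out.extend(tokens[pos:])           # copy the untagged tail
    return " ".join(out)


# Illustrative input: hand-written tags standing in for model predictions.
example_tokens = ["Il", "paziente", "Mario", "Rossi", "ricoverato",
                  "il", "12", "marzo", "2020"]
example_tags = ["O", "O", "B-PATIENT", "I-PATIENT", "O",
                "O", "B-DATE", "I-DATE", "I-DATE"]
print(redact(example_tokens, example_tags))
```

Replacing each span with its category label, rather than deleting it, keeps the redacted record readable while removing the identifying surface form; other strategies, such as surrogate substitution, are equally possible and are not addressed by this sketch.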

Citing Articles

Entity-enhanced BERT for medical specialty prediction based on clinical questionnaire data.

Lee S, Han Y, Park H, Lee B, Son D, Kim S PLoS One. 2025; 20(1):e0317795.

PMID: 39883641 PMC: 11781728. DOI: 10.1371/journal.pone.0317795.


Automated redaction of names in adverse event reports using transformer-based neural networks.

Meldau E, Bista S, Melgarejo-Gonzalez C, Noren G BMC Med Inform Decis Mak. 2024; 24(1):401.

PMID: 39716217 PMC: 11668006. DOI: 10.1186/s12911-024-02785-9.


Automatic de-identification of French electronic health records: a cost-effective approach exploiting distant supervision and deep learning models.

El Azzouzi M, Coatrieux G, Bellafqira R, Delamarre D, Riou C, Oubenali N BMC Med Inform Decis Mak. 2024; 24(1):54.

PMID: 38365677 PMC: 10870625. DOI: 10.1186/s12911-024-02422-5.


AI Assisted Attention Mechanism for Hybrid Neural Model to Assess Online Attitudes About COVID-19.

Kour H, Gupta M Neural Process Lett. 2022; :1-40.

PMID: 36575702 PMC: 9780630. DOI: 10.1007/s11063-022-11112-0.


A Natural Language Processing (NLP) Evaluation on COVID-19 Rumour Dataset Using Deep Learning Techniques.

Fatima R, Samad Shaikh N, Riaz A, Ahmad S, El-Affendi M, Alyamani K Comput Intell Neurosci. 2022; 2022:6561622.

PMID: 36156967 PMC: 9492356. DOI: 10.1155/2022/6561622.

