A Cross-institutional Evaluation on Breast Cancer Phenotyping NLP Algorithms on Electronic Health Records

Overview

Journal Comput Struct Biotechnol J

Specialty Biotechnology

Date 2023 Sep 8

PMID 37680211

Authors

Sicheng Zhou

Nan Wang

Liwei Wang

Ju Sun

Anne Blaes

Hongfang Liu

Rui Zhang

Abstract

Objective: Transformer-based language models are prevalent in the clinical domain because of their strong performance on clinical NLP tasks, yet their generalizability is often overlooked during model development. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, alongside two classic machine learning models, conditional random field (CRF) and bi-directional long short-term memory CRF (BiLSTM-CRF), across clinical institutes on a breast cancer phenotype extraction task.

Materials and Methods: Two clinical corpora of breast cancer patients were collected from the electronic health records of the University of Minnesota (UMN) and Mayo Clinic (MC) and annotated following the same guideline. We developed three types of NLP models (CRF, BiLSTM-CRF, and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of the models on different test sets under different learning strategies (model transfer vs. locally trained). We also assessed the entity coverage score and its association with model performance.
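As an illustration of the entity coverage idea, the sketch below measures how many test-set entity mentions already appear, with the same label, in the training set of another institute. This is a simplified assumption and not necessarily the score defined in the study; the function name `entity_coverage`, the label names, and the example mentions are all hypothetical.

```python
def entity_coverage(train_entities, test_entities):
    """Fraction of test mentions whose (text, label) pair also occurs in training.

    Simplified illustration only: the study's entity coverage score
    may be defined differently.
    """
    if not test_entities:
        return 0.0
    # Case-insensitive lookup of every (surface text, label) pair seen in training.
    seen = {(text.lower(), label) for text, label in train_entities}
    covered = sum(
        1 for text, label in test_entities if (text.lower(), label) in seen
    )
    return covered / len(test_entities)


# Hypothetical mentions and labels, not taken from the UMN or MC corpora.
site_a_train = [
    ("ER positive", "ReceptorStatus"),
    ("grade 2", "Grade"),
    ("2.1 cm", "TumorSize"),
]
site_b_test = [
    ("ER positive", "ReceptorStatus"),
    ("grade 3", "Grade"),
    ("Grade 2", "Grade"),
    ("HER2 negative", "ReceptorStatus"),
]

coverage = entity_coverage(site_a_train, site_b_test)
print(f"entity coverage: {coverage:.2f}")
```

A low coverage value would suggest that a transferred model has to recognize many entities it never saw during training, which is one plausible reason performance can drop across institutes.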

Results: We manually annotated 200 and 161 clinical documents at UMN and MC, respectively. The target entities in the two institutes' corpora were more similar to each other than the overall corpora were. The CancerBERT models obtained the best performance on the independent test sets from the two clinical institutes and on the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another achieved performance close to that of the model developed on local data (micro-F1: 0.925 vs 0.932).
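For readers unfamiliar with the metric, micro-F1 pools true positives, false positives, and false negatives over all entity types before computing precision and recall. The sketch below assumes exact-match scoring of `(doc_id, start, end, label)` tuples; the abstract does not state the matching criterion, and the function name and example spans are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over sets of (doc_id, start, end, label) tuples.

    Exact span-and-label matching is an assumption made for this sketch.
    """
    tp = len(gold & pred)   # predicted entities that match a gold entity
    fp = len(pred - gold)   # predicted entities with no gold match
    fn = len(gold - pred)   # gold entities that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: three of four predictions match the gold standard.
gold_spans = {
    ("doc1", 0, 11, "ReceptorStatus"),
    ("doc1", 20, 27, "Grade"),
    ("doc2", 5, 11, "TumorSize"),
    ("doc2", 30, 43, "ReceptorStatus"),
}
pred_spans = {
    ("doc1", 0, 11, "ReceptorStatus"),
    ("doc1", 20, 27, "Grade"),
    ("doc2", 5, 11, "TumorSize"),
    ("doc2", 30, 43, "Grade"),  # right span, wrong label
}

print(f"micro-F1: {micro_f1(gold_spans, pred_spans):.3f}")
```

Because counts are pooled, frequent entity types dominate the score, which is worth keeping in mind when comparing figures such as 0.925 and 0.932.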

Conclusions: The results indicate that, among the three types of clinical NLP models, CancerBERT has the best learning ability and generalizability on our named entity recognition task. It has an advantage in recognizing complex entities, e.g., entities with different labels.

Citing Articles

Health Care Language Models and Their Fine-Tuning for Information Extraction: Scoping Review.

Nunes M, Bone J, Ferreira J, Elvas L. JMIR Med Inform. 2024; 12:e60164.

PMID: 39432345 PMC: 11535799. DOI: 10.2196/60164.

A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction.

Fu S, Wang L, He H, Wen A, Zong N, Kumari A. J Am Med Inform Assoc. 2024; 31(7):1493-1502.

PMID: 38742455 PMC: 11187420. DOI: 10.1093/jamia/ocae101.
