Natural Language Processing of Serum Protein Electrophoresis Reports in the Veterans Affairs Health Care System
Purpose: Serum protein electrophoresis (SPEP) is used to screen for monoclonal gammopathy, which makes it a critical tool in the evaluation of patients with multiple myeloma. However, SPEP laboratory results are usually returned as short text reports, which are not amenable to simple computerized processing for large-scale studies. We applied natural language processing (NLP) to detect monoclonal gammopathy in SPEP laboratory results and compared the performance of a manual rules-based system and a machine-learning algorithm at multiple hospitals.
Methods: We used data from the VA Corporate Data Warehouse, which comprises data from 20 million unique individuals. SPEP reports were collected from July to December 2015 at 5 Veterans Affairs Medical Centers. We annotated 300 of these reports for the presence or absence of monoclonal gammopathy. We applied a machine learning-based NLP system and a manual rules-based NLP system to detect monoclonal gammopathy in SPEP reports at each of the hospitals, then applied the model from 1 hospital to each of the other hospitals.
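As an illustration of what a manual rules-based classifier for short SPEP reports might look like, the following is a minimal sketch. The keyword list, the negation cues, and the function name `has_monoclonal_gammopathy` are hypothetical choices made for this example; the abstract does not describe the actual rules used in the study.

```python
import re

# Hypothetical positive cues; NOT the rules used in the study.
POSITIVE = re.compile(
    r"\b(monoclonal|m[- ]?spike|m[- ]?protein|paraprotein)\b", re.IGNORECASE
)
# Hypothetical negation cues that must appear earlier in the same clause.
NEGATION = re.compile(
    r"\b(no|not|without|negative for|absence of)\b[^.;]*$", re.IGNORECASE
)


def has_monoclonal_gammopathy(report: str) -> bool:
    """Return True if any clause mentions a monoclonal finding without negation."""
    # Split the report into clauses so a negation only affects its own clause.
    for clause in re.split(r"[.;\n]", report):
        match = POSITIVE.search(clause)
        if match is None:
            continue
        # Look for a negation cue in the text preceding the positive cue.
        if not NEGATION.search(clause[: match.start()]):
            return True
    return False


if __name__ == "__main__":
    for text in (
        "A monoclonal band is present in the gamma region.",
        "Normal pattern. No monoclonal band seen.",
    ):
        print(has_monoclonal_gammopathy(text), "<-", text)
```

Rules of this kind tend to encode one laboratory's phrasing, so a rule set written for one hospital may need revision before it is applied to reports from another.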
Results: The machine-learning system achieved an area under the receiver operating characteristic curve of 0.997, and the rules-based system achieved an accuracy of 0.99. When a model trained on 1 hospital's data was applied to a different hospital, however, accuracy varied greatly, and the learning-based models performed better than the rules-based model.
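The two metrics reported above can be computed as in the sketch below, which uses only the Python standard library. The function names and the toy labels and scores are illustrative assumptions and are not study data; the area under the ROC curve is computed here through its equivalence to the probability that a randomly chosen positive report is scored above a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only (1 = monoclonal gammopathy present).
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.1, 0.4, 0.6]
    predictions = [int(s >= 0.5) for s in scores]
    print("accuracy:", accuracy(labels, predictions))
    print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two figures are not directly interchangeable.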
Conclusion: Binary classification of short clinical texts such as SPEP reports may be a particularly attractive target on which to train highly accurate NLP systems.