PMID: 34423261

A Fast, Resource Efficient, and Reliable Rule-based System for COVID-19 Symptom Identification

Abstract

Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to these issues is the rule-based gazetteer developed at our institution.

Materials and Methods: The performance, resource utilization, and runtime of the rule-based gazetteer were compared with those of five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger.

Results: The rule-based gazetteer was the fastest, had a low resource footprint, and achieved weighted microaverage and macroaverage precision, recall, and F1-score similar to those of the other annotation systems.
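The abstract reports "weighted microaverage and macroaverage" precision, recall, and F1-score without defining the aggregation here. The Python sketch below shows three common schemes over per-class counts, assuming the standard definitions apply: micro-averaging (pool counts across classes, then score once), macro-averaging (score each class, then take the unweighted mean), and support-weighted averaging (score each class, then weight by its number of gold mentions). The `Counts` type, the function names, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives for one symptom class
    fp: int  # false positives
    fn: int  # false negatives


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro(counts: dict[str, Counts]) -> tuple[float, float, float]:
    # Pool counts across all classes, then compute P/R/F1 once.
    return prf(
        sum(c.tp for c in counts.values()),
        sum(c.fp for c in counts.values()),
        sum(c.fn for c in counts.values()),
    )


def macro(counts: dict[str, Counts]) -> tuple[float, float, float]:
    # Unweighted mean of per-class scores: every class counts equally.
    scores = [prf(c.tp, c.fp, c.fn) for c in counts.values()]
    p, r, f = (sum(s[i] for s in scores) / len(scores) for i in range(3))
    return p, r, f


def weighted(counts: dict[str, Counts]) -> tuple[float, float, float]:
    # Per-class scores weighted by support (gold mentions = tp + fn).
    total = sum(c.tp + c.fn for c in counts.values())
    sums = [0.0, 0.0, 0.0]
    for c in counts.values():
        for i, score in enumerate(prf(c.tp, c.fp, c.fn)):
            sums[i] += score * (c.tp + c.fn)
    return sums[0] / total, sums[1] / total, sums[2] / total


if __name__ == "__main__":
    # Hypothetical counts for two symptom classes.
    COUNTS = {"fever": Counts(tp=8, fp=2, fn=0), "cough": Counts(tp=1, fp=0, fn=1)}
    print("micro   ", micro(COUNTS))
    print("macro   ", macro(COUNTS))
    print("weighted", weighted(COUNTS))
```

With these hypothetical counts, micro precision pools to 9/11 (about 0.82), while macro precision averages the per-class values 0.8 and 1.0 to 0.9. Support-weighted recall always equals micro recall, because weighting each class's recall by its support reduces to total true positives over total gold mentions.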

Discussion: Opportunities to improve its performance include fine-tuning the lexical rules for symptom identification. Additionally, it could be run on multiple compute nodes to reduce runtime.
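The abstract does not describe the gazetteer's internals. As a rough illustration of what dictionary-based symptom lookup with lexical rules can look like, the Python sketch below matches a small symptom lexicon against note text using case-insensitive, whole-word, longest-first regular-expression matching. The lexicon entries, the `Gazetteer` class, the `Match` type, and the `annotate` method are hypothetical, not the authors' system, and the sketch performs no negation or context handling.

```python
import re
from typing import NamedTuple

# Hypothetical lexicon: surface form -> canonical symptom label.
# Illustrative entries only; not the authors' gazetteer.
LEXICON = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "loss of smell": "anosmia",
    "loss of taste": "ageusia",
}


class Match(NamedTuple):
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    text: str     # surface form as it appears in the note
    concept: str  # canonical symptom label from the lexicon


class Gazetteer:
    """Hypothetical dictionary matcher; compiles the lexicon once."""

    def __init__(self, lexicon: dict[str, str]) -> None:
        # Normalize keys to lowercase so case-insensitive hits can be looked up.
        self.lexicon = {term.lower(): label for term, label in lexicon.items()}
        # Longer terms first: at a given position the regex tries alternatives
        # in order, so multi-word phrases win over shorter overlapping ones.
        terms = sorted(self.lexicon, key=len, reverse=True)
        alternation = "|".join(re.escape(t) for t in terms)
        # \b anchors require whole-word matches ("cough" will not hit "coughing").
        self.pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

    def annotate(self, text: str) -> list[Match]:
        # Return every lexicon hit in the text, left to right.
        return [
            Match(m.start(), m.end(), m.group(0), self.lexicon[m.group(0).lower()])
            for m in self.pattern.finditer(text)
        ]


if __name__ == "__main__":
    gaz = Gazetteer(LEXICON)
    for hit in gaz.annotate("Pt febrile with SOB; denies cough."):
        print(hit)
```

Compiling one alternation pattern up front and reusing it across notes avoids per-document setup, which fits the low-footprint goal described above. Note that "denies cough" is still reported as a cough mention; handling negation and other context is the kind of lexical-rule refinement the Discussion points to.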

Conclusion: The rule-based gazetteer overcame key technical limitations, facilitating real-time identification of COVID-19 symptoms and the integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings to monitor acute COVID-19 symptoms for integration into prognostic models. Such a system is currently being used to monitor the progression of postacute sequelae of COVID-19 (PASC) in COVID-19 survivors. This study presents the first in-depth analysis and development of a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and weighted microaverage and macroaverage precision, recall, and F1-score similar to those of industry-standard annotation systems.
