A Comparison of Rule-based and Machine Learning Approaches for Classifying Patient Portal Messages

Overview

Journal Int J Med Inform

Specialty Medical Informatics

Date 2017 Jul 29

PMID 28750904

Citations 31

Authors

Robert M Cronin

Daniel Fabbri

Joshua C Denny

S Trent Rosenbloom

Gretchen Purcell Jackson

Abstract

Objective: Secure messaging through patient portals is an increasingly popular way for consumers to interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time-consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care.

Materials and Methods: We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag-of-words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with the area under the receiver operating characteristic curve (AUC). Because portal messages often contain more than one type of communication, we used the Jaccard Index to evaluate prediction of all communication types within a single message. We extracted the variables of importance for the random forest classifiers.
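The two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy messages, and the choice to average the Jaccard Index per message are assumptions made for illustration, and the abstract does not state how the index was aggregated. The AUC is computed as the probability that a randomly chosen positive message receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence, Set


def jaccard_index(true_labels: Set[str], predicted_labels: Set[str]) -> float:
    """Size of the intersection divided by size of the union of two label sets."""
    union = true_labels | predicted_labels
    if not union:
        # Assumption: two empty label sets count as perfect agreement.
        return 1.0
    return len(true_labels & predicted_labels) / len(union)


def mean_jaccard(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Average the per-message Jaccard Index (the averaging scheme is an assumption)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    return sum(jaccard_index(g, p) for g, p in zip(gold, predicted)) / len(gold)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one communication type via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three messages with gold and predicted communication types.
gold = [{"medical", "logistical"}, {"social"}, {"informational"}]
predicted = [{"medical"}, {"social"}, {"informational", "medical"}]
example_jaccard = mean_jaccard(gold, predicted)

# Hypothetical scores for a single type (1 = message contains that type).
example_auc = auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
```

In this toy example the per-message Jaccard values are 0.5, 1.0, and 0.5, so a classifier is credited for partially correct label sets, which matters when a message carries several communication types at once.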

Results: The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). Among the classifiers for individual communication subtypes, the best performing was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, 0.674; Naïve Bayes, 0.799; random forests, 0.859; and logistic regression, 0.861. For medical communications, the most predictive variables were NLP concepts (e.g., Temporal_Concept, which maps to 'morning', 'evening' and Idea_or_Concept which maps to 'appointment' and 'refill'). For logistical communications, the most predictive variables contained similar numbers of NLP variables and words (e.g., Telephone mapping to 'phone', 'insurance'). For social and informational communications, the most predictive variables were words (e.g., social: 'thanks', 'much', informational: 'question', 'mean').

Conclusions: This study applies automated classification methods to the content of patient portal messages and evaluates the application of NLP techniques on consumer communications in patient portal messages. We demonstrated that random forest and logistic regression approaches accurately classified the content of portal messages, although the best approach to classification varied by communication type. Words were the most predictive variables for classification of most communication types, although NLP variables were most predictive for medical communication types. As adoption of patient portals increases, automated techniques could assist in understanding and managing growing volumes of messages. Further work is needed to improve classification performance to potentially support message triage and answering.

Citing Articles

Natural language processing to evaluate texting conversations between patients and healthcare providers during COVID-19 Home-Based Care in Rwanda at scale.

Lester R, Manson M, Semakula M, Jang H, Mugabo H, Magzari A. PLOS Digit Health. 2025; 4(1):e0000625.

PMID: 39813181 PMC: 11734906. DOI: 10.1371/journal.pdig.0000625.

Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods.

Bhattarai K, Oh I, Sierra J, Tang J, Payne P, Abrams Z. JAMIA Open. 2024; 7(3):ooae060.

PMID: 38962662 PMC: 11221943. DOI: 10.1093/jamiaopen/ooae060.

Automatic uncovering of patient primary concerns in portal messages using a fusion framework of pretrained language models.

Ren Y, Wu Y, Fan J, Khurana A, Fu S, Wu D. J Am Med Inform Assoc. 2024; 31(8):1714-1724.

PMID: 38934289 PMC: 11258404. DOI: 10.1093/jamia/ocae144.

Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review.

Sim J, Huang X, Horan M, Stewart C, Robison L, Hudson M. Artif Intell Med. 2023; 146:102701.

PMID: 38042599 PMC: 10693655. DOI: 10.1016/j.artmed.2023.102701.

Patient portal interventions: a scoping review of functionality, automation used, and therapeutic elements of patient portal interventions.

Gleason K, Powell D, Wec A, Zou X, Gamper M, Peereboom D. JAMIA Open. 2023; 6(3):ooad077.

PMID: 37663406 PMC: 10469545. DOI: 10.1093/jamiaopen/ooad077.
