
Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

Overview
Publisher JMIR Publications
Date 2017 Mar 17
PMID 28298265
Citations 14
Abstract

Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development.

Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom.

Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests.
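
To make this pipeline concrete, here is a minimal sketch of how one such classifier could be trained and evaluated. It is an illustration only, not the study's code: the abstract does not name the software used, so the Python/scikit-learn stack, the toy comments, and the label names below are all assumptions.

```python
# A minimal sketch, not the authors' code: one of many possible supervised
# text classifiers for theme coding. The example data, the "professional"
# label, and the model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for human-coded comments: 1 = coded "professional", 0 = not.
comments = [
    "always acts with integrity and follows guidelines",
    "maintains the highest professional standards",
    "documentation is thorough and timely",
    "a friendly and approachable colleague",
    "popular with the whole team",
    "great sense of humour on the ward",
] * 5  # repeated so each cross-validation fold has enough examples
labels = [1, 1, 1, 0, 0, 0] * 5

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word and bigram features
    LogisticRegression(max_iter=1000),     # regularized linear classifier
)

# The study reports F scores under 10-fold cross-validation.
scores = cross_val_score(pipeline, comments, labels, cv=10, scoring="f1")
print(f"Mean F score: {scores.mean():.2f}")
```

A binary classifier of this kind would be trained once per theme (or replaced by a single multiclass model); the abstract does not specify which design the authors used.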

Results: Individual algorithm performance was high (F score range .68-.83). Interrater agreement between the algorithms and the human coder was highest for the "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes, and lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When the algorithms were combined into an ensemble, mean human-computer interrater agreement was .88. Comments classified as "respected," "professional," and "interpersonal" were associated with higher doctor scores on the GMC-CQ than comments that were not so classified (P<.05). Scores did not differ between doctors who were rated as popular or innovative and those who were not rated at all (P>.05).
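
The ensemble and score-comparison steps reported above can be sketched in the same spirit. The majority vote, the recall computation against the human coder, and the two-sample t test below use invented numbers purely to show the shape of the analysis; none of the values come from the study.

```python
# A minimal sketch under stated assumptions, not the study's code:
# majority-vote ensembling of several classifiers, recall against the
# human coder, and a t test on GMC-CQ scores. All numbers are invented.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import recall_score

# Hypothetical binary predictions (rows: algorithms, columns: comments).
model_predictions = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 0],
])
ensemble = (model_predictions.mean(axis=0) >= 0.5).astype(int)  # majority vote

human_codes = np.array([1, 0, 1, 1, 0, 1])   # the qualitative coder's labels
print("Ensemble recall vs. human coder:", recall_score(human_codes, ensemble))

# Hypothetical GMC-CQ scores for doctors with / without a "professional"
# classification, compared with an independent-samples t test.
scores_classified = np.array([4.6, 4.8, 4.7, 4.9, 4.5])
scores_unclassified = np.array([4.3, 4.4, 4.2, 4.5, 4.1])
t_stat, p_value = ttest_ind(scores_classified, scores_unclassified)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```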

Conclusions: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high accuracy. Colleague open-text comments that signal respect, professionalism, and interpersonal skills may be key indicators of doctors' performance.

Citing Articles

Leveraging Narrative Feedback in Programmatic Assessment: The Potential of Automated Text Analysis to Support Coaching and Decision-Making in Programmatic Assessment.

Nair B, Moonen-van Loon J, van Lierop M, Govaerts M. Adv Med Educ Pract. 2024; 15:671-683.

PMID: 39050116. PMC: 11268569. DOI: 10.2147/AMEP.S465259.


A multiphase study protocol of identifying and predicting cancer-related symptom clusters: applying a mixed-method design and machine learning algorithms.

Miladinia M, Zarea K, Gheibizadeh M, Jahangiri M, Karimpourian H, Rokhafroz D. Front Digit Health. 2024; 6:1290689.

PMID: 38707194. PMC: 11066191. DOI: 10.3389/fdgth.2024.1290689.


Enhanced Surgical Decision-Making Tools in Breast Cancer: Predicting 2-Year Postoperative Physical, Sexual, and Psychosocial Well-Being following Mastectomy and Breast Reconstruction (INSPiRED 004).

Xu C, Pfob A, Mehrara B, Yin P, Nelson J, Pusic A. Ann Surg Oncol. 2023; 30(12):7046-7059.

PMID: 37516723. PMC: 10562277. DOI: 10.1245/s10434-023-13971-w.


Changes in Doctor-Patient Relationships in China during COVID-19: A Text Mining Analysis.

Li J, Pang P, Xiao Y, Wong D. Int J Environ Res Public Health. 2022; 19(20).

PMID: 36294022. PMC: 9603644. DOI: 10.3390/ijerph192013446.


Co-designing new tools for collecting, analysing and presenting patient experience data in NHS services: working in partnership with patients and carers.

Small N, Ong B, Lewis A, Allen D, Bagshaw N, Nahar P. Res Involv Engagem. 2021; 7(1):85.

PMID: 34838128. PMC: 8626979. DOI: 10.1186/s40900-021-00329-3.

