Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

Overview

Journal: J Med Internet Res

Publisher: JMIR Publications

Specialty: Medical Informatics

Date: 2017 Mar 17

PMID: 28298265

Citations: 14

Authors

Chris Gibbons

Suzanne Richards

Jose Maria Valderas

John Campbell

Abstract

Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development.

Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom.

Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests.
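
The abstract does not name the 8 algorithms, their text features, or the exact form of the t test. As a minimal, hypothetical sketch of the general workflow, the Python below trains one binary bag-of-words classifier for a single theme and computes a t statistic for comparing questionnaire scores between doctors whose comments do and do not carry that theme. Multinomial Naive Bayes with add-one smoothing is swapped in here only for brevity, and Welch's unequal-variance t test is an assumption, since the abstract says only "t tests". The class name, function names, toy comments, and scores are all invented for illustration.

```python
import math
import re
from collections import Counter
from statistics import mean, variance


def tokenize(text: str) -> list[str]:
    """Lower-case a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTheme:
    """Hypothetical binary multinomial Naive Bayes: does a comment carry one theme?"""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha (Laplace) smoothing strength

    def fit(self, comments: list[str], labels: list[bool]) -> "NaiveBayesTheme":
        # Count how often each word appears in comments with and without the theme.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(comments, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens: list[str], label: bool) -> float:
        # log P(label) + sum over known words of log P(word | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> bool:
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Invented comments, labelled for a single "respected" theme.
    comments = [
        "Highly respected by all colleagues",
        "Respected and trusted clinician",
        "Always on time with paperwork",
        "Keeps good notes",
    ]
    labels = [True, True, False, False]
    model = NaiveBayesTheme().fit(comments, labels)
    print(model.predict("a respected colleague"))  # True
    print(model.predict("keeps good paperwork"))   # False
    # Invented questionnaire scores: doctors with vs without the theme.
    print(welch_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0]))
```

Training one binary classifier per theme lets a comment carry several themes at once. Converting the t statistic to a P value needs a t-distribution CDF, which the standard library lacks; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test directly.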

Results: Individual algorithm performance was high (F scores ranged from .68 to .83). Interrater agreement between the algorithms and the human coder was highest for the "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When the algorithms were combined into an ensemble, mean human-computer interrater agreement was .88. Comments classified as "respected," "professional," and "interpersonal" were associated with higher doctor scores on the GMC-CQ than comments that were not classified (P<.05). Scores did not differ between doctors who were rated as popular or innovative and those who were not rated at all (P>.05).
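
The Results rely on three standard pieces: recall and F score against the human coder, 10-fold cross-validation, and an ensemble of algorithms. A minimal sketch of each follows, assuming a strict majority vote as the ensemble rule and contiguous, unshuffled folds; the abstract specifies neither, and in practice comments would usually be shuffled before splitting. The function names and toy predictions are hypothetical.

```python
def precision_recall_f1(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (theme present)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # both coder and model say yes
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # model yes, coder no
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # model no, coder yes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def majority_vote(predictions: list[list[bool]]) -> list[bool]:
    """Assumed ensemble rule: flag a comment when a strict majority of algorithms do."""
    n_models = len(predictions)
    return [sum(votes) * 2 > n_models for votes in zip(*predictions)]


def k_fold_indices(n: int, k: int = 10) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is held out once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Invented human codes for one theme and predictions from three hypothetical algorithms.
    y_true = [True, True, True, False, False]
    preds = [
        [True, True, False, False, True],
        [True, False, True, False, False],
        [True, True, True, True, False],
    ]
    ensemble = majority_vote(preds)
    print(precision_recall_f1(y_true, preds[0]))  # each value is about 0.667
    print(precision_recall_f1(y_true, ensemble))  # (1.0, 1.0, 1.0)
    folds = k_fold_indices(1636, k=10)
    print([len(test) for _, test in folds])       # [164, 164, 164, 164, 164, 164, 163, 163, 163, 163]
```

In this toy case no single algorithm matches the human codes, but their errors fall on different comments, so the vote recovers them all. With an even number of voters, as with 8 algorithms, a strict majority leaves ties unflagged; breaking ties toward the theme would raise recall at the cost of precision.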

Conclusions: Machine learning algorithms can perform well at classifying open-text feedback on doctor performance into multiple themes derived by human raters. Colleague open-text comments that signal respect, professionalism, and interpersonal skill may be key indicators of doctors' performance.

Citing Articles

Leveraging Narrative Feedback in Programmatic Assessment: The Potential of Automated Text Analysis to Support Coaching and Decision-Making in Programmatic Assessment.

Nair B, Moonen-van Loon J, van Lierop M, Govaerts M. Adv Med Educ Pract. 2024;15:671-683.

PMID: 39050116. PMC: 11268569. DOI: 10.2147/AMEP.S465259.

A multiphase study protocol of identifying, and predicting cancer-related symptom clusters: applying a mixed-method design and machine learning algorithms.

Miladinia M, Zarea K, Gheibizadeh M, Jahangiri M, Karimpourian H, Rokhafroz D. Front Digit Health. 2024;6:1290689.

PMID: 38707194. PMC: 11066191. DOI: 10.3389/fdgth.2024.1290689.

Enhanced Surgical Decision-Making Tools in Breast Cancer: Predicting 2-Year Postoperative Physical, Sexual, and Psychosocial Well-Being following Mastectomy and Breast Reconstruction (INSPiRED 004).

Xu C, Pfob A, Mehrara B, Yin P, Nelson J, Pusic A. Ann Surg Oncol. 2023;30(12):7046-7059.

PMID: 37516723. PMC: 10562277. DOI: 10.1245/s10434-023-13971-w.

Changes in Doctor-Patient Relationships in China during COVID-19: A Text Mining Analysis.

Li J, Pang P, Xiao Y, Wong D. Int J Environ Res Public Health. 2022;19(20).

PMID: 36294022. PMC: 9603644. DOI: 10.3390/ijerph192013446.

Co-designing new tools for collecting, analysing and presenting patient experience data in NHS services: working in partnership with patients and carers.

Small N, Ong B, Lewis A, Allen D, Bagshaw N, Nahar P. Res Involv Engagem. 2021;7(1):85.

PMID: 34838128. PMC: 8626979. DOI: 10.1186/s40900-021-00329-3.
