PMID: 36652282

Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study

Abstract

Background: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts on the identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated to be a cost- and time-efficient method for annotating medical images.

Objective: The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures in images of pigmented lesions with reliability similar to that of a group of experts.

Methods: First, we obtained a set of 248 images of melanocytic lesions in which 31 dermoscopic "subfeatures" had been labeled by 20 dermoscopy experts. Because interrater reliability (IRR) for the subfeatures was low, they were collapsed into 6 dermoscopic "superfeatures" based on structural similarity: dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters.
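As context for the agreement statistic used above, the following is a minimal sketch, in Python using only the standard library, of how Cohen κ can be computed for two raters' binary present/absent labels on a set of images. The function name, variable names, and example labels are illustrative assumptions, not code or data from the study.

    def cohen_kappa(rater_a, rater_b):
        """Cohen κ for two raters' binary (0 = absent, 1 = present) labels."""
        n = len(rater_a)
        # Observed agreement: fraction of images on which the raters match.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Chance agreement, from each rater's marginal rate of "present".
        pa, pb = sum(rater_a) / n, sum(rater_b) / n
        p_e = pa * pb + (1 - pa) * (1 - pb)
        return (p_o - p_e) / (1 - p_e)

    # Illustrative labels for 10 images (hypothetical, not study data).
    expert = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    crowd  = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
    print(round(cohen_kappa(expert, crowd), 3))  # 0.615

The statistic rescales observed agreement p_o by the agreement p_e expected under chance, κ = (p_o − p_e) / (1 − p_e), so κ = 0 indicates chance-level agreement and κ = 1 perfect agreement.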

Results: In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. Agreement was relatively low for the identification of dots and globules (median κ values of 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (median κ values of 0.581 and 0.798, respectively). This pattern was also seen among the expert raters, who had median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average-expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels.

Conclusions: This study confirmed that IRR varies across dermoscopic features among a group of experts, and a similar pattern was observed in a nonexpert crowd. Agreement between the crowd and the experts was good or excellent for each of the 6 superfeatures, indicating that the crowd can label dermoscopic images with reliability comparable to that of experts. This supports the feasibility and dependability of crowdsourcing as a scalable solution for annotating large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.

Citing Articles

Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Data Set Labeling: Prospective Analysis.

Duggan N, Jin M, Duran Mendicuti M, Hallisey S, Bernier D, Selame L. J Med Internet Res. 2024; 26:e51397.

PMID: 38963923 PMC: 11258523. DOI: 10.2196/51397.


Boosting wisdom of the crowd for medical image annotation using training performance and task features.

Hasan E, Duhaime E, Trueblood J. Cogn Res Princ Implic. 2024; 9(1):31.

PMID: 38763994 PMC: 11102897. DOI: 10.1186/s41235-024-00558-6.


Training Family Medicine Residents in Dermoscopy Using an e-Learning Course: Pilot Interventional Study.

Friche P, Moulis L, Du Thanh A, Dereure O, Duflos C, Carbonnel F. JMIR Form Res. 2024; 8:e56005.

PMID: 38739910 PMC: 11130775. DOI: 10.2196/56005.


Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery.

Skinner G, Chen T, Jentis G, Liu Y, McCulloh C, Harzman A. NPJ Digit Med. 2024; 7(1):99.

PMID: 38649447 PMC: 11035672. DOI: 10.1038/s41746-024-01095-8.


Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training Versus Performance Study.

McNeil A, Parks K, Liu X, Jiang B, Coco J, McCool K. JMIR Dermatol. 2023; 6:e48589.

PMID: 38147369 PMC: 10777279. DOI: 10.2196/48589.
