Learning from Multiple Annotators for Medical Image Segmentation

Overview

Journal Pattern Recognit

Publisher Elsevier

Date 2023 Oct 2

PMID 37781685

Authors

Le Zhang

Ryutaro Tanno

Moucheng Xu

Yawen Huang

Kevin Bronik

Chen Jin

Joseph Jacob

Yefeng Zheng

Ling Shao

Olga Ciccarelli

Frederik Barkhof

Daniel C Alexander

Abstract

Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts contribute estimates of the "actual" segmentation labels, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing the annotator's "unreliable behavior" (we call it "maximally unreliable") while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method's efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert to generate the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators' mistakes.
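The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes that each annotator's reliability is modeled as a per-pixel confusion matrix, that fidelity to the noisy labels is measured by cross-entropy, and that "maximally unreliable" is encouraged by penalizing the trace of those matrices. The function names, the weight `lam`, and the toy inputs are hypothetical and are not taken from this record.

```python
import math

def annotator_distribution(cm, p):
    """Predicted noisy-label distribution q = cm @ p.

    Assumed convention: cm[i][j] = P(annotator says class i | true class j),
    so every column of cm sums to 1.
    """
    return [sum(cm[i][j] * p[j] for j in range(len(p))) for i in range(len(cm))]

def trace(cm):
    """Sum of the diagonal: how often the annotator agrees with the true class."""
    return sum(cm[k][k] for k in range(len(cm)))

def coupled_loss(probs, cms, labels, lam=0.01):
    """Cross-entropy to the noisy labels plus a trace penalty (illustrative).

    probs:  probs[n] is the estimated consensus class distribution at pixel n
    cms:    cms[r][n] is the confusion matrix of annotator r at pixel n
    labels: labels[r][n] is the noisy class index given by annotator r at pixel n
    lam:    hypothetical weight of the trace penalty
    """
    ce, tr, count = 0.0, 0.0, 0
    for r, annot_labels in enumerate(labels):
        for n, y in enumerate(annot_labels):
            q = annotator_distribution(cms[r][n], probs[n])
            ce += -math.log(max(q[y], 1e-12))  # fidelity to the noisy label
            tr += trace(cms[r][n])             # smaller trace = less reliable
            count += 1
    return (ce + lam * tr) / count

# Toy example: 2 pixels, 2 classes, 2 annotators (all values made up).
identity = [[1.0, 0.0], [0.0, 1.0]]   # a perfectly reliable annotator
noisy = [[0.8, 0.3], [0.2, 0.7]]      # an annotator who sometimes flips labels
probs = [[0.9, 0.1], [0.2, 0.8]]
cms = [[identity, identity], [noisy, noisy]]
labels = [[0, 1], [0, 0]]
loss = coupled_loss(probs, cms, labels, lam=0.01)
```

In a full system, one network would output `probs` and a second network would output `cms`, with both trained jointly on this kind of objective. Here they are fixed lists so that the arithmetic can be followed by hand.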

Citing Articles

Stacking Model-Based Classifiers for Dealing With Multiple Sets of Noisy Labels.

Montani G, Cappozzo A. Biom J. 2025; 67(2):e70042.

PMID: 40071867 PMC: 11898607. DOI: 10.1002/bimj.70042.

Evaluation of artificial intelligence-based autosegmentation for a high-performance cone-beam computed tomography imaging system in the pelvic region.

Sluijter J, van de Schoot A, Yaakoubi A, de Jong M, van der Knaap-van Dongen M, Kunnen B. Phys Imaging Radiat Oncol. 2025; 33:100687.

PMID: 39802649 PMC: 11721864. DOI: 10.1016/j.phro.2024.100687.

RapidBrachyIVBT: A dosimetry software for patient-specific intravascular brachytherapy dose calculations on optical coherence tomography images.

Rahbaran M, Kalinowski J, DeCunha J, Croce K, Bergmark B, Tsui J. Med Phys. 2024; 52(2):1256-1267.

PMID: 39561213 PMC: 11788245. DOI: 10.1002/mp.17525.

Advancing image segmentation with DBO-Otsu: Addressing rubber tree diseases through enhanced threshold techniques.

Xie Z, Wu J, Tang W, Liu Y. PLoS One. 2024; 19(3):e0297284.

PMID: 38512907 PMC: 10956860. DOI: 10.1371/journal.pone.0297284.
