PMID: 38197800

Expert-centered Evaluation of Deep Learning Algorithms for Brain Tumor Segmentation

Abstract

Purpose: To present results from a literature survey on practices in deep learning segmentation algorithm evaluation and perform a study on expert quality perception of brain tumor segmentation.

Materials and Methods: A total of 180 articles reporting on brain tumor segmentation algorithms were surveyed for the reported quality evaluation. Additionally, ratings of segmentation quality on a four-point scale were collected from medical professionals for 60 brain tumor segmentation cases.

Results: Of the surveyed articles, Dice score, sensitivity, and Hausdorff distance were the most popular metrics to report segmentation performance. Notably, only 2.8% of the articles included clinical experts' evaluation of segmentation quality. The experimental results revealed a low interrater agreement (Krippendorff α, 0.34) in experts' segmentation quality perception. Furthermore, the correlations between the ratings and commonly used quantitative quality metrics were low (Kendall tau between Dice score and mean rating, 0.23; Kendall tau between Hausdorff distance and mean rating, 0.51), with large variability among the experts.

Conclusion: The results demonstrate that quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences, and existing metrics do not capture the clinical perception of segmentation quality.

Keywords: Brain Tumor Segmentation, Deep Learning Algorithms, Glioblastoma, Cancer, Machine Learning

Clinical trial registration nos. NCT00756106 and NCT00662506

© RSNA, 2023.
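For readers unfamiliar with the quantities named in the Results, the following is a minimal Python sketch of how a Dice score, a Hausdorff distance, and a Kendall tau rank correlation can be computed. It is not the authors' evaluation pipeline: the function names (`dice_score`, `hausdorff_distance`, `kendall_tau_b`) are hypothetical, masks are represented as sets of voxel coordinates for simplicity, the tau-b variant is an assumption (the abstract does not say which variant was used), and all numbers below are invented toy data rather than study results. Krippendorff α is not covered here.

```python
from itertools import combinations
from math import dist, sqrt


def dice_score(pred, ref):
    """Dice overlap between two sets of foreground voxel coordinates."""
    pred, ref = set(pred), set(ref)
    if not pred and not ref:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def hausdorff_distance(pred, ref):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(a, b):
        # Largest distance from any point in a to its nearest point in b.
        return max(min(dist(p, q) for q in b) for p in a)
    return max(directed(pred, ref), directed(ref, pred))


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation (handles ties, e.g. in ordinal ratings)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Toy 2D masks (invented): two 2x2 squares shifted by one column.
pred_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
ref_mask = {(0, 1), (1, 1), (0, 2), (1, 2)}
print("Dice:", dice_score(pred_mask, ref_mask))
print("Hausdorff:", hausdorff_distance(pred_mask, ref_mask))

# Toy per-case metric values and mean expert ratings (invented).
toy_dice = [0.9, 0.8, 0.7, 0.6]
toy_mean_rating = [4, 3, 3, 1]
print("Kendall tau-b:", kendall_tau_b(toy_dice, toy_mean_rating))
```

Because Dice rewards overlap while Hausdorff distance penalizes the single worst boundary error, the two can rank the same set of segmentations differently, which is one reason their correlations with expert ratings need not match.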

Citing Articles

A review of deep learning for brain tumor analysis in MRI.

Dorfner F, Patel J, Kalpathy-Cramer J, Gerstner E, Bridge C. NPJ Precis Oncol. 2025; 9(1):2.

PMID: 39753730 PMC: 11698745. DOI: 10.1038/s41698-024-00789-2.


Automated brain segmentation and volumetry in dementia diagnostics: a narrative review with emphasis on FreeSurfer.

Khadhraoui E, Nickl-Jockschat T, Henkes H, Behme D, Muller S. Front Aging Neurosci. 2024; 16:1459652.

PMID: 39291276 PMC: 11405240. DOI: 10.3389/fnagi.2024.1459652.
