
Need for Objective Task-Based Evaluation of Image Segmentation Algorithms for Quantitative PET: A Study with ACRIN 6668/RTOG 0235 Multicenter Clinical Trial Data

Overview
Journal J Nucl Med
Specialty Nuclear Medicine
Date 2024 Feb 15
PMID 38360049
Abstract

Reliable performance of PET segmentation algorithms on clinically relevant tasks is required for their clinical translation. However, these algorithms are typically evaluated using figures of merit (FoMs) that are not explicitly designed to correlate with clinical task performance. Such FoMs include the Dice similarity coefficient (DSC), the Jaccard similarity coefficient (JSC), and the Hausdorff distance (HD). The objective of this study was to investigate whether evaluating PET segmentation algorithms using these task-agnostic FoMs yields interpretations consistent with evaluation on clinically relevant quantitative tasks. We conducted a retrospective study to assess the concordance in the evaluation of segmentation algorithms using the DSC, JSC, and HD and on the tasks of estimating the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumors from PET images of patients with non-small cell lung cancer. The PET images were collected from the American College of Radiology Imaging Network 6668/Radiation Therapy Oncology Group 0235 multicenter clinical trial data. The study was conducted in 2 contexts: (1) evaluating conventional segmentation algorithms, namely those based on thresholding (SUV40% and SUV50%), boundary detection (Snakes), and stochastic modeling (Markov random field-Gaussian mixture model); (2) evaluating the impact of network depth and loss function on the performance of a state-of-the-art U-net-based segmentation algorithm. Evaluation of conventional segmentation algorithms based on the DSC, JSC, and HD showed that SUV40% significantly outperformed SUV50%. However, SUV40% yielded lower accuracy on the tasks of estimating MTV and TLG, with a 51% and 54% increase, respectively, in the ensemble normalized bias. Similarly, the Markov random field-Gaussian mixture model significantly outperformed Snakes on the basis of the task-agnostic FoMs but yielded a 24% increased bias in estimated MTV. 
For the U-net-based algorithm, our evaluation showed that although the network depth did not significantly alter the DSC, JSC, and HD values, a deeper network yielded substantially higher accuracy in the estimated MTV and TLG, with the bias decreasing by 91% and 87%, respectively. Additionally, whereas the DSC, JSC, and HD values did not differ significantly across loss functions, the bias of the estimated MTV and TLG differed by up to 73% and 58%, respectively. Evaluation of PET segmentation algorithms using task-agnostic FoMs could yield findings discordant with evaluation on clinically relevant quantitative tasks. This study emphasizes the need for objective task-based evaluation of image segmentation algorithms for quantitative PET.
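The kind of discordance described in the abstract can be illustrated with a minimal sketch. It is not the study's evaluation code: the 1-D masks, the SUV values, the 1 mL voxel volume, and the helper names (`dice`, `jaccard`, `mtv`, `tlg`, `relative_error`) are all assumptions made for illustration. The single-case relative error used here is a simplified stand-in for the ensemble normalized bias reported in the study, and the Hausdorff distance is omitted for brevity.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def jaccard(a, b):
    """Jaccard similarity coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0


def mtv(mask, voxel_volume_ml):
    """Metabolic tumor volume: number of segmented voxels times voxel volume."""
    return float(np.asarray(mask, dtype=bool).sum() * voxel_volume_ml)


def tlg(suv, mask, voxel_volume_ml):
    """Total lesion glycolysis: mean SUV inside the mask times the MTV."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    suv_mean = np.asarray(suv, dtype=float)[mask].mean()
    return float(suv_mean * mtv(mask, voxel_volume_ml))


def relative_error(estimate, truth_value):
    """Signed error of one estimate, normalized by the true value (assumed metric)."""
    return (estimate - truth_value) / truth_value


# Toy 1-D "image" (hypothetical values, not trial data).
VOXEL_ML = 1.0
truth = np.zeros(20, dtype=bool)
truth[4:12] = True            # true tumor: 8 voxels
seg_a = np.zeros(20, dtype=bool)
seg_a[2:14] = True            # oversegments: covers the tumor plus 4 extra voxels
seg_b = np.zeros(20, dtype=bool)
seg_b[6:14] = True            # correct size but shifted by 2 voxels
suv = np.where(truth, 5.0, 1.0)  # uniform uptake in tumor, low background

for name, seg in (("A", seg_a), ("B", seg_b)):
    mtv_err = relative_error(mtv(seg, VOXEL_ML), mtv(truth, VOXEL_ML))
    tlg_err = relative_error(tlg(suv, seg, VOXEL_ML), tlg(suv, truth, VOXEL_ML))
    print(f"{name}: DSC={dice(truth, seg):.3f} JSC={jaccard(truth, seg):.3f} "
          f"MTV error={mtv_err:+.2f} TLG error={tlg_err:+.2f}")
```

In this toy case, segmentation A scores higher than B on both overlap measures (DSC 0.80 versus 0.75, JSC 0.67 versus 0.60) yet overestimates the MTV by 50%, whereas B reproduces the true MTV exactly. A ranking based on overlap alone would therefore favor the segmentation that is worse on the volume-estimation task.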

Citing Articles

Landscape of 2D Deep Learning Segmentation Networks Applied to CT Scan from Lung Cancer Patients: A Systematic Review.

Mehrnia S, Safahi Z, Mousavi A, Panahandeh F, Farmani A, Yuan R. J Imaging Inform Med. 2025; .

PMID: 40038137 DOI: 10.1007/s10278-025-01458-x.
