Inter-observer Variability of Manual Contour Delineation of Structures in CT
Purpose: To quantify the inter-observer variability of manual delineation of lesions and organ contours in CT, in order to establish a reference standard for volumetric measurements in clinical decision-making and for the evaluation of automatic segmentation algorithms.
Materials and Methods: Eleven radiologists manually delineated 3193 contours of liver tumours (896), lung tumours (1085), kidneys (492) and brain hematomas (720) on 490 slices of clinical CT scans. The delineations were then compared to quantify inter-observer delineation variability, using standard volume metrics and new group-wise metrics for delineations produced by groups of observers.
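The abstract does not define its "standard volume metrics". The following is a minimal sketch that assumes two common ones: the volume overlap error (VOE, 1 minus the Jaccard index) and the Dice coefficient. It computes them on a single CT slice, where each delineation is a set of (row, column) pixels, so "volume" reduces to pixel area. The function names are hypothetical, this is not the authors' implementation, and their "volume overlap variability" may be defined differently.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a delineated pixel on one CT slice


def volume_overlap_error(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Assumed metric: VOE in %, 100 * (1 - |A & B| / |A | B|)."""
    union = len(a | b)
    if union == 0:
        return 0.0  # neither observer delineated anything: treat as full agreement
    # Written as (union - intersection) / union to stay exact for integer counts.
    return 100.0 * (union - len(a & b)) / union


def dice_percent(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Assumed metric: Dice coefficient in %, 100 * 2|A & B| / (|A| + |B|)."""
    denom = len(a) + len(b)
    if denom == 0:
        return 100.0  # two empty delineations agree perfectly
    return 100.0 * 2 * len(a & b) / denom


def mean_pairwise_voe(masks: Sequence[Set[Pixel]]) -> float:
    """Average VOE over every pair of observers (hypothetical helper)."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two delineations")
    return sum(volume_overlap_error(x, y) for x, y in pairs) / len(pairs)


# Two observers outline the same 4x4-pixel structure, offset by one row.
obs1 = {(r, c) for r in range(0, 4) for c in range(4)}
obs2 = {(r, c) for r in range(1, 5) for c in range(4)}
print(volume_overlap_error(obs1, obs2))  # 40.0 (8 of 20 pixels disputed)
print(dice_percent(obs1, obs2))          # 75.0
```

A one-row shift on a small structure already produces a 40% overlap error. This illustrates why the percentages reported for small lesions can be large even when the observers broadly agree.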
Results: The mean volume overlap variability values and ranges between the delineations of two observers were: liver tumours 17.8 [-5.8,+7.2]%, lung tumours 20.8 [-8.8,+10.2]%, kidneys 8.8 [-0.8,+1.2]% and brain hematomas 18 [-6.0,+6.0]%. For any two randomly selected observers, the mean delineation volume overlap variability ranged from 5% to 57%. Groups of two, three and five observers captured on average 37%, 53% and 72% of the variability, respectively; eight observers accounted for 75-94% of the total variability. Across all cases, 38.5% of the delineation non-agreement was due to parts of a single observer's delineation disagreeing with those of the others. No statistically significant difference in delineation variability was found between observers of different levels of expertise.
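The group-wise metrics behind these figures are not defined in the abstract. The following sketch assumes one plausible formulation to make the reported quantities concrete:

- The total variability is the region that at least one observer delineated but not all of them did.
- A subgroup of observers "captures" the fraction of that region on which its own members disagree.
- A single-observer disagreement is a pixel where exactly one observer differs from all the others.

These definitions and all function names are hypothetical and are not the authors' method.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a delineated pixel on one CT slice


def disagreement_region(masks: Sequence[Set[Pixel]]) -> Set[Pixel]:
    """Pixels delineated by at least one observer but not by all of them."""
    if not masks:
        return set()
    union = set().union(*masks)
    intersection = set(masks[0]).intersection(*masks[1:])
    return union - intersection


def captured_fraction(group: Sequence[Set[Pixel]], all_masks: Sequence[Set[Pixel]]) -> float:
    """Hypothetical metric: share of the full disagreement region on which
    the members of `group` already disagree among themselves."""
    total = disagreement_region(all_masks)
    if not total:
        return 1.0  # no variability at all, so nothing is left uncaptured
    # A subgroup's union lies inside the full union and its intersection
    # contains the full intersection, so its region is a subset of `total`.
    return len(disagreement_region(group)) / len(total)


def mean_captured_fraction(all_masks: Sequence[Set[Pixel]], k: int) -> float:
    """Average captured_fraction over every subgroup of k observers."""
    groups = list(combinations(all_masks, k))
    if not groups:
        raise ValueError("k must be between 1 and the number of observers")
    return sum(captured_fraction(g, all_masks) for g in groups) / len(groups)


def single_observer_fraction(all_masks: Sequence[Set[Pixel]]) -> float:
    """Share of disagreement pixels where exactly one observer differs from
    all the others (with only two observers this is trivially every pixel)."""
    n = len(all_masks)
    total = disagreement_region(all_masks)
    if not total:
        return 0.0
    single = 0
    for pixel in total:
        votes = sum(pixel in m for m in all_masks)  # observers who included it
        # votes == 1: one observer added it; votes == n - 1: one observer left it out
        if votes == 1 or votes == n - 1:
            single += 1
    return single / len(total)


# Four observers on a toy four-pixel strip.
a, b, c, d = (0, 0), (0, 1), (0, 2), (0, 3)
observers = [{a, b, c}, {a, b, c}, {a, b}, {a, b, d}]
print(mean_captured_fraction(observers, 2))  # 3.5 / 6, about 0.583
print(single_observer_fraction(observers))   # 0.5: only pixel d is a lone call
```

In this toy case, pairs of observers capture just over half of the disagreement region on average. Only all four together capture all of it, which mirrors the abstract's point that small groups understate the full variability.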
Conclusion: The variability of manual delineations is large and spans a wide range across observers, structures and pathologies. Two, or even three, observers may not be sufficient to establish the full range of inter-observer variability.
Key Points:
• This study quantifies the inter-observer variability of manual delineation of lesions and organ contours in CT.
• The variability of manual delineations between two observers can be large. Two, and even three, observers capture only a fraction of the full range of inter-observer variability observed in common practice.
• Knowledge of inter-observer manual delineation variability is necessary to establish a reference standard for radiologist training and evaluation, and for the evaluation of automatic segmentation algorithms.