
Inter-observer Variability of Manual Contour Delineation of Structures in CT

Overview
Journal Eur Radiol
Specialty Radiology
Date 2018 Sep 9
PMID 30194472
Citations 56
Abstract

Purpose: To quantify the inter-observer variability of manual delineation of lesions and organ contours in CT to establish a reference standard for volumetric measurements for clinical decision making and for the evaluation of automatic segmentation algorithms.

Materials and Methods: Eleven radiologists manually delineated 3193 contours of liver tumours (896), lung tumours (1085), kidney contours (434) and brain hematomas (497) on 490 slices of clinical CT scans. A comparative analysis of the delineations was then performed to quantify the inter-observer delineation variability with standard volume metrics and with new group-wise metrics for delineations produced by groups of observers.
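The abstract does not spell out which "standard volume metrics" were used. As a minimal sketch, assuming the pairwise measure is a Jaccard-style volume overlap difference, 1 - |A ∩ B| / |A ∪ B|, a comparison of two observers' delineations could look like the following. The function names, the pixel-set representation and the toy rectangles are illustrative assumptions, not the authors' implementation or data.

```python
from itertools import combinations


def overlap_difference(a, b):
    """Volume overlap difference in percent: 100 * (1 - |A & B| / |A | B|).

    `a` and `b` are sets of (x, y) pixel coordinates inside each contour.
    This definition is an assumption; the abstract does not state the formula.
    """
    union = a | b
    if not union:
        return 0.0  # two empty delineations agree trivially
    return 100.0 * (1.0 - len(a & b) / len(union))


def mean_pairwise_difference(delineations):
    """Average the overlap difference over every pair of observers."""
    pairs = list(combinations(delineations, 2))
    return sum(overlap_difference(a, b) for a, b in pairs) / len(pairs)


def rectangle(x0, y0, x1, y1):
    """Hypothetical helper: filled rectangle as a set of pixel coordinates."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Three made-up observers outlining the same 10x10 structure with 1-pixel shifts.
observers = [
    rectangle(0, 0, 10, 10),
    rectangle(1, 0, 11, 10),
    rectangle(0, 1, 10, 11),
]

if __name__ == "__main__":
    print(round(overlap_difference(observers[0], observers[1]), 2))
    print(round(mean_pairwise_difference(observers), 2))
```

Representing each delineation as a set of pixel coordinates keeps the sketch dependency-free; with real CT masks the same ratios would normally be computed on binary arrays, weighted by voxel size when volumes are reported.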

Results: The mean volume overlap variability values and ranges (in %) between the delineations of two observers were: liver tumours 17.8 [-5.8,+7.2]%, lung tumours 20.8 [-8.8,+10.2]%, kidney contours 8.8 [-0.8,+1.2]% and brain hematomas 18 [-6.0,+6.0]%. For any two randomly selected observers, the mean delineation volume overlap variability was 5-57%. The mean variability captured by groups of two, three and five observers was 37%, 53% and 72%; eight observers accounted for 75-94% of the total variability. For all cases, 38.5% of the delineation non-agreement was due to parts of the delineation of a single observer disagreeing with the others. No statistically significant difference in delineation variability was found between observers based on their expertise.
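The group-wise metrics themselves are not defined in the abstract. One plausible reading, offered here only as a hedged sketch, treats the variability of a group as the region where its members disagree (union of all delineations minus their intersection) and asks what fraction of the full panel's disagreement region a subset of k observers reproduces on average. All names and the toy data below are hypothetical.

```python
from itertools import combinations


def variability_region(delineations):
    """Pixels marked by at least one observer but not by all of them."""
    union = set().union(*delineations)
    common = set.intersection(*delineations)
    return union - common


def captured_fraction(delineations, k):
    """Mean share of the full panel's variability region seen by k observers.

    Assumed definition: average over all size-k subsets of
    |variability(subset)| / |variability(all observers)|.
    """
    total = len(variability_region(delineations))
    if total == 0:
        return 1.0  # nothing to capture when everyone agrees
    subsets = list(combinations(delineations, k))
    captured = sum(len(variability_region(s)) for s in subsets)
    return captured / (len(subsets) * total)


def rectangle(x0, y0, x1, y1):
    """Hypothetical helper: filled rectangle as a set of pixel coordinates."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Three made-up observers outlining the same structure with 1-pixel shifts.
observers = [
    rectangle(0, 0, 10, 10),
    rectangle(1, 0, 11, 10),
    rectangle(0, 1, 10, 11),
]

if __name__ == "__main__":
    for k in (2, 3):
        print(k, round(captured_fraction(observers, k), 3))
```

Because a subset's union can only shrink and its intersection can only grow relative to the full panel, the fraction always lies between 0 and 1 and reaches 1 when k equals the panel size, which matches the qualitative trend reported above (more observers capture more of the variability).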

Conclusion: The variability in manual delineations for different structures and observers is large and spans a wide range across a variety of structures and pathologies. Two and even three observers may not be sufficient to establish the full range of inter-observer variability.

Key Points:
• This study quantifies the inter-observer variability of manual delineation of lesions and organ contours in CT.
• The variability of manual delineations between two observers can be significant. Two and even three observers capture only a fraction of the full range of inter-observer variability observed in common practice.
• Inter-observer manual delineation variability is necessary to establish a reference standard for radiologist training and evaluation and for the evaluation of automatic segmentation algorithms.

Citing Articles

Manual segmentation of opacities and consolidations on CT of long COVID patients from multiple annotators.

Carmo D, Pezzulo A, Villacreses R, Eisenbeisz M, Anderson R, Dorin S Sci Data. 2025; 12(1):402.

PMID: 40055348 PMC: 11889079. DOI: 10.1038/s41597-025-04709-2.


Automatic future remnant segmentation in liver resection planning.

Messaoudi H, Abbas M, Badic B, Ben Salem D, Belaid A, Conze P Int J Comput Assist Radiol Surg. 2025; .

PMID: 39961898 DOI: 10.1007/s11548-025-03331-2.


Dual-Stage AI Model for Enhanced CT Imaging: Precision Segmentation of Kidney and Tumors.

Karunanayake N, Lu L, Yang H, Geng P, Akin O, Furberg H Tomography. 2025; 11(1).

PMID: 39852683 PMC: 11769543. DOI: 10.3390/tomography11010003.


Comparison of Vendor-Pretrained and Custom-Trained Deep Learning Segmentation Models for Head-and-Neck, Breast, and Prostate Cancers.

Chen X, Zhao Y, Baroudi H, El Basha M, Daniel A, Gay S Diagnostics (Basel). 2025; 14(24).

PMID: 39767212 PMC: 11675285. DOI: 10.3390/diagnostics14242851.


Bridging human and machine intelligence: Reverse-engineering radiologist intentions for clinical trust and adoption.

Awasthi A, Le N, Deng Z, Agrawal R, Wu C, Van Nguyen H Comput Struct Biotechnol J. 2024; 24:711-723.

PMID: 39660015 PMC: 11629193. DOI: 10.1016/j.csbj.2024.11.012.

