Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer

Objective: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting.

Design, Setting, and Participants: A researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands, with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining, was provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The glass slides corresponding to the same test set were also evaluated by a panel of 11 pathologists from the Netherlands with time constraint (WTC), who ascertained the likelihood of nodal metastases for each slide in a flexible 2-hour session simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).

Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.

Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image, assessed using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
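As an illustration of how a five-point confidence scale like the one described above can feed a receiver operating characteristic analysis, the following is a minimal sketch of an empirical AUC computed as the probability that a slide with metastasis receives a higher rating than a slide without, with ties counted as one half. The integer mapping of the rating labels, the function name `auc_from_ratings`, and the example data are all assumptions made for illustration; they are not the study's analysis code or data, and the study's multireader methodology is not reproduced here.

```python
# Ordinal scale from the text; the 1-5 integer mapping is an assumption.
RATING_SCALE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def auc_from_ratings(ratings, has_metastasis):
    """Empirical AUC for one reader from ordinal ratings.

    Equals the fraction of (positive, negative) slide pairs in which the
    positive slide is rated higher, counting tied ratings as 0.5.
    """
    scores = [RATING_SCALE[r] for r in ratings]
    pos = [s for s, y in zip(scores, has_metastasis) if y]
    neg = [s for s, y in zip(scores, has_metastasis) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up ratings for six hypothetical slides (not study data).
example_ratings = [
    "definitely tumor", "probably tumor", "equivocal",
    "probably normal", "definitely normal", "equivocal",
]
example_truth = [True, True, True, False, False, False]
example_auc = auc_from_ratings(example_ratings, example_truth)
```

With only five rating levels, ties between positive and negative slides are common, which is why the half-credit for ties matters in this kind of calculation.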

Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with that of the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).
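The lesion-level result above pairs a true-positive fraction with a mean number of false positives per normal whole-slide image. A minimal sketch of how one such operating point could be computed from scored detections follows. The data layout (a list of `(score, lesion_id)` pairs, with `lesion_id` set to `None` for a detection on a normal slide), the function name `froc_point`, and the example values are hypothetical, and the hit criteria used in the challenge are not reproduced here.

```python
def froc_point(detections, n_lesions, n_normal_slides, threshold):
    """Return (true-positive fraction, mean false positives per normal slide).

    detections: iterable of (score, lesion_id) pairs; lesion_id is None when
    the detection is a false positive on a normal slide (assumed layout).
    Each lesion counts once, however many detections land on it.
    """
    hit_lesions = set()
    false_positives = 0
    for score, lesion_id in detections:
        if score < threshold:
            continue  # detection is below the operating threshold
        if lesion_id is None:
            false_positives += 1
        else:
            hit_lesions.add(lesion_id)
    return len(hit_lesions) / n_lesions, false_positives / n_normal_slides


# Made-up detections (not study data): lesion "a" is found twice.
example_detections = [
    (0.95, "a"), (0.90, "a"), (0.80, "b"),
    (0.70, None), (0.40, "c"), (0.30, None),
]
example_tpf, example_mean_fp = froc_point(
    example_detections, n_lesions=4, n_normal_slides=4, threshold=0.5
)
```

Sweeping `threshold` over the detection scores would trace out the full curve of sensitivity against mean false positives, of which the reported figure is a single point.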

Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.

