Artificial Intelligence-based Image Analysis in Clinical Testing: Lessons from Cervical Cancer Screening

Overview

Journal: J Natl Cancer Inst

Publisher: Oxford University Press

Specialty: Oncology

Date: 2023 Sep 27

PMID: 37758250

Authors

Didem Egemen

Rebecca B Perkins

Li C Cheung

Brian Befano

Ana Cecilia Rodriguez

Kanan Desai

Andreanne Lemay

Syed Rakin Ahmed

Sameer Antani

Jose Jeronimo

Nicolas Wentzensen

Jayashree Kalpathy-Cramer

Silvia de Sanjose

Mark Schiffman


Abstract

Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy, only to be followed by a disappointing lack of confirmation; our own early work on cervical screening is one example. Here we present lessons learned, organized as a conceptual step-by-step approach to bridging the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is to specify rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is to design the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case>indeterminate>control). The third principle is to evaluate AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. The fourth principle is to link the AI algorithm results to clinical risk estimation. Absolute risk (not relative risk) is the critical metric for translating a test result into clinical use. The fifth and final principle is to generate risk-based guidelines for clinical use that match local resources and priorities. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
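Two of the ideas in the abstract can be illustrated with a minimal Python sketch: counting the most clinically important errors in a three-class ordinal scheme (a case called control, or a control called case), and reporting absolute risk within each predicted class. Everything here is an assumption made for illustration and is not taken from the article: the function names `extreme_misclassification_rate` and `absolute_risk_by_class`, the toy labels, and the toy outcome data are all hypothetical.

```python
from collections import Counter

# Ordinal severity of the three classes named in the abstract
# (case > indeterminate > control). The numeric codes are an assumption.
SEVERITY = {"control": 0, "indeterminate": 1, "case": 2}


def extreme_misclassification_rate(truth, predicted):
    """Fraction of images misclassified by two severity steps
    (case called control, or control called case)."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal length")
    extreme = sum(
        1 for t, p in zip(truth, predicted)
        if abs(SEVERITY[t] - SEVERITY[p]) == 2
    )
    return extreme / len(truth)


def absolute_risk_by_class(predicted, has_disease):
    """Observed proportion with confirmed disease within each predicted class."""
    totals = Counter(predicted)
    positives = Counter(p for p, d in zip(predicted, has_disease) if d)
    return {label: positives[label] / totals[label] for label in totals}


# Hypothetical toy data: reference labels versus algorithm output.
truth = ["case", "case", "indeterminate", "control", "control", "control"]
predicted = ["case", "control", "indeterminate", "control", "indeterminate", "case"]
extreme_rate = extreme_misclassification_rate(truth, predicted)

# Hypothetical toy data: predicted class and confirmed disease status.
pred_class = ["case"] * 4 + ["indeterminate"] * 4 + ["control"] * 8
disease = ([True, True, False, False]
           + [True, False, False, False]
           + [True] + [False] * 7)
risks = absolute_risk_by_class(pred_class, disease)

# Relative risk of the case class versus the control class.
relative_risk = risks["case"] / risks["control"]

print(extreme_rate)
print(risks)
print(relative_risk)
```

In this toy example the same relative risk would result whether the underlying absolute risks were 0.5 and 0.125 or 0.004 and 0.001, which is one way to see why a relative measure alone does not indicate what clinical action a test result should trigger.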

Citing Articles

Generalizable deep neural networks for image quality classification of cervical images.

Ahmed S, Befano B, Egemen D, Rodriguez A, Desai K, Jeronimo J. Sci Rep. 2025; 15(1):6312.

PMID: 39984572 PMC: 11845747. DOI: 10.1038/s41598-025-90024-0.

Design and validation of ultra-compact metamaterial-based biosensor for non-invasive cervical cancer diagnosis in terahertz regime.

Hamza M, Islam M, Lavadiya S, Ud Din I, Sanches B, Koziel S. PLoS One. 2025; 20(2):e0311431.

PMID: 39899558 PMC: 11790148. DOI: 10.1371/journal.pone.0311431.

Prospective Applications of Artificial Intelligence In Fetal Medicine: A Scoping Review of Recent Updates.

Miskeen E, Alfaifi J, Alhuian D, Alghamdi M, Alharthi M, Alshahrani N. Int J Gen Med. 2025; 18:237-245.

PMID: 39834911 PMC: 11745059. DOI: 10.2147/IJGM.S490261.

Automated Image Clarity Detection for the Improvement of Colposcopy Imaging with Multiple Devices.

Ekem L, Skerrett E, Huchko M, Ramanujam N. Biomed Signal Process Control. 2024; 100(Pt B).

PMID: 39669100 PMC: 11633643. DOI: 10.1016/j.bspc.2024.106948.

The Future of Cervical Cancer Screening.

Goldstein A, Gersh M, Skovronsky G, Moss C. Int J Womens Health. 2024; 16:1715-1731.

PMID: 39464249 PMC: 11512781. DOI: 10.2147/IJWH.S474571.
