BMT: A Cross-Validated ThinPrep Pap Cervical Cytology Dataset for Machine Learning Model Training and Validation

Overview
Journal: Sci Data
Specialty: Science
Date: 2024 Dec 28
PMID: 39732723
Abstract

In the past several years, a few cervical Pap smear datasets have been published for use in clinical training. However, most publicly available datasets consist of pre-segmented single-cell images, contain on-image annotations that must be manually edited out, or are prepared using the conventional Pap smear method. Multicellular liquid Pap image datasets more accurately reflect current cervical screening techniques. While a multicellular liquid SurePath™ dataset has been created, machine learning models struggle to classify test images that were prepared differently from the training images, because the two preparations differ visually. This dataset of multicellular Pap smear images prepared with the more common ThinPrep® protocol is therefore presented as a resource for training and testing artificial intelligence models, particularly for future application in cervical dysplasia diagnosis. The "Brown Multicellular ThinPrep" (BMT) dataset is the first publicly available multicellular ThinPrep® dataset, consisting of 600 clinically vetted images collected from 180 Pap smear slides from 180 patients, classified into three key diagnostic categories.
