
Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients

Overview
Journal IEEE Access
Date 2021 Nov 23
PMID 34812384
Citations 5
Abstract

Chest X-ray images are useful for early COVID-19 diagnosis because X-ray devices are already available in health centers and images are obtained immediately. Several datasets containing X-ray images of cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods that aid in diagnosing the disease. However, these datasets are mainly assembled from different sources, combining pre-COVID-19 datasets with COVID-19 datasets. In particular, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which suggests that published results may be optimistic and may overestimate the actual predictive capacity of the proposed techniques. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, avoid them where possible, and report results that are more representative of the actual predictive power of the methods under analysis.
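The abstract does not list the preliminary steps themselves, so the following is only an illustrative sketch of one check in that spirit, not the authors' procedure. It assumes each image carries a record of its source dataset and its class label, and measures how strongly the two are associated using Cramér's V (0 means no association, 1 means the source fully determines the label). The function name `cramers_v` and the source and label names in the usage lines are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(sources, labels):
    """Cramér's V between two categorical sequences (0 = none, 1 = perfect).

    Illustrative sketch: `sources` holds the originating dataset of each
    image and `labels` holds its class. Both names are assumptions.
    """
    if len(sources) != len(labels):
        raise ValueError("sources and labels must have the same length")
    n = len(sources)
    joint = Counter(zip(sources, labels))  # observed cell counts
    row = Counter(sources)                 # per-source totals
    col = Counter(labels)                  # per-label totals
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0  # association is undefined with a single category
    chi2 = 0.0
    for s in row:
        for c in col:
            expected = row[s] * col[c] / n
            observed = joint.get((s, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Hypothetical example: every control comes from one source and every
# COVID-19 case from another, so source and label are fully confounded.
confounded_sources = ["pre_covid_set"] * 50 + ["covid_set"] * 50
confounded_labels = ["control"] * 50 + ["covid"] * 50
print(cramers_v(confounded_sources, confounded_labels))

# Hypothetical example: each source contributes both classes equally.
balanced_sources = ["pre_covid_set"] * 50 + ["covid_set"] * 50
balanced_labels = (["control"] * 25 + ["covid"] * 25) * 2
print(cramers_v(balanced_sources, balanced_labels))
```

A value close to 1 would indicate that a classifier could separate the classes by recognizing the source dataset rather than the disease, which is the kind of situation in which reported results may be optimistic. A value close to 0 does not rule out other biases; it only shows that the source and the label are not associated in the metadata.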

Citing Articles

Challenges issues and future recommendations of deep learning techniques for SARS-CoV-2 detection utilising X-ray and CT images: a comprehensive review.

Islam M, Al Farid F, Shamrat F, Islam M, Rashid M, Bari B PeerJ Comput Sci. 2025; 10:e2517.

PMID: 39896401 PMC: 11784792. DOI: 10.7717/peerj-cs.2517.


Deep Learning for Pneumonia Detection in Chest X-ray Images: A Comprehensive Survey.

Siddiqi R, Javaid S J Imaging. 2024; 10(8).

PMID: 39194965 PMC: 11355845. DOI: 10.3390/jimaging10080176.


Digital Determinants of Health: Health data poverty amplifies existing health disparities-A scoping review.

Paik K, Hicklen R, Kaggwa F, Puyat C, Nakayama L, Ong B PLOS Digit Health. 2023; 2(10):e0000313.

PMID: 37824445 PMC: 10569513. DOI: 10.1371/journal.pdig.0000313.


Validating Automatic Concept-Based Explanations for AI-Based Digital Histopathology.

Sauter D, Lodde G, Nensa F, Schadendorf D, Livingstone E, Kukuk M Sensors (Basel). 2022; 22(14).

PMID: 35891026 PMC: 9319808. DOI: 10.3390/s22145346.


Explainable artificial intelligence-based edge fuzzy images for COVID-19 detection and identification.

Hu Q, Gois F, Costa R, Zhang L, Yin L, Magaia N Appl Soft Comput. 2022; 123:108966.

PMID: 35582662 PMC: 9102011. DOI: 10.1016/j.asoc.2022.108966.

