
The Receiver Operating Characteristic Curve Accurately Assesses Imbalanced Datasets

Overview
Journal: Patterns (N Y)
Date: 2024 Jul 15
PMID: 39005487
Abstract

Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification in which a few positives sit within a much larger set of negatives, a situation referred to as class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems where performance on the positive minority class is of greater interest, and the precision-recall (PR) curve has been reported as preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces: the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to it. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
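
The contrast between the ROC and PR spaces described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation. It assumes Gaussian classifier scores (positives drawn from N(1, 1), negatives from N(0, 1)), a fixed 500 positives with a growing number of negatives, and a hypothetical helper `roc_auc_and_average_precision` that uses average precision as a stand-in for PR-AUC.

```python
import random


def roc_auc_and_average_precision(scores, labels):
    """Return (ROC-AUC, average precision) for binary labels (1 = positive).

    Assumes no tied scores, which holds almost surely for continuous scores.
    """
    # Rank items from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = 0               # positives seen so far while scanning down the ranking
    concordant = 0       # (positive, negative) pairs with the positive ranked higher
    precision_sum = 0.0  # precision evaluated at the rank of each positive
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precision_sum += tp / rank
        else:
            concordant += tp
    return concordant / (n_pos * n_neg), precision_sum / n_pos


def simulate(n_pos, n_neg, rng):
    """Score one synthetic dataset with an unchanged classifier (illustrative assumption)."""
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return roc_auc_and_average_precision(scores, labels)


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n_neg in (500, 5000, 50000):  # positive:negative ratios of 1:1, 1:10, 1:100
    results[n_neg] = simulate(500, n_neg, rng)
    roc_auc, avg_precision = results[n_neg]
    print(f"negatives={n_neg:>6}  ROC-AUC={roc_auc:.3f}  average precision={avg_precision:.3f}")
```

Under these assumptions the expected ROC-AUC is Φ(1/√2) ≈ 0.76 at every ratio, because the true positive rate and the false positive rate are each computed within a single class. Precision mixes both classes, so average precision falls as negatives are added even though the score distributions, and therefore the classifier, are unchanged.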

Citing Articles

Using a Neural Network Architecture for the Prediction of Neurologic Outcome for Out-of-Hospital Cardiac Arrests Using Hospital Level Variables and Novel Physiologic Markers.

Razo M, Kotini P, Li J, Khosla S, Buhimschi I, Hoek T. Bioengineering (Basel). 2025; 12(2).

PMID: 40001644 PMC: 11852285. DOI: 10.3390/bioengineering12020124.


Comparative Analysis of Recurrent Neural Networks with Conjoint Fingerprints for Skin Corrosion Prediction.

Duy H, Srisongkram T. J Chem Inf Model. 2025; 65(3):1305-1317.

PMID: 39835935 PMC: 11815816. DOI: 10.1021/acs.jcim.4c02062.


Application of machine learning for detecting high fall risk in middle-aged workers using video-based analysis of the first 3 steps.

Sakane N, Yamauchi K, Kutsuna I, Suganuma A, Domichi M, Hirano K. J Occup Health. 2025; 67(1).

PMID: 39792357 PMC: 11848130. DOI: 10.1093/joccuh/uiae075.


Assessing Glioblastoma Treatment Response Using Machine Learning Approach Based on Magnetic Resonance Images Radiomics: An Exploratory Study.

Sadeghinasab A, Fatahiasl J, Tahmasbi M, Razmjoo S, Yousefipour M. Health Sci Rep. 2025; 8(1):e70323.

PMID: 39741746 PMC: 11683675. DOI: 10.1002/hsr2.70323.


Understanding the heterogeneous performance of variant effect predictors across human protein-coding genes.

Fawzy M, Marsh J. Sci Rep. 2024; 14(1):26114.

PMID: 39478110 PMC: 11526010. DOI: 10.1038/s41598-024-76202-6.

