The Receiver Operating Characteristic Curve Accurately Assesses Imbalanced Datasets
Many problems in biology amount to looking for a "needle in a haystack": a binary classification task with a few positives within a much larger set of negatives, a situation referred to as class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems where performance on the positive minority class is of greater interest, and the precision-recall (PR) curve has been reported as preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces: the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to it. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
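The contrast between the two metrics can be illustrated with a small simulation. The sketch below is not the authors' simulation. It assumes a hypothetical classifier whose scores are drawn from a unit-variance Gaussian centered at 1 for positives and at 0 for negatives, and the function names and sample sizes are chosen here for illustration only. It scores the same classifier once on a balanced sample and once on a sample with 1% positives, using ROC-AUC (computed from ranks) and average precision (a common summary of the PR curve).

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC: probability that a random positive outranks a random negative."""
    pairs = sorted(zip(scores, labels))
    # Scores are continuous, so ties are negligible and plain ranks suffice.
    rank_sum = sum(rank for rank, (_, y) in enumerate(pairs, start=1) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """Mean of the precision measured at each positive, in descending score order."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def simulate(n_pos, n_neg, seed=0):
    """Score one assumed classifier on a sample with the given class counts."""
    rng = random.Random(seed)
    # Assumption: same score distributions regardless of class ratio,
    # so the classifier itself is identical in both scenarios.
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return roc_auc(scores, labels), average_precision(scores, labels)


balanced = simulate(n_pos=1000, n_neg=1000)      # 50% positives
imbalanced = simulate(n_pos=1000, n_neg=99000)   # 1% positives

print("balanced   ROC-AUC=%.3f  AP=%.3f" % balanced)
print("imbalanced ROC-AUC=%.3f  AP=%.3f" % imbalanced)
```

Under these assumptions the two ROC-AUC values should be close to each other, because the metric depends only on how positives rank against negatives, while the average precision should fall sharply in the imbalanced sample even though the classifier has not changed.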