
Exploratory Undersampling for Class-imbalance Learning

Overview
Date: 2008 Dec 20
PMID: 19095540
Citations: 158
Abstract

Undersampling is a popular method for dealing with class-imbalance problems. It uses only a subset of the majority class and is therefore very efficient. Its main deficiency is that many majority class examples are ignored. We propose two algorithms to overcome this deficiency. EasyEnsemble samples several subsets from the majority class, trains a learner on each of them, and combines the outputs of those learners. BalanceCascade trains the learners sequentially; at each step, the majority class examples that are correctly classified by the learners trained so far are removed from further consideration. Experimental results show that both methods achieve higher Area Under the ROC Curve, F-measure, and G-mean values than many existing class-imbalance learning methods. Moreover, when the same number of weak classifiers is used, their training time is approximately the same as that of undersampling, which makes them significantly faster than the other methods.
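The two sampling schemes described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learner or the combination rule here, so the sketch assumes a nearest-centroid scorer as a stand-in weak learner, assumes each sampled subset has the same size as the minority class, and assumes outputs are combined by summing scores. All function names (`train_centroid`, `fit_on_subset`, `predict`, `easy_ensemble`, `balance_cascade`) and the toy data are hypothetical.

```python
import random

def train_centroid(X, y):
    """Stand-in weak learner (assumption): nearest-centroid scorer."""
    dim = len(X[0])
    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    pos = mean([x for x, t in zip(X, y) if t == 1])  # minority centroid
    neg = mean([x for x, t in zip(X, y) if t == 0])  # majority centroid
    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos  # > 0 means closer to the minority centroid
    return score

def fit_on_subset(minority, subset):
    """Train one learner on all minority examples plus a majority subset."""
    X = list(minority) + list(subset)
    y = [1] * len(minority) + [0] * len(subset)
    return train_centroid(X, y)

def predict(learners, x):
    """Combine learners by summing their scores (assumed rule)."""
    return 1 if sum(f(x) for f in learners) > 0 else 0

def easy_ensemble(minority, majority, n_subsets=5, seed=0):
    """Independently sample majority subsets and train one learner on each."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    return [fit_on_subset(minority, rng.sample(majority, k))
            for _ in range(n_subsets)]

def balance_cascade(minority, majority, n_rounds=5, seed=0):
    """Train sequentially, dropping majority examples already handled."""
    rng = random.Random(seed)
    pool = list(majority)
    learners = []
    for _ in range(n_rounds):
        k = min(len(minority), len(pool))
        if k == 0:
            break  # every majority example is already classified correctly
        learners.append(fit_on_subset(minority, rng.sample(pool, k)))
        # Keep only majority examples the current ensemble still gets wrong.
        pool = [x for x in pool if predict(learners, x) == 1]
    return learners

# Toy imbalanced data (hypothetical): 20 minority vs. 200 majority points.
rng = random.Random(42)
minority = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(20)]
majority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]

easy = easy_ensemble(minority, majority)
cascade = balance_cascade(minority, majority)
```

In this sketch the only difference between the two schemes is where the majority subsets come from: `easy_ensemble` always draws from the full majority class, while `balance_cascade` draws from a pool that shrinks as the ensemble learns to classify more majority examples correctly.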

Citing Articles

Development of model for identifying homologous recombination deficiency (HRD) status of ovarian cancer with deep learning on whole slide images.

Zhang K, Qiu Y, Feng S, Yin H, Liu Q, Zhu Y. J Transl Med. 2025; 23(1):267.

PMID: 40038690 PMC: 11877705. DOI: 10.1186/s12967-025-06234-7.


Unused housing in urban China and its carbon emission impact.

Zheng H, Zhang R, Yin X, Wu J. Nat Commun. 2025; 16(1):1985.

PMID: 40011428 PMC: 11865530. DOI: 10.1038/s41467-025-57217-7.


Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review.

Shan R, Li X, Chen J, Chen Z, Cheng Y, Han B. JMIR Cancer. 2025; 11:e66269.

PMID: 39930991 PMC: 11833187. DOI: 10.2196/66269.


Addressing imbalanced data classification with Cluster-Based Reduced Noise SMOTE.

Hemmatian J, Hajizadeh R, Nazari F. PLoS One. 2025; 20(2):e0317396.

PMID: 39928607 PMC: 11809912. DOI: 10.1371/journal.pone.0317396.


Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Salehi A, Khedmati M. Sci Rep. 2025; 15(1):3460.

PMID: 39870706 PMC: 11772689. DOI: 10.1038/s41598-024-84786-2.