
A Classification for Complex Imbalanced Data in Disease Screening and Early Diagnosis

Overview
Journal Stat Med
Publisher Wiley
Specialty Public Health
Date 2022 May 23
PMID 35603639
Authors Li Y, Hsu W
Abstract

Imbalanced classification has drawn considerable attention in the statistics and machine learning literature. Traditional classification methods often perform poorly when the class distribution is severely skewed, and the difficulty is compounded under a high-dimensional longitudinal data structure. Given the ubiquity of big data in modern health research, imbalanced classification in disease diagnosis can be expected to face an additional level of difficulty imposed by such complex data structures. In this article, we propose a nonparametric classification approach for imbalanced data in longitudinal and high-dimensional settings. Technically, functional principal component analysis (FPCA) is first applied to extract features from the longitudinal measurements. A univariate exponential loss function coupled with a group LASSO penalty is then incorporated into the classification procedure to handle the high-dimensional setting. In addition to improving imbalanced classification, our approach provides meaningful feature selection for interpretation while enjoying markedly lower computational complexity. The proposed method is illustrated with a real-data application to the early detection of Alzheimer's disease, and its finite-sample performance is evaluated extensively through simulations.
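
The abstract names two computational steps, FPCA feature extraction and an exponential loss with a group LASSO penalty, without giving their exact form. The Python sketch below is one illustrative reading under loudly stated assumptions, not the authors' implementation: (i) FPCA is done by plain discretization on a dense, common, equally spaced time grid; (ii) the "univariate exponential loss" is assumed to be the per-class factorization of a pairwise exponential AUC surrogate, which turns an O(n_pos * n_neg) sum over case/control pairs into two O(n) sums; (iii) the solver is a generic proximal gradient method with backtracking. Each marker's FPC scores form one group, so the group LASSO keeps or drops whole markers. All function names (fpca_scores, pairwise_exp_loss, group_soft_threshold, fit_group_lasso, simulate_marker) are hypothetical, and the demo uses synthetic data only.

```python
import numpy as np

# Hypothetical sketch; the loss form and solver are assumptions, not taken from the paper.


def fpca_scores(curves, t, n_comp):
    """Dense-grid FPCA: curves is (n_subjects, n_times) on a shared, equally spaced grid t.

    Returns (scores, mean_curve, eigenfunctions). Sparse or irregular designs
    would need a smoothing-based estimator instead of this plain discretization.
    """
    dt = t[1] - t[0]
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    cov = centered.T @ centered / curves.shape[0]    # discretized covariance surface C(s, t)
    eigvals, eigvecs = np.linalg.eigh(cov * dt)       # eigenproblem of the integral operator
    order = np.argsort(eigvals)[::-1][:n_comp]        # keep the leading components
    phi = eigvecs[:, order] / np.sqrt(dt)             # rescale so sum(phi**2) * dt == 1
    scores = centered @ phi * dt                      # xi_ik ~ integral of (X_i - mu) * phi_k
    return scores, mean_curve, phi


def pairwise_exp_loss(beta, X_pos, X_neg):
    """ASSUMED loss: mean pairwise exponential AUC surrogate and its gradient, in O(n).

    mean_{i,j} exp(-(x_i - x_j) @ beta) factorizes into
    mean_i exp(-x_i @ beta) * mean_j exp(x_j @ beta), so no pairs are formed.
    """
    a = np.exp(-X_pos @ beta)
    b = np.exp(X_neg @ beta)
    A, B = a.mean(), b.mean()
    grad = B * (-(a @ X_pos) / len(a)) + A * ((b @ X_neg) / len(b))  # product rule
    return A * B, grad


def group_soft_threshold(z, groups, thresh):
    """Proximal operator of thresh * sum_g sqrt(|g|) * ||z_g||_2 (group LASSO)."""
    out = np.zeros_like(z)
    for g in groups:
        norm = np.linalg.norm(z[g])
        shrink = thresh * np.sqrt(len(g))
        if norm > shrink:                              # otherwise the whole group stays 0
            out[g] = (1.0 - shrink / norm) * z[g]
    return out


def fit_group_lasso(X, y, groups, lam, n_iter=500, step=1.0):
    """Proximal gradient with backtracking; y in {0, 1}, where 1 marks the rare cases.

    No intercept: it cancels in x_i - x_j, so only a ranking score X @ beta is learned.
    """
    X_pos, X_neg = X[y == 1], X[y == 0]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        f, grad = pairwise_exp_loss(beta, X_pos, X_neg)
        while True:                                    # halve step until the quadratic bound holds
            cand = group_soft_threshold(beta - step * grad, groups, step * lam)
            diff = cand - beta
            f_new, _ = pairwise_exp_loss(cand, X_pos, X_neg)
            if f_new <= f + grad @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        beta = cand
        if np.linalg.norm(diff) < 1e-8:                # stop once iterates stop moving
            break
    return beta


# Toy demo on synthetic data: 5 longitudinal markers, only marker 0 differs by class.
rng = np.random.default_rng(0)
n, n_markers, n_comp = 200, 5, 3
t = np.linspace(0.0, 1.0, 20)
y = np.zeros(n, dtype=int)
y[:20] = 1                                             # 10% cases: imbalanced labels


def simulate_marker(signal):
    """Hypothetical generator: random smooth curves plus a class-dependent drift."""
    base = rng.normal(size=(n, 1)) * np.sin(np.pi * t) + rng.normal(size=(n, 1)) * t
    return base + signal * y[:, None] * t**2 + 0.1 * rng.normal(size=(n, t.size))


curves = [simulate_marker(4.0 if m == 0 else 0.0) for m in range(n_markers)]
X = np.hstack([fpca_scores(c, t, n_comp)[0] for c in curves])
X = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize scores before penalizing
groups = [np.arange(n_comp * m, n_comp * (m + 1)) for m in range(n_markers)]
beta = fit_group_lasso(X, y, groups, lam=0.1)
group_norms = [np.linalg.norm(beta[g]) for g in groups]
score = X @ beta
auc = np.mean(score[y == 1][:, None] > score[y == 0][None, :])
print("group norms:", np.round(group_norms, 3), "training AUC:", round(auc, 3))
```

On this synthetic data the coefficient mass should concentrate on marker 0, the only marker whose trajectories differ between classes. An AUC-type surrogate is used here because AUC does not depend on the class proportions, which is one common way to sidestep imbalance; a decision threshold on `X @ beta` would still have to be chosen separately.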

Citing Articles

Comparing the Artificial Intelligence Detection Models to Standard Diagnostic Methods and Alternative Models in Identifying Alzheimer's Disease in At-Risk or Early Symptomatic Individuals: A Scoping Review.

Babu B, Parvathy G, Mohideen Bawa F, Gill G, Patel J, Sibia D. Cureus. 2025; 16(12):e75389.

PMID: 39781179 PMC: 11709138. DOI: 10.7759/cureus.75389.


Enhancing Alzheimer's disease classification through split federated learning and GANs for imbalanced datasets.

Narayanee Nimeshika G, D S. PeerJ Comput Sci. 2024; 10:e2459.

PMID: 39650412 PMC: 11623002. DOI: 10.7717/peerj-cs.2459.


Machine learning-enabled prediction of prolonged length of stay in hospital after surgery for tuberculosis spondylitis patients with unbalanced data: a novel approach using explainable artificial intelligence (XAI).

Yasin P, Yimit Y, Cai X, Aimaiti A, Sheng W, Mamat M. Eur J Med Res. 2024; 29(1):383.

PMID: 39054495 PMC: 11270948. DOI: 10.1186/s40001-024-01988-0.


Differentiating Pressure Ulcer Risk Levels through Interpretable Classification Models Based on Readily Measurable Indicators.

Vera-Salmeron E, Dominguez-Nogueira C, Saez J, Romero-Bejar J, Mota-Romero E. Healthcare (Basel). 2024; 12(9).

PMID: 38727470 PMC: 11083727. DOI: 10.3390/healthcare12090913.


A classification for complex imbalanced data in disease screening and early diagnosis.

Li Y, Hsu W. Stat Med. 2022; 41(19):3679-3695.

PMID: 35603639 PMC: 9541048. DOI: 10.1002/sim.9442.
