PMID: 20634556

Local-learning-based Feature Selection for High-dimensional Data Analysis

Abstract

This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then to learn feature relevance globally within the large-margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques and makes no assumptions about the underlying data distribution. It can process many thousands of features within minutes on a personal computer while maintaining very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses suggest that the algorithm's sample complexity is logarithmic in the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
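The key idea in the abstract (local learning to obtain locally linear problems, then a global large-margin fit of feature relevance) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not specify any of the details below: the weighted L1 distance, the exponential kernel with width `sigma`, the logistic loss, the L1 penalty `lam`, the reparameterization `w = v**2`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def _soft_neighbors(dist, mask, sigma):
    """Row-wise soft nearest-neighbour probabilities restricted to `mask`."""
    d = np.where(mask, dist, np.inf)
    d = d - d.min(axis=1, keepdims=True)  # stabilise; assumes every row has a candidate
    k = np.exp(-d / sigma)
    return k / k.sum(axis=1, keepdims=True)


def local_margin_feature_weights(X, y, sigma=1.0, lam=0.1,
                                 n_outer=5, n_inner=200, lr=0.05):
    """Illustrative sketch (assumed details): non-negative feature weights from local margins.

    Assumes at least two classes and at least two samples per class.
    """
    n, d = X.shape
    # Per-feature absolute differences for all pairs, shape (n, n, d).
    # O(n^2 d) memory: acceptable for a sketch, not for large data sets.
    diff = np.abs(X[:, None, :] - X[None, :, :])
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)          # a sample is not its own neighbour
    other = y[:, None] != y[None, :]

    w = np.ones(d)
    for _ in range(n_outer):
        # Local learning: soft nearest same-class ("hit") and other-class ("miss")
        # neighbours under the current weighted L1 distance.
        dist = diff @ w
        p_hit = _soft_neighbors(dist, same, sigma)
        p_miss = _soft_neighbors(dist, other, sigma)
        # One locally linear margin vector per sample: the margin is w . z[i].
        z = np.einsum("ij,ijd->id", p_miss - p_hit, diff)

        # Global step: minimise mean(log(1 + exp(-w . z))) + lam * sum(w)
        # with w = v**2 so that the weights stay non-negative.
        v = np.sqrt(w)
        for _ in range(n_inner):
            margin = z @ (v ** 2)
            s = np.exp(-np.logaddexp(0.0, margin))      # sigmoid(-margin), overflow-safe
            grad_w = -(s[:, None] * z).mean(axis=0) + lam
            v -= lr * grad_w * 2.0 * v                  # chain rule through w = v**2
        w = v ** 2
    return w
```

Calling `local_margin_feature_weights(X, y)` on an `(n_samples, n_features)` array `X` and a label vector `y` returns one non-negative weight per feature; under these assumptions, features that separate the classes locally should receive larger weights, while the penalty shrinks the weights of irrelevant features toward zero.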

Citing Articles

Prostate Cancer Progression Modeling Provides Insight into Dynamic Molecular Changes Associated with Progressive Disease States.

Chen R, Tang L, Melendy T, Yang L, Goodison S, Sun Y. Cancer Res Commun. 2024; 4(10):2783-2798.

PMID: 39347576 PMC: 11500312. DOI: 10.1158/2767-9764.CRC-24-0210.


Exploring the Role of Machine Learning in Diagnosing and Treating Speech Disorders: A Systematic Literature Review.

Brahmi Z, Mahyoob M, Al-Sarem M, Algaraady J, Bousselmi K, Alblwi A. Psychol Res Behav Manag. 2024; 17:2205-2232.

PMID: 38835654 PMC: 11149643. DOI: 10.2147/PRBM.S460283.


Time Series Data Prediction and Feature Analysis of Sports Dance Movements Based on Machine Learning.

Zheng D, Yuan Y. Comput Intell Neurosci. 2022; 2022:5611829.

PMID: 36059406 PMC: 9433201. DOI: 10.1155/2022/5611829.


Computational approach to modeling microbiome landscapes associated with chronic human disease progression.

Li L, Sohn J, Genco R, Wactawski-Wende J, Goodison S, Diaz P. PLoS Comput Biol. 2022; 18(8):e1010373.

PMID: 35926003 PMC: 9380910. DOI: 10.1371/journal.pcbi.1010373.


Relevance, redundancy, and complementarity trade-off (RRCT): A principled, generic, robust feature-selection tool.

Tsanas A. Patterns (N Y). 2022; 3(5):100471.

PMID: 35607618 PMC: 9122960. DOI: 10.1016/j.patter.2022.100471.

