Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction
Public Health
Breast cancer is a major public health problem. This study evaluates several feature ranking techniques, combined with machine learning classifiers, to identify the factors most relevant to the probability of developing breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset comes from the MCC-Spain study, comprises 919 cases and 946 controls, and includes only environmental and genetic features. Our aim is to determine which factors in the risk prediction model contribute most to breast cancer prediction. Quantifying the stability of feature selection methods is likewise essential before trying to gain insight into the data. This paper therefore assesses several feature selection algorithms in terms of the performance they yield across a set of predictive models, and quantifies their robustness by analyzing both the similarity between the rankings produced by different methods and the stability of each method's own ranking. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). The top 47 features ranked by this approach, fed to a Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable, as did Random Forest, whereas Relief and the wrapper approaches are quite unstable. These results show that stability and model performance should be studied together: Random Forest and SVM-RFE are the most stable algorithms, but SVM-RFE outperforms Random Forest in terms of model performance.
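The abstract does not say which measures were used to quantify ranking similarity and stability, so the following is only a minimal sketch under assumed choices: Spearman correlation to compare two full rankings, and mean pairwise Jaccard similarity of the top-k feature sets to score how stable one method is across resampled runs. The function names, feature names, and toy rankings are hypothetical and are not taken from the study.

```python
from itertools import combinations


def top_k(ranking, k):
    """Return the set of the k best features (ranking is ordered best-first)."""
    return set(ranking[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets: 1.0 if identical, 0.0 if disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def spearman(rank_a, rank_b):
    """Spearman correlation between two best-first rankings of the same
    features. Assumes no ties, which holds for strict orderings."""
    n = len(rank_a)
    if n < 2 or set(rank_a) != set(rank_b):
        raise ValueError("rankings must order the same >= 2 features")
    pos_b = {feature: i for i, feature in enumerate(rank_b)}
    d2 = sum((i - pos_b[feature]) ** 2 for i, feature in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def stability(rankings, k):
    """Mean pairwise Jaccard similarity of the top-k sets across runs
    (assumed stability measure; requires at least two rankings)."""
    pairs = list(combinations(rankings, 2))
    return sum(jaccard(top_k(a, k), top_k(b, k)) for a, b in pairs) / len(pairs)


# Hypothetical rankings from three resampled runs of one selection method.
stable_runs = [
    ["age", "bmi", "snp1", "alcohol"],
    ["age", "bmi", "alcohol", "snp1"],
    ["bmi", "age", "snp1", "alcohol"],
]
unstable_runs = [
    ["age", "bmi", "snp1", "alcohol"],
    ["snp1", "alcohol", "age", "bmi"],
    ["alcohol", "age", "bmi", "snp1"],
]

print(stability(stable_runs, k=2))    # same top-2 set in every run
print(stability(unstable_runs, k=2))  # top-2 sets barely overlap
print(spearman(stable_runs[0], stable_runs[1]))
```

Under these assumptions, a method whose top-k set changes little between resamples scores close to 1, while a method whose selection depends heavily on the sample scores close to 0. That is the sense in which a ranking can perform well and still be unreliable for interpretation.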