Machine Learning-based Prediction of Distant Metastasis Risk in Invasive Ductal Carcinoma of the Breast
Overview
More than 90% of breast cancer (BC) deaths are caused by metastasis-related complications. Invasive ductal carcinoma (IDC) of the breast is the most common pathologic type of BC and is highly prone to metastasize to distant organs. BC patients who develop metastases tend to have a poor prognosis and a poor quality of life, so it is important to determine as early as possible whether distant metastasis has occurred in IDC. In this study, we developed a non-invasive classification system for detecting metastasis in breast cancer. We used Jupyter notebooks in an Anaconda environment to develop Python modules for text mining, data processing, and machine learning (ML). Risk prediction models were constructed with four algorithms: Random Forest, XGBoost, Logistic Regression, and support vector machine (SVM). We also developed a hybrid model that combines these four algorithms as base models through a voting mechanism. The models were compared on the following metrics: accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). The experimental results show that the hybrid voting model achieved the best prediction performance (accuracy: 0.867, precision: 0.929, recall: 0.805, F1-score: 0.856, AUC: 0.94). This stable risk prediction model offers doctors a useful reference for assessing and diagnosing the risk of hematogenous metastasis in IDC. It may also improve doctors' work efficiency and help improve patients' chances of survival.
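To make the voting mechanism and the evaluation metrics concrete, the following is a minimal sketch in plain Python. It is not the study's implementation. It assumes soft voting (averaging each base model's predicted probability of metastasis and thresholding at 0.5), whereas the abstract does not say whether hard or soft voting was used. The function names, the 0.5 threshold, and the toy probabilities are all hypothetical and are not study data.

```python
from statistics import mean


def soft_vote(prob_by_model, threshold=0.5):
    """Average the positive-class probabilities of all base models per sample,
    then label a sample positive (1) if the average reaches the threshold.

    Assumption: soft voting. The source only says "voting mechanism".
    """
    n_samples = len(next(iter(prob_by_model.values())))
    averaged = [
        mean(probs[i] for probs in prob_by_model.values())
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return labels, averaged


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented toy example: 1 = distant metastasis, 0 = no metastasis.
y_true = [1, 1, 1, 0, 0, 0]

# Hypothetical per-model probabilities of the positive class (not study data).
prob_by_model = {
    "random_forest":       [0.9, 0.6, 0.4, 0.2, 0.3, 0.6],
    "xgboost":             [0.8, 0.7, 0.3, 0.1, 0.4, 0.5],
    "logistic_regression": [0.7, 0.4, 0.5, 0.3, 0.2, 0.3],
    "svm":                 [0.9, 0.5, 0.2, 0.2, 0.1, 0.4],
}

y_pred, averaged = soft_vote(prob_by_model)
metrics = binary_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

In this toy example the ensemble can override an individual model: the Random Forest column alone would flag the last sample as positive (0.6), but the averaged probability falls below the threshold. AUC is omitted here because it is computed from the ranking of the averaged probabilities rather than from thresholded labels. In practice a library such as scikit-learn provides both the voting ensemble and the metrics.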