Building and Analyzing Machine Learning-based Warfarin Dose Prediction Models Using Scikit-learn
Prediction models can support personalized drug dosing by accounting for inter-individual variability. Multiple linear regression is the conventional method for modeling the relationship between patient features and the optimal drug dose. However, linear regression cannot capture non-linear relationships and may be adversely affected by non-normally distributed or collinear data. To overcome these limitations, machine learning models have been widely adopted for drug dose prediction. In this tutorial, random forest and neural network models are trained alongside a multiple linear regression model on the International Warfarin Pharmacogenetics Consortium dataset using the scikit-learn Python library. Subsequent model analyses, including performance comparison, permutation feature importance computation, and partial dependence plotting, are then demonstrated. The basic methods of model training and analysis discussed in this article may be applied in drug dose-related studies.