Article: Building and analyzing machine learning-based warfarin dose prediction models using scikit-learn

In personalized drug dosing, prediction models can be used to account for inter-individual variability in dose requirements. Multiple linear regression is the conventional method for modeling the relationship between patient features and the optimal drug dose. However, linear regression cannot capture non-linear relationships and may be adversely affected by non-normally distributed or collinear data. To overcome these limitations, machine learning models have been widely applied to drug dose prediction. In this tutorial, random forest and neural network models are trained alongside a multiple linear regression model on the International Warfarin Pharmacogenetics Consortium dataset using the scikit-learn Python library. Subsequent model analyses are then demonstrated, including performance comparison, computation of permutation feature importance, and partial dependence plotting. The basic methods of model training and analysis discussed in this article may be applied in other drug dose-related studies.
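The workflow outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the article's actual code. It uses synthetic stand-in data rather than the International Warfarin Pharmacogenetics Consortium dataset. The feature names, the dose formula, the hyperparameters, and the choice of mean absolute error and R² as metrics are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, NOT the IWPC dataset. The feature names and the
# dose formula below are hypothetical and chosen only for illustration.
rng = np.random.default_rng(0)
n_patients = 500
feature_names = ["age_decade", "weight_kg", "variant_alleles"]
X = np.column_stack([
    rng.integers(2, 9, n_patients),   # age in decades
    rng.normal(75, 15, n_patients),   # body weight
    rng.integers(0, 3, n_patients),   # count of a dose-lowering allele
]).astype(float)
# Made-up dose with a non-linear allele effect plus noise.
y = (40 - 2.5 * X[:, 0] + 0.15 * X[:, 1] - 12 * np.sqrt(X[:, 2])
     + rng.normal(0, 3, n_patients))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are arbitrary assumptions, not tuned values.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # The neural network is sensitive to feature scale, so inputs are standardized.
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

scores, importances = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Performance comparison on the held-out split.
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    # Permutation importance: mean drop in score when one feature is shuffled.
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=0
    )
    importances[name] = dict(zip(feature_names, result.importances_mean))

# Partial dependence of the random forest prediction on the allele count.
pd_result = partial_dependence(
    models["random_forest"], X_train, features=[2], kind="average"
)
pd_curve = pd_result["average"][0]  # averaged prediction per grid value

for name in models:
    print(name, scores[name], importances[name])
print("partial dependence (variant_alleles):", pd_curve)
```

To draw the partial dependence curve rather than print it, `sklearn.inspection.PartialDependenceDisplay.from_estimator` can be used with the same fitted model, assuming matplotlib is installed.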
