
Greedy Knot Selection Algorithm for Restricted Cubic Spline Regression

Overview
Journal Front Epidemiol
Specialty Public Health
Date 2024 Mar 8
PMID 38455941
Abstract

Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is highly relevant to, for example, Cox proportional hazards regression analysis. RCS regression models non-linear relationships with third-order polynomials joined at knot points. The standard approach is to place the knots at a regular sequence of quantiles between the outer boundaries. With a relatively high number of knots, a regression curve can easily be fitted to the sample, but the result is prone to overfitting: the model fits the given sample well but does not generalize to other samples. A low knot count is therefore preferred. However, the standard knot selection process can underperform in the sparser regions of the predictor variable, especially when the number of knots is low, and it can also overfit in the denser regions. We present a simple greedy search algorithm that selects knots by a backward method. In simulation experiments, it shows lower prediction error and lower Bayesian information criterion (BIC) scores than the standard knot selection process. We have implemented the algorithm as part of an open-source R package, knutar.
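The ideas in the abstract can be illustrated with a minimal Python sketch. This is not the knutar implementation and does not reproduce the authors' algorithm in detail; it is one plausible reading under stated assumptions. The function names (`rcs_basis`, `bic`, `greedy_backward_knots`), the starting quantile range (5% to 95%), the Gaussian least-squares BIC, and the choice to keep the two boundary knots fixed are all hypothetical choices made for illustration.

```python
import numpy as np


def rcs_basis(x, knots):
    """Design matrix for a restricted cubic spline: intercept, x, and
    k - 2 non-linear terms (truncated power basis, linear in both tails)."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)

    def cube(u):
        return np.maximum(u, 0.0) ** 3

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # rescaling for numerical stability only
    cols = [np.ones_like(x), x]
    for j in range(k - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / d
                + cube(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)


def bic(x, y, knots):
    """BIC of an ordinary least-squares fit on the RCS basis
    (Gaussian errors assumed; additive constants dropped)."""
    X = rcs_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def greedy_backward_knots(x, y, n_start=10, n_min=3):
    """Start from many quantile knots, then repeatedly drop the inner knot
    whose removal gives the lowest BIC. Returns the best knot set seen."""
    # Assumption: initial knots at evenly spaced quantiles from 5% to 95%.
    knots = list(np.quantile(x, np.linspace(0.05, 0.95, n_start)))
    best_knots, best_bic = list(knots), bic(x, y, knots)
    while len(knots) > n_min:
        # Assumption: the two boundary knots are never removed.
        candidates = [knots[:i] + knots[i + 1:]
                      for i in range(1, len(knots) - 1)]
        scores = [bic(x, y, c) for c in candidates]
        i = int(np.argmin(scores))
        knots = candidates[i]
        if scores[i] < best_bic:
            best_knots, best_bic = list(knots), scores[i]
    return best_knots, best_bic
```

Calling `greedy_backward_knots(x, y)` on a skewed predictor (for example, exponentially distributed `x`) returns a reduced knot set together with its BIC. Comparing that score with `bic(x, y, knots)` for the same number of regular quantile knots gives a rough feel for the kind of comparison the abstract describes, although the paper's simulation design is not reproduced here.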

Citing Articles

Correlation between liver fibrosis in non-alcoholic fatty liver disease and insulin resistance indicators: a cross-sectional study from NHANES 2017-2020.

Yang B, Gong M, Zhu X, Luo Y, Li R, Meng H Front Endocrinol (Lausanne). 2025; 16:1514093.

PMID: 39959621 PMC: 11825334. DOI: 10.3389/fendo.2025.1514093.


Endocrine disruptors and bladder function: the role of phthalates in overactive bladder.

Liu L, Li X, Hao X, Xu Z, Wang Q, Ren C Front Public Health. 2024; 12:1493794.

PMID: 39722714 PMC: 11668814. DOI: 10.3389/fpubh.2024.1493794.


Triglyceride glucose index is associated with vertebral fracture in older adults: a longitudinal study.

Wei Z, Gao X, Wang J, Wang Y, Tang H, Ma Z Endocrine. 2024; 87(3):1022-1030.

PMID: 39699802 DOI: 10.1007/s12020-024-04136-0.


Association between branched-chain amino acid levels and gastric cancer risk: large-scale prospective cohort study.

Yu L, Bao S, Zhu F, Xu Y, Liu Y, Jiang R Front Nutr. 2024; 11:1479800.

PMID: 39634548 PMC: 11614650. DOI: 10.3389/fnut.2024.1479800.


Associations of genetic variation and mRNA expression of PDGF/PDGFRB pathway genes with coronary artery disease in the Chinese population.

Wei P, Xie H, Sun J, Zhuang Q, Xie J, Yin Y J Cell Mol Med. 2024; 28(22):e70193.

PMID: 39569832 PMC: 11579943. DOI: 10.1111/jcmm.70193.

