
Selection of Important Variables and Determination of Functional Form for Continuous Predictors in Multivariable Model Building

Overview
Journal: Stat Med
Publisher: Wiley
Specialty: Public Health
Date: 2007 Dec 7
PMID: 18058845
Citations: 543
Abstract

In developing regression models, data analysts are often faced with many predictor variables that may influence an outcome variable. After more than half a century of research, the 'best' way of selecting a multivariable model is still unresolved. It is generally agreed that subject matter knowledge, when available, should guide model building. However, such knowledge is often limited, and data-dependent model building is required. We limit the scope of the modelling exercise to selecting important predictors and choosing interpretable and transportable functions for continuous predictors. Assuming linear functions, stepwise selection and all-subset strategies are discussed; the key tuning parameters are the nominal P-value for testing a variable for inclusion and the penalty for model complexity, respectively. We argue that stepwise procedures perform better than a literature-based assessment would suggest. Concerning selection of functional form for continuous predictors, the principal competitors are fractional polynomial functions and various types of spline techniques. We note that a rigorous selection strategy known as multivariable fractional polynomials (MFP) has been developed. No spline-based procedure for simultaneously selecting variables and functional forms has found wide acceptance. Results of FP and spline modelling are compared in two data sets. It is shown that spline modelling, while extremely flexible, can generate fitted curves with uninterpretable 'wiggles', particularly when automatic methods for choosing the smoothness are employed. We give general recommendations to practitioners for carrying out variable and function selection. While acknowledging that further research is needed, we argue why MFP is our preferred approach for multivariable model building with continuous covariates.
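The fractional polynomial idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional FP power set (-2, -1, -0.5, 0, 0.5, 1, 2, 3), with power 0 read as log(x), and an ordinary least-squares fit; the function names `fp_transform` and `fit_fp1` are hypothetical and chosen for illustration. It covers only the choice of a single first-degree power for one continuous predictor. It is not the MFP procedure itself, which also handles second-degree functions, variable selection, and formal testing between candidate models.

```python
import numpy as np

# Conventional FP power set (an assumption of this sketch); 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Return x**power, using log(x) when power is 0. Requires x > 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("FP transforms require strictly positive x")
    return np.log(x) if power == 0 else x ** power


def fit_fp1(x, y, powers=FP_POWERS):
    """Fit y = b0 + b1 * x^(p) by least squares for each candidate power p.

    Returns the power with the smallest residual sum of squares and a
    dict mapping every candidate power to its residual sum of squares.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    rss_by_power = {}
    for p in powers:
        # Design matrix: intercept column plus the transformed predictor.
        design = np.column_stack([np.ones(n), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        rss_by_power[p] = float(np.sum(residuals ** 2))
    best_power = min(rss_by_power, key=rss_by_power.get)
    return best_power, rss_by_power


if __name__ == "__main__":
    # Simulated data (not from the paper): a logarithmic dose-response shape.
    rng = np.random.default_rng(0)
    x = np.linspace(0.5, 10.0, 200)
    y = 2.0 + 3.0 * np.log(x) + rng.normal(scale=0.1, size=x.size)
    best, rss = fit_fp1(x, y)
    print("selected power:", best)
```

Because the candidate set is small and fixed, the fitted curve is always one of a few simple, monotone or single-turn shapes, which is one way to see why such functions tend to be easier to interpret than an automatically smoothed spline.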

Citing Articles

Development and validation of a nomogram for predicting the outcome of metabolic syndrome among people living with HIV after antiretroviral therapy in China.

Jin Y, Zhu J, Chen Q, Wang M, Shen Z, Dong Y. Front Cell Infect Microbiol. 2025; 15:1514823.

PMID: 40051708 PMC: 11882517. DOI: 10.3389/fcimb.2025.1514823.


Prediction of risk for isolated incomplete lateral meniscal injury using a dynamic nomogram based on MRI-derived anatomic radiomics and physical activity: a proof-of-concept study in 3PM-guided management.

Xie C, Chen J, Chen H, Zuo Z, Li Y, Lin L. EPMA J. 2025; 16(1):199-215.

PMID: 39991097 PMC: 11842652. DOI: 10.1007/s13167-025-00399-3.


A Risk Warning Model for Anemia Based on Facial Visible Light Reflectance Spectroscopy: Cross-Sectional Study.

Zhang Y, Chun Y, Fu H, Jiao W, Bao J, Jiang T. JMIR Med Inform. 2025; 13:e64204.

PMID: 39952235 PMC: 11845237. DOI: 10.2196/64204.


Factors contributing to perioperative blood transfusion during total hip arthroplasty in patients continuing preoperative aspirin treatment: a nomogram prediction model.

Hong D, Zhu Q, Chen W, Chaudhary M, Hong R, Zhang L. BMC Musculoskelet Disord. 2025; 26(1):138.

PMID: 39934755 PMC: 11817545. DOI: 10.1186/s12891-025-08399-0.


MRI-based intratumoral and peritumoral radiomics for assessing deep myometrial invasion in patients with early-stage endometrioid adenocarcinoma.

Yang J, Liu Y, Liu X, Wang Y, Wang X, Ai C. Front Oncol. 2025; 14:1474427.

PMID: 39882442 PMC: 11774896. DOI: 10.3389/fonc.2024.1474427.