
Regression Without Regrets - Initial Data Analysis Is a Prerequisite for Multivariable Regression

Overview
Publisher BioMed Central
Date 2024 Aug 8
PMID 39117997
Abstract

Statistical regression models are used for predicting outcomes based on the values of some predictor variables or for describing the association of an outcome with predictors. With a data set at hand, a regression model can easily be fitted with standard software packages. This ease carries the risk that data analysts rush into sophisticated analyses without sufficient knowledge of the basic properties of their data, the associations within them, and their errors, which leads to incorrect interpretation of the modeling results and to presentations that lack clarity. Ignorance of special features of the data, such as redundancies or particular distributions, may even invalidate the chosen analysis strategy. Initial data analysis (IDA) is a prerequisite to regression analyses because it provides the knowledge about the data needed to confirm or refine a chosen model-building strategy, to interpret the modeling results correctly, and to guide their presentation. To facilitate reproducibility, IDA needs to be preplanned, an IDA plan should be included in the general statistical analysis plan of a research project, and the IDA results should be well documented. A key principle of IDA is that it abstains from evaluating associations between the outcome and the predictors, which minimizes bias in the statistical inference from the final regression model. We give advice on which aspects to consider in an IDA plan for data screening in the context of regression modeling to supplement the statistical analysis plan. We illustrate this IDA plan for data screening in an example of a typical diagnostic modeling project and give recommendations for data visualizations.
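The key principle named in the abstract can be illustrated with a minimal sketch. It is not the authors' IDA plan: the function names, the toy variables (`age`, `crp`, `wbc`, `infection`), and the choice of Pearson correlation are assumptions made only for illustration. The sketch describes every variable univariately (missingness and range) but computes associations only among predictors, never between a predictor and the outcome.

```python
from math import sqrt
from statistics import mean, median


def missing_fraction(values):
    """Share of missing (None) entries in one column."""
    return sum(v is None for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation over the rows where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def screen(data, outcome):
    """Univariate summaries for all columns; associations among predictors only."""
    report = {"missing": {}, "summary": {}, "correlation": {}}
    for name, values in data.items():
        observed = [v for v in values if v is not None]
        report["missing"][name] = missing_fraction(values)
        report["summary"][name] = (min(observed), median(observed), max(observed))
    # The outcome is excluded here, so no outcome-predictor association is evaluated.
    predictors = [name for name in data if name != outcome]
    for i, a in enumerate(predictors):
        for b in predictors[i + 1:]:
            report["correlation"][(a, b)] = pearson(data[a], data[b])
    return report


# Hypothetical toy data, invented for this sketch (None marks a missing value).
data = {
    "age": [34, 51, None, 47, 62, 29],
    "crp": [3.1, 12.4, 8.0, None, 20.5, 1.2],
    "wbc": [6.2, 11.0, 9.1, 8.4, 14.3, 5.0],
    "infection": [0, 1, 1, 0, 1, 0],
}
report = screen(data, outcome="infection")
```

In a real project the screening steps would be fixed in the IDA plan before the data are inspected, and the resulting report would be documented alongside the statistical analysis plan.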

Citing Articles

Combined association of physical activity and depressive symptoms with cardiometabolic risk factors in Chilean adults.

Ferrero-Hernandez P, Farias-Valenzuela C, Rezende L, de Maio Nascimento M, Marques A, de Victo E. Sci Rep. 2024; 14(1):31100.

PMID: 39730815 PMC: 11680827. DOI: 10.1038/s41598-024-82396-6.


Evaluating variable selection methods for multivariable regression models: A simulation study protocol.

Ullmann T, Heinze G, Hafermann L, Schilhart-Wallisch C, Dunkler D. PLoS One. 2024; 19(8):e0308543.

PMID: 39121055 PMC: 11315300. DOI: 10.1371/journal.pone.0308543.


Initial data analysis for longitudinal studies to build a solid foundation for reproducible analysis.

Lusa L, Proust-Lima C, Schmidt C, Lee K, le Cessie S, Baillie M. PLoS One. 2024; 19(5):e0295726.

PMID: 38809844 PMC: 11135704. DOI: 10.1371/journal.pone.0295726.
