R$^{2}$s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs
Many researchers want to report an $R^{2}$ to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an $R^{2}$ faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the $R^{2}$ to include the variance explained by the covariances by asking questions such as "How much of the data is explained by phylogeny?" Here, I investigated three $R^{2}$s for phylogenetic and mixed models. $R^{2}_{resid}$ is an extension of the ordinary least-squares $R^{2}$ that weights residuals by variances and covariances estimated by the model; it is closely related to $R^{2}_{glmm}$ presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133-142). $R^{2}_{pred}$ is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. $R^{2}_{lik}$ is based on the likelihood of fitted models and therefore reflects the amount of information that the models contain. These three $R^{2}$s are formulated as partial $R^{2}$s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial $R^{2}$s compare a full model with a reduced model that omits some components of the full model, they are distinct from marginal $R^{2}$s that partition additive components of the variance. I assessed the properties of the $R^{2}$s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression).
Because the $R^{2}$s are designed broadly for any model for correlated data, I also compared $R^{2}$s for linear mixed models and generalized linear mixed models. $R^{2}_{resid}$, $R^{2}_{pred}$, and $R^{2}_{lik}$ all perform similarly in describing the variance explained by different components of models. However, $R^{2}_{pred}$ gives the most direct answer to the question of how much variance in the data is explained by a model. $R^{2}_{resid}$ is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. Finally, $R^{2}_{lik}$ is most appropriate for assessing the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
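To make the idea of a likelihood-based partial $R^{2}$ concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common likelihood-ratio form $1 - \exp\{-(2/n)(\ell_{full} - \ell_{reduced})\}$; the abstract does not state the exact formula, so this form, the function names, and the toy data are all illustrative assumptions. For an ordinary regression with independent Gaussian errors, this quantity reduces to the ordinary least-squares $R^{2}$, which the example checks.

```python
import math


def partial_r2_lik(loglik_full, loglik_reduced, n):
    """Likelihood-ratio partial R2 (assumed form, not taken from the abstract):
    1 - exp(-(2/n) * (loglik_full - loglik_reduced))."""
    return 1.0 - math.exp(-2.0 / n * (loglik_full - loglik_reduced))


def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood of a linear model with independent
    errors, given its residual sum of squares (ML variance estimate rss / n)."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


# Hypothetical toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
n = len(y)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Full model: intercept plus slope, fitted by ordinary least squares.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
rss_full = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

# Reduced model: intercept only (the predictor x is removed).
rss_reduced = sum((yi - mean_y) ** 2 for yi in y)

r2_lik = partial_r2_lik(gaussian_loglik(rss_full, n),
                        gaussian_loglik(rss_reduced, n), n)
r2_ols = 1.0 - rss_full / rss_reduced

print(f"partial R2_lik = {r2_lik:.4f}, OLS R2 = {r2_ols:.4f}")
```

For a phylogenetic model or mixed model, the same function could in principle be applied to the log-likelihoods of a full and a reduced fit from any software that reports them; choosing which component to drop (a predictor, the phylogenetic signal, or a random effect) determines which partial $R^{2}$ is obtained.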