Do We Need Flexible Machine-learning Algorithms to Assess the Effect of Long-term Exposure to Fine Particulate Matter on Mortality?: An Example from a Canadian National Cohort

Overview

Journal: Environ Epidemiol

Publisher: Wolters Kluwer

Specialty: Environmental Health

Date: 2025 Mar 6

PMID: 40046729

Authors

Chen Chen

Jay S Kaufman

Juwel Rana

Tarik Benmarhnia

Hong Chen


Abstract

Background: Evidence suggests that the relationship between long-term exposure to fine particulate matter (PM2.5) and mortality is nonlinear, and methods for flexibly incorporating this nonlinearity could be improved. To heuristically evaluate whether machine-learning algorithms are necessary, we compared the mortality benefit of reducing long-term PM2.5 exposure as estimated by three analytical methods of varying flexibility and complexity.

Methods: Using a cohort of Canadian Community Health Survey respondents followed from 2005 until 2014, we obtained consenting respondents' baseline characteristics, time-varying annual average PM2.5 exposure over the previous 3 years, yearly income and neighborhood characteristics, and vital status. We estimated the 10-year cumulative mortality rate under both the natural course of exposure and a hypothetical dynamic intervention that would lower each respondent's exposure to 8.8 μg/m³ (the current Canadian annual PM2.5 standard) whenever it was higher. We compared the estimates from the three analytical methods, as well as their mean squared errors under a range of hypothetical true values.
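The dynamic intervention and the g-computation step described in the Methods can be sketched as follows. This is a minimal illustration under strong simplifying assumptions, not the study's implementation: it uses a single time point instead of yearly time-varying exposures and covariates, one hypothetical confounder (standardized age), a logistic outcome model fitted by Newton-Raphson, and synthetic data. The names `dynamic_intervention`, `natural_course`, `fit_logistic`, and `g_computation` are hypothetical; only the 8.8 μg/m³ threshold comes from the abstract.

```python
import math
import random

THRESHOLD = 8.8  # current Canadian annual PM2.5 standard in ug/m3 (from the abstract)


def dynamic_intervention(pm, threshold=THRESHOLD):
    """Lower exposure to the threshold only if it is higher; otherwise keep it."""
    return min(pm, threshold)


def natural_course(pm):
    """Leave exposure at its observed value."""
    return pm


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def g_computation(X, beta, pm_col, strategy):
    """Mean predicted risk after applying `strategy` to everyone's exposure."""
    total = 0.0
    for row in X:
        row = list(row)
        row[pm_col] = strategy(row[pm_col])
        total += sigmoid(sum(b * v for b, v in zip(beta, row)))
    return total / len(X)


# Hypothetical synthetic cohort: columns are intercept, PM2.5, standardized age.
random.seed(1)
X, y = [], []
for _ in range(2000):
    age = random.gauss(0.0, 1.0)
    pm = max(0.5, random.gauss(8.0 + 0.5 * age, 2.0))  # age confounds PM2.5
    X.append([1.0, pm, age])
    y.append(1 if random.random() < sigmoid(-5.0 + 0.3 * pm + 0.8 * age) else 0)

beta = fit_logistic(X, y)
risk_natural = g_computation(X, beta, 1, natural_course)
risk_intervened = g_computation(X, beta, 1, dynamic_intervention)
rd_per_1000 = 1000 * (risk_intervened - risk_natural)
print(f"Risk difference per 1000: {rd_per_1000:.2f}")
```

Because the intervention only lowers exposures above the threshold, respondents already below 8.8 μg/m³ contribute the same predicted risk under both strategies; the contrast comes entirely from those above it.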

Results: Among 62,365 participants, the 10-year cumulative mortality rate differences per 1000 participants were -0.23 (95% confidence interval: -0.46, 0.00), -0.83 (-1.24, -0.43), and -0.67 (-1.27, -0.06) for parametric g-computation, the targeted minimum loss-based estimator (TMLE) using parametric models, and TMLE with SuperLearner and six highly flexible candidate algorithms, respectively. Changing the hyperparameters did not meaningfully change the estimates or the algorithm weights.
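The SuperLearner weights mentioned in the Results can be illustrated with a toy convex super learner: each candidate learner produces out-of-fold predictions, and non-negative weights summing to one are chosen to minimize the cross-validated squared error of the weighted combination. This is a sketch under stated assumptions, not the study's ensemble: the three learners (`mean_learner`, `linear_learner`, `binned_learner`), the continuous outcome, the U-shaped synthetic data, and the coarse grid search standing in for a constrained least-squares meta-learner are all hypothetical choices; the abstract does not name its six candidate algorithms.

```python
import random


def mean_learner(train):
    """Baseline: predict the training-set mean everywhere."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m


def linear_learner(train):
    """Ordinary least squares with one predictor."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)


def binned_learner(train, width=2.0):
    """Flexible step function: mean outcome within fixed-width exposure bins."""
    overall = sum(y for _, y in train) / len(train)
    sums, counts = {}, {}
    for x, y in train:
        k = int(x // width)
        sums[k] = sums.get(k, 0.0) + y
        counts[k] = counts.get(k, 0) + 1

    def predict(x):
        k = int(x // width)
        return sums[k] / counts[k] if k in counts else overall  # empty bin: fall back

    return predict


LEARNERS = {"mean": mean_learner, "linear": linear_learner, "binned": binned_learner}


def cv_predictions(data, folds=5):
    """Out-of-fold predictions from every candidate learner."""
    preds = {name: [0.0] * len(data) for name in LEARNERS}
    for f in range(folds):
        train = [d for i, d in enumerate(data) if i % folds != f]
        for name, learner in LEARNERS.items():
            model = learner(train)
            for i in range(f, len(data), folds):
                preds[name][i] = model(data[i][0])
    return preds


def cv_risk(data, preds, weights):
    """Cross-validated mean squared error of a weighted combination."""
    total = 0.0
    for i, (_, y) in enumerate(data):
        combo = sum(w * preds[name][i] for w, name in zip(weights, LEARNERS))
        total += (y - combo) ** 2
    return total / len(data)


def convex_weights(data, preds, grid=20):
    """Grid search over non-negative weights summing to one (three learners)."""
    best, best_risk = None, float("inf")
    for a in range(grid + 1):
        for b in range(grid + 1 - a):
            w = (a / grid, b / grid, (grid - a - b) / grid)
            risk = cv_risk(data, preds, w)
            if risk < best_risk:
                best, best_risk = w, risk
    return dict(zip(LEARNERS, best)), best_risk


# Hypothetical data with a U-shaped exposure-outcome curve.
random.seed(2)
data = []
for _ in range(1000):
    x = random.uniform(4.0, 14.0)
    data.append((x, 0.05 * (x - 9.0) ** 2 + random.gauss(0.0, 0.1)))

preds = cv_predictions(data)
weights, risk = convex_weights(data, preds)
print(weights, round(risk, 4))
```

Refitting each learner on the full data and combining its predictions with these weights would give the ensemble's final predictions. Because the grid includes each single learner (weight 1 on one, 0 on the others), the selected combination can never do worse in cross-validation than the best individual candidate.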

Conclusions: All three methods indicated that reducing long-term exposure to PM2.5 would yield tangible public health benefits in Canada, where PM2.5 levels are among the lowest worldwide. However, the advantage of employing machine-learning algorithms with a doubly robust estimator remains minimal, especially considering the bias-variance tradeoff.
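One way to illustrate the mean-squared-error comparison and the bias-variance tradeoff is to treat each reported point estimate as the estimator's expected value, back out its standard error from the reported 95% confidence interval, and compute MSE(θ) = (estimate - θ)² + SE² over a grid of hypothetical true values θ. The Wald-type standard error approximation (interval width divided by 2 × 1.96) and this decomposition are assumptions made for illustration; the study's exact MSE calculation may differ. The estimates and intervals are those reported in the Results.

```python
Z = 1.96  # assumes symmetric Wald-type 95% confidence intervals

# Point estimates and 95% CIs (10-year deaths per 1000) as reported in the abstract.
ESTIMATES = {
    "parametric g-computation": (-0.23, -0.46, 0.00),
    "TMLE, parametric models": (-0.83, -1.24, -0.43),
    "TMLE, SuperLearner": (-0.67, -1.27, -0.06),
}


def standard_error(lower, upper, z=Z):
    """Approximate SE from the width of a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def mse(estimate, lower, upper, truth):
    """Squared bias relative to a hypothetical truth plus sampling variance."""
    bias = estimate - truth
    return bias ** 2 + standard_error(lower, upper) ** 2


def best_method(truth):
    """Method with the smallest MSE if `truth` were the true rate difference."""
    return min(ESTIMATES, key=lambda m: mse(*ESTIMATES[m], truth))


for tenths in range(-12, 1):  # hypothetical true values from -1.2 to 0.0
    truth = tenths / 10
    scores = ", ".join(f"{m}: {mse(*ESTIMATES[m], truth):.3f}" for m in ESTIMATES)
    print(f"truth={truth:+.1f} -> best: {best_method(truth)} ({scores})")
```

With these inputs, parametric g-computation has the lowest MSE when the truth is near its own estimate, the parametric TMLE wins for larger true benefits, and the SuperLearner TMLE wins only in a narrow band of true values (roughly -0.59 to -0.54), because its wider interval offsets its intermediate point estimate.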
