
Do We Need Flexible Machine-learning Algorithms to Assess the Effect of Long-term Exposure to Fine Particulate Matter on Mortality?: An Example from a Canadian National Cohort

Overview
Publisher Wolters Kluwer
Date 2025 Mar 6
PMID 40046729
Abstract

Background: Evidence suggests that the relationship between long-term exposure to fine particulate matter (PM2.5) and mortality is nonlinear, and the methods used to incorporate this nonlinearity flexibly can be improved. To heuristically evaluate whether machine-learning algorithms are necessary, we compared the estimated mortality benefit of reducing long-term PM2.5 exposure across three analytical methods of varying flexibility and complexity.

Methods: Using a cohort of Canadian Community Health Survey respondents (followed from 2005 until 2014), we obtained consenting respondents' baseline characteristics, time-varying annual average PM2.5 over the previous 3 years, yearly income and neighborhood characteristics, and vital status. We estimated the 10-year cumulative mortality rate under both the natural course of exposure and a hypothetical dynamic intervention that would set a respondent's exposure to 8.8 μg/m³ (the current Canadian annual PM2.5 standard) whenever it was higher. We compared the estimates from the three analytical methods and their mean squared errors under a range of hypothetical true values.
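The dynamic intervention described above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the exposure values are synthetic, `toy_risk_model` is an assumed logistic risk function, and all function names are hypothetical. It also collapses the longitudinal structure into a single time point, so it ignores time-varying confounding and only shows how the exposure rule and the plug-in contrast fit together.

```python
import numpy as np

PM_STANDARD = 8.8  # threshold named in the abstract (ug/m^3)


def apply_dynamic_intervention(pm, threshold=PM_STANDARD):
    """Cap exposure at the threshold; values at or below it are unchanged."""
    return np.minimum(pm, threshold)


def risk_difference_per_1000(pm, risk_model, threshold=PM_STANDARD):
    """Plug-in contrast: mean predicted risk under the intervention
    minus mean predicted risk under the natural course, per 1000."""
    natural = risk_model(pm).mean()
    intervened = risk_model(apply_dynamic_intervention(pm, threshold)).mean()
    return 1000.0 * (intervened - natural)


# Hypothetical illustration: synthetic exposures, not cohort data.
rng = np.random.default_rng(0)
pm = rng.gamma(shape=9.0, scale=0.8, size=10_000)


def toy_risk_model(x):
    """Assumed logistic outcome model, chosen only for illustration."""
    return 1.0 / (1.0 + np.exp(-(-3.0 + 0.02 * x)))


rd = risk_difference_per_1000(pm, toy_risk_model)
print(f"Risk difference per 1000 (toy data): {rd:.2f}")
```

Because the rule only lowers exposures that exceed the threshold, respondents already at or below 8.8 contribute nothing to the contrast; the estimated benefit comes entirely from the upper tail of the exposure distribution.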

Results: Among 62,365 participants, the 10-year cumulative mortality rate differences per 1000 participants were -0.23 (95% confidence interval: -0.46, 0.00), -0.83 (-1.24, -0.43), and -0.67 (-1.27, -0.06) for parametric g-computation, the targeted minimum loss-based estimator using parametric models, and the targeted minimum loss-based estimator with SuperLearner and six highly flexible candidate algorithms, respectively. Changing the hyperparameters did not meaningfully change the estimates or the algorithm weights.
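One simple way to read a mean squared error comparison against the reported numbers is sketched below in Python. It rests on assumptions not stated in the abstract: that the intervals are symmetric Wald-type intervals (so the standard error is roughly the interval width divided by 2 × 1.96), that the mean squared error is squared bias plus variance, and that the grid of hypothetical true values is arbitrary. The authors' actual procedure may differ.

```python
Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (estimate, lower, upper) per 1000 participants, as reported in the abstract
estimates = {
    "parametric g-computation": (-0.23, -0.46, 0.00),
    "TMLE, parametric models": (-0.83, -1.24, -0.43),
    "TMLE, SuperLearner": (-0.67, -1.27, -0.06),
}


def standard_error(lower, upper, z=Z_95):
    """Approximate SE from a symmetric (Wald-type) interval; an assumption."""
    return (upper - lower) / (2.0 * z)


def mse(estimate, lower, upper, truth):
    """Squared bias relative to a hypothetical truth, plus variance."""
    return (estimate - truth) ** 2 + standard_error(lower, upper) ** 2


# Arbitrary grid of hypothetical true rate differences per 1000
hypothetical_truths = [-1.0, -0.75, -0.5, -0.25, 0.0]

mse_table = {
    name: [mse(est, lo, hi, t) for t in hypothetical_truths]
    for name, (est, lo, hi) in estimates.items()
}

for name, values in mse_table.items():
    row = "  ".join(f"{v:.3f}" for v in values)
    print(f"{name:28s} {row}")
```

Under these assumptions, the narrower interval for parametric g-computation gives it the smallest variance term, so which method has the lowest mean squared error depends on how far the hypothetical truth lies from each point estimate.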

Conclusions: All three methods indicated that reducing long-term exposure to PM2.5 would yield tangible public health benefits in Canada, where PM2.5 levels are among the lowest worldwide. However, the advantage of employing machine-learning algorithms with a doubly robust estimator remains minimal, especially considering the variance-bias tradeoff.
