
Distributed Representations of Temporally Accumulated Reward Prediction Errors in the Mouse Cortex

Overview
Journal Sci Adv
Date 2025 Jan 22
PMID 39841828
Abstract

Reward prediction errors (RPEs) quantify the difference between expected and actual rewards and serve to refine future actions. Although reinforcement learning (RL) theory provides ample evidence that the long-term accumulation of these error signals improves learning efficiency, it remains unclear whether the brain uses similar mechanisms. To explore this, we constructed RL-based theoretical models and used multiregional two-photon calcium imaging in the mouse dorsal cortex. We identified a population of neurons whose activity was modulated by varying degrees of RPE accumulation. Consequently, RPE-encoding neurons were sequentially activated within each trial, forming a distributed assembly. RPE representations in mice aligned with the theoretical predictions of RL: they emerged during learning and were sensitive to manipulations of the reward function. Interareal comparisons revealed a region-specific code, with higher-order cortical regions exhibiting long-term encoding of RPE accumulation. These results present an additional layer of complexity in cortical RPE computation, one that may augment learning efficiency in animals.
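To make "accumulation of RPEs" concrete, here is a minimal sketch that is not the authors' model. It assumes the textbook temporal-difference RPE, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), and a simple leaky integrator a_t = decay * a_{t-1} + delta_t, where the hypothetical `decay` parameter sets how long past errors are retained. The function names `td_errors` and `accumulate`, the discount `gamma`, and the toy rewards and values are illustrative assumptions only.

```python
from typing import List, Sequence


def td_errors(rewards: Sequence[float], values: Sequence[float],
              gamma: float = 0.9) -> List[float]:
    """Per-step temporal-difference RPE: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    Assumption for this sketch: `values` holds one more entry than `rewards`
    (the value of the state reached after the last reward).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def accumulate(deltas: Sequence[float], decay: float) -> List[float]:
    """Hypothetical leaky accumulation of RPEs: a_t = decay * a_{t-1} + delta_t.

    decay = 0 keeps only the current RPE; decay close to 1 integrates
    errors over a longer horizon.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    acc, out = 0.0, []
    for d in deltas:
        acc = decay * acc + d
        out.append(acc)
    return out


if __name__ == "__main__":
    # Toy trial: reward arrives only at the last step; values are made up.
    deltas = td_errors([0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0], gamma=0.9)
    for decay in (0.0, 0.5, 0.9):
        # Larger decay -> traces that carry earlier errors forward longer.
        print(decay, [round(x, 4) for x in accumulate(deltas, decay)])
```

Running the same RPE sequence through several decay values yields traces ranging from the instantaneous error to a long-horizon sum, which is one simple way to picture "varying degrees of RPE accumulation."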
