Distributed Representations of Temporally Accumulated Reward Prediction Errors in the Mouse Cortex

Overview

Journal: Sci Adv

Date: 2025 Jan 22

PMID: 39841828

Authors

Hiroshi Makino

Ahmad Suhaimi

Abstract

Reward prediction errors (RPEs) quantify the difference between expected and actual rewards, serving to refine future actions. Although reinforcement learning (RL) provides ample theoretical evidence suggesting that the long-term accumulation of these error signals improves learning efficiency, it remains unclear whether the brain uses similar mechanisms. To explore this, we constructed RL-based theoretical models and used multiregional two-photon calcium imaging in the mouse dorsal cortex. We identified a population of neurons whose activity was modulated by varying degrees of RPE accumulation. Consequently, RPE-encoding neurons were sequentially activated within each trial, forming a distributed assembly. RPE representations in mice aligned with theoretical predictions of RL, emerging during learning and being subject to manipulations of the reward function. Interareal comparisons revealed a region-specific code, with higher-order cortical regions exhibiting long-term encoding of RPE accumulation. These results present an additional layer of complexity in cortical RPE computation, potentially augmenting learning efficiency in animals.
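The notion of an RPE and of its accumulation over time can be illustrated with a minimal sketch. It assumes a standard temporal-difference formulation, which the abstract does not confirm as the authors' model. The function names and the parameters gamma (discount factor) and lam (accumulation weight) are illustrative assumptions. The accumulated quantity has the same form as the lambda-weighted sum of one-step errors used in generalized advantage estimation.

```python
from typing import List


def td_errors(rewards: List[float], values: List[float], gamma: float = 0.99) -> List[float]:
    """One-step RPEs: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must have one more entry than `rewards`; its last entry is
    the value assigned to the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must be one element longer than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def accumulated_rpe(deltas: List[float], gamma: float = 0.99, lam: float = 0.9) -> List[float]:
    """Forward accumulation: A_t = sum_k (gamma * lam)**k * delta_{t+k}.

    lam = 0 recovers the one-step RPE; lam close to 1 accumulates errors
    over a long horizon within the trial.
    """
    acc = [0.0] * len(deltas)
    running = 0.0
    # Sweep backward so each step reuses the accumulated sum of later steps.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        acc[t] = running
    return acc


if __name__ == "__main__":
    # Hypothetical trial: an unexpected reward arrives on the last step.
    deltas = td_errors(rewards=[0.0, 0.0, 1.0], values=[0.0, 0.0, 0.0, 0.0], gamma=1.0)
    for lam in (0.0, 0.5, 1.0):
        print(lam, accumulated_rpe(deltas, gamma=1.0, lam=lam))
```

In this sketch, varying lam changes how far back within a trial a late error is propagated, which is one hypothetical way to picture units modulated by "varying degrees of RPE accumulation".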
