Dopamine Transients Encode Reward Prediction Errors Independent of Learning Rates

Overview

Journal bioRxiv

Date 2024 Apr 25

PMID 38659861

Authors

Andrew Mah

Carla E M Golden

Christine M Constantinople

Abstract

Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc). We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
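The distinction the abstract draws, between a signal that tracks the RPE alone and one that tracks the product of learning rate and RPE, can be illustrated with a standard delta-rule update. The following is a minimal sketch under stated assumptions, not the authors' model: the function names, reward values, and learning rates are hypothetical and chosen only for illustration.

```python
def delta_rule_update(value, reward, learning_rate):
    """One delta-rule step: return the updated value and the RPE."""
    rpe = reward - value                      # reward prediction error
    new_value = value + learning_rate * rpe   # value moves by learning_rate * RPE
    return new_value, rpe


def run_trials(rewards, learning_rates, initial_value=0.0):
    """Apply the delta rule over trials with a per-trial learning rate.

    Returns one (rpe, scaled_update, value) tuple per trial, so the raw RPE
    can be compared with the learning-rate-scaled update.
    """
    value = initial_value
    history = []
    for reward, alpha in zip(rewards, learning_rates):
        value, rpe = delta_rule_update(value, reward, alpha)
        history.append((rpe, alpha * rpe, value))
    return history


if __name__ == "__main__":
    # Hypothetical example: the learning rate is higher on the first trial,
    # as it might be just after a state transition.
    for rpe, scaled, value in run_trials([1.0, 1.0, 1.0], [0.5, 0.25, 0.25]):
        print(f"RPE={rpe:.3f}  learning_rate*RPE={scaled:.3f}  value={value:.3f}")
```

In this sketch, a signal that encodes RPEs independent of learning rates would follow the first element of each tuple, whereas a signal reflecting the learning-rate-weighted update would follow the second.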
