What is Dopamine Doing in Model-based Reinforcement Learning?

Overview

Journal Curr Opin Behav Sci

Date 2023 Apr 21

PMID 37082448

Authors

Thomas Akam

Mark E Walton

Abstract

Experiments have implicated dopamine in model-based reinforcement learning (RL). These findings are unexpected because dopamine is thought to encode a reward prediction error (RPE), which is the key teaching signal in model-free RL. Here we examine two possible accounts of dopamine's involvement in model-based RL. The first is that dopamine neurons carry a prediction error used to update a type of predictive state representation called a successor representation. The second is that two well-established aspects of dopaminergic activity, RPEs and surprise signals, can together explain dopamine's involvement in model-based RL.
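To make the distinction in the abstract concrete, the following is a minimal textbook-style sketch, not the authors' model. It contrasts a scalar temporal-difference RPE with a vector-valued successor representation (SR) prediction error. The three-state chain, the reward vector, the parameter values, and all function names are assumptions introduced for illustration only.

```python
GAMMA = 0.9    # discount factor (assumed value)
ALPHA = 0.1    # learning rate (assumed value)
N_STATES = 3   # hypothetical toy chain: 0 -> 1 -> 2, then the episode ends


def td_rpe(V, s, r, s_next):
    """Scalar RPE: reward plus discounted next value minus current value."""
    bootstrap = GAMMA * V[s_next] if s_next is not None else 0.0
    return r + bootstrap - V[s]


def sr_error(M, s, s_next):
    """Vector SR error: one entry per state whose future occupancy is predicted."""
    err = []
    for j in range(N_STATES):
        occupancy = 1.0 if j == s else 0.0
        bootstrap = GAMMA * M[s_next][j] if s_next is not None else 0.0
        err.append(occupancy + bootstrap - M[s][j])
    return err


def sr_value(M, w, s):
    """Value read out from the SR: predicted occupancies weighted by reward."""
    return sum(M[s][j] * w[j] for j in range(N_STATES))


def train(w, n_episodes=2000):
    """Learn cached values V and the SR matrix M from repeated passes along the chain."""
    V = [0.0] * N_STATES
    M = [[0.0] * N_STATES for _ in range(N_STATES)]
    for _ in range(n_episodes):
        for s in range(N_STATES):
            s_next = s + 1 if s + 1 < N_STATES else None
            V[s] += ALPHA * td_rpe(V, s, w[s], s_next)
            err = sr_error(M, s, s_next)
            for j in range(N_STATES):
                M[s][j] += ALPHA * err[j]
    return V, M


reward = [0.0, 0.0, 1.0]     # reward delivered in the last state
V_td, M_sr = train(reward)

devalued = [0.0, 0.0, 0.0]   # reward removed, with no further experience
print("TD value of state 0 (cached):", round(V_td[0], 3))
print("SR value before devaluation: ", round(sr_value(M_sr, reward, 0), 3))
print("SR value after devaluation:  ", round(sr_value(M_sr, devalued, 0), 3))
```

In this sketch the cached TD value of state 0 stays where training left it after the reward is removed, whereas the SR readout changes as soon as the reward vector changes. That sensitivity to revaluation without new experience is the kind of behavior usually taken as a signature of model-based RL. The sketch assumes the reward vector is known, whereas a learner would in general have to estimate it.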

Citing Articles

Devaluing memories of reward: a case for dopamine.

Fry B, Russell N, Fex V, Mo B, Pence N, Beatty J. Commun Biol. 2025; 8(1):161.

PMID: 39900665 PMC: 11790953. DOI: 10.1038/s42003-024-07440-7.

The curious case of dopaminergic prediction errors and learning associative information beyond value.

Kahnt T, Schoenbaum G. Nat Rev Neurosci. 2025; 26(3):169-178.

PMID: 39779974 DOI: 10.1038/s41583-024-00898-8.

Biomarker discovery using machine learning in the psychosis spectrum.

Yassin W, Loedige K, Wannan C, Holton K, Chevinsky J, Torous J. Biomark Neuropsychiatry. 2024; 11.

PMID: 39687745 PMC: 11649307. DOI: 10.1016/j.bionps.2024.100107.

Dopamine Release in the Nucleus Accumbens Core Encodes the General Excitatory Components of Learning.

Taira M, Millard S, Verghese A, DiFazio L, Hoang I, Jia R. J Neurosci. 2024; 44(35).

PMID: 38969504 PMC: 11358529. DOI: 10.1523/JNEUROSCI.0120-24.2024.

Dopamine Increases Accuracy and Lengthens Deliberation Time in Explicit Motor Skill Learning.

Leow L, Bernheine L, Carroll T, Dux P, Filmer H. eNeuro. 2024; 11(1).

PMID: 38238069 PMC: 10849023. DOI: 10.1523/ENEURO.0360-23.2023.
