
Working Memory Load Strengthens Reward Prediction Errors

Overview
Journal J Neurosci
Specialty Neurology
Date 2017 Mar 22
PMID 28320846
Citations 44
Abstract

Reinforcement learning (RL) in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors (RPEs) are used to update the expected values of choice options. This approach ignores the distinct contributions of the multiple memory and decision-making systems thought to support even simple learning. In an fMRI experiment, we investigated how working memory (WM) and incremental RL processes interact to guide human learning. WM load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive WM process together with slower RL. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to RPE, as shown previously, but, critically, these signals were reduced when the learning problem was within the capacity of WM. The degree of this neural interaction related to individual differences in the use of WM to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning.

Reinforcement learning (RL) theory has been remarkably productive at improving our understanding of instrumental learning as well as dopaminergic and striatal network function across many mammalian species. However, this neural network is only one contributor to human learning, and other mechanisms, such as prefrontal cortex working memory, also play a key role. Our results show that these other players interact with the dopaminergic RL system, interfering with its key computation of reward prediction errors.
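The two-system mixture described in the abstract can be sketched as a simple simulation: a slow delta-rule RL learner combined with a fast, one-shot, capacity-limited, decaying WM store, with the WM weight shrinking as set size exceeds capacity. The parameter names (alpha, beta, rho, K, phi) and the specific update rules below are illustrative assumptions for a minimal sketch, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def simulate_block(set_size, alpha=0.1, beta=8.0, rho=0.9, K=3,
                   phi=0.05, n_actions=3, reps=10):
    """Simulate one learning block as a weighted mixture of slow RL
    (delta-rule updates driven by reward prediction errors) and a fast,
    capacity-limited, delay-sensitive WM process. Returns mean accuracy.
    All parameter values are illustrative assumptions."""
    n_trials = reps * set_size
    correct = rng.integers(n_actions, size=set_size)      # hidden S-R mapping
    Q = np.full((set_size, n_actions), 1.0 / n_actions)   # RL values
    wm = np.full((set_size, n_actions), 1.0 / n_actions)  # WM traces
    w = rho * min(1.0, K / set_size)   # WM reliance shrinks with load
    hits = 0.0
    for _ in range(n_trials):
        s = rng.integers(set_size)
        # Policy: mixture of WM-based and RL-based softmax policies
        p = w * softmax(beta * wm[s]) + (1 - w) * softmax(beta * Q[s])
        a = rng.choice(n_actions, p=p)
        r = 1.0 if a == correct[s] else 0.0
        hits += r
        Q[s, a] += alpha * (r - Q[s, a])        # RPE-driven RL update
        if r == 1.0:                            # one-shot WM encoding
            wm[s] = np.eye(n_actions)[a]
        wm += phi * (1.0 / n_actions - wm)      # decay toward uniform
    return hits / n_trials
```

Averaged over many simulated blocks, accuracy per stimulus repetition is higher at low set sizes (where the WM weight is large) than at high set sizes, reproducing the behavioral signature of a capacity-limited WM contribution alongside slower RL.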

Citing Articles

Policy Complexity Suppresses Dopamine Responses.

Gershman S, Lak A. J Neurosci. 2025; 45(9).

PMID: 39788740 PMC: 11866995. DOI: 10.1523/JNEUROSCI.1756-24.2024.


Working memory gating in obesity is moderated by striatal dopaminergic gene variants.

Herzog N, Hartmann H, Janssen L, Kanyamibwa A, Waltmann M, Kovacs P. Elife. 2024; 13.

PMID: 39431987 PMC: 11493406. DOI: 10.7554/eLife.93369.


Neural and Computational Mechanisms of Motivation and Decision-making.

Yee D. J Cogn Neurosci. 2024; 36(12):2822-2830.

PMID: 39378176 PMC: 11602011. DOI: 10.1162/jocn_a_02258.


Policy complexity suppresses dopamine responses.

Gershman S, Lak A. bioRxiv. 2024.

PMID: 39345642 PMC: 11429712. DOI: 10.1101/2024.09.15.613150.


Altered learning from positive feedback in adolescents with anorexia nervosa.

Uniacke B, van den Bos W, Wonderlich J, Ojeda J, Posner J, Steinglass J. J Int Neuropsychol Soc. 2024; 30(7):651-659.

PMID: 39291440 PMC: 11773347. DOI: 10.1017/S1355617724000237.


References
1. Poldrack R, Clark J, Shohamy D, Creso Moyano J, Myers C, Gluck M. Interactive memory systems in the human brain. Nature. 2001; 414(6863):546-50. DOI: 10.1038/35107080.

2. Schultz W. Getting formal with dopamine and reward. Neuron. 2002; 36(2):241-63. DOI: 10.1016/s0896-6273(02)00967-4.

3. Frank M, Seeberger L, O'Reilly R. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004; 306(5703):1940-3. DOI: 10.1126/science.1102941.

4. Ranganath C, Blumenfeld R. Doubts about double dissociations between short- and long-term memory. Trends Cogn Sci. 2005; 9(8):374-80. DOI: 10.1016/j.tics.2005.06.009.

5. Daw N, Doya K. The computational neurobiology of learning and reward. Curr Opin Neurobiol. 2006; 16(2):199-204. DOI: 10.1016/j.conb.2006.03.006.