Instructional Control of Reinforcement Learning: a Behavioral and Neurocomputational Investigation

Overview

Journal Brain Res

Specialty Neurology

Date 2009 Jul 15

PMID 19595993

Citations 103

Authors

Bradley B Doll

W Jake Jacobs

Alan G Sanfey

Michael J Frank

Affiliations


Abstract

Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
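The "confirmation bias" account in the abstract can be illustrated with a minimal sketch. This is not the authors' fitted model: the function names, the parameters `amplify` and `diminish`, and all numeric values below are assumptions chosen for illustration, and the sketch covers only the case where the instructed stimulus was described as the best option (so wins count as instruction-consistent).

```python
import math
import random


def biased_q_update(q, reward, lr, instructed, amplify=2.0, diminish=0.5):
    """One Q-learning step for a chosen stimulus; reward is 1.0 (win) or 0.0 (loss).

    For the instructed stimulus, positive prediction errors (instruction-consistent)
    are scaled up by `amplify` and negative ones (inconsistent) scaled down by
    `diminish`. Both scaling parameters are hypothetical.
    """
    delta = reward - q
    if instructed:
        delta *= amplify if delta > 0 else diminish
    return q + lr * delta


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing stimulus A over B under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def biased_fixed_point(p, amplify, diminish):
    """Value at which the expected biased update is zero for win probability p."""
    return p * amplify / (p * amplify + (1.0 - p) * diminish)


def simulate(p_instructed=0.4, p_other=0.6, trials=500, lr=0.1, beta=5.0,
             amplify=2.0, diminish=0.5, seed=0):
    """Simulate choices between an (incorrectly) instructed stimulus and an alternative.

    Returns the final value estimates and the fraction of instructed choices.
    """
    rng = random.Random(seed)
    q_inst, q_other = 0.5, 0.5
    n_inst = 0
    for _ in range(trials):
        if rng.random() < softmax_choice_prob(q_inst, q_other, beta):
            reward = 1.0 if rng.random() < p_instructed else 0.0
            q_inst = biased_q_update(q_inst, reward, lr, True, amplify, diminish)
            n_inst += 1
        else:
            reward = 1.0 if rng.random() < p_other else 0.0
            q_other = biased_q_update(q_other, reward, lr, False)
    return q_inst, q_other, n_inst / trials


if __name__ == "__main__":
    print(simulate())
    print(biased_fixed_point(0.4, 2.0, 0.5))
```

Under these assumed settings, the biased update settles in expectation at 0.8 / 1.1 (about 0.73) for a stimulus that actually pays off 40% of the time, which exceeds the alternative's true 60% rate. That is one way a learner could keep preferring the instructed stimulus despite experience to the contrary.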

Citing Articles

Transmission of societal stereotypes to individual-level prejudice through instrumental learning.

Schultner D, Stillerman B, Lindstrom B, Hackel L, Hagen D, Jostmann N. Proc Natl Acad Sci U S A. 2024; 121(45):e2414518121.

PMID: 39485797 PMC: 11551433. DOI: 10.1073/pnas.2414518121.

Comparing experience- and description-based economic preferences across 11 countries.

Anllo H, Bavard S, Benmarrakchi F, Bonagura D, Cerrotti F, Cicue M. Nat Hum Behav. 2024; 8(8):1554-1567.

PMID: 38877287 DOI: 10.1038/s41562-024-01894-9.

The challenge of learning adaptive mental behavior.

Hitchcock P, Frank M. J Psychopathol Clin Sci. 2024; 133(5):413-426.

PMID: 38815082 PMC: 11229419. DOI: 10.1037/abn0000924.

Disentangling the contribution of individual and social learning processes in human advice-taking behavior.

Pereg M, Hertz U, Ben-Artzi I, Shahar N. NPJ Sci Learn. 2024; 9(1):4.

PMID: 38245562 PMC: 10799906. DOI: 10.1038/s41539-024-00214-0.

Prefrontal signals precede striatal signals for biased credit assignment in motivational learning biases.

Algermissen J, Swart J, Scheeringa R, Cools R, den Ouden H. Nat Commun. 2024; 15(1):19.

PMID: 38168089 PMC: 10762147. DOI: 10.1038/s41467-023-44632-x.
