A Reinforcement Learning Model with Choice Traces for a Progressive Ratio Schedule

Overview

Journal: Front Behav Neurosci

Specialty: Psychology

Date: 2024 Jan 25

PMID: 38268795

Authors

Keiko Ihara

Yu Shikano

Sae Kato

Sho Yagishita

Kenji F Tanaka

Norio Takata

Abstract

The progressive ratio (PR) lever-press task serves as a benchmark for assessing goal-oriented motivation. However, a well-recognized limitation of the PR task is that only a single data point, known as the breakpoint, is obtained from an entire session as a barometer of motivation. Because the breakpoint is defined as the final ratio of responses achieved in a PR session, variations in choice behavior during the PR task cannot be captured. We addressed this limitation by constructing four reinforcement learning models: a simple Q-learning model, an asymmetric model with two learning rates, a perseverance model with choice traces, and a perseverance model without learning. These models incorporated three behavioral choices: reinforced and non-reinforced lever presses and void magazine nosepokes, because we noticed that male mice performed frequent magazine nosepokes during PR tasks. The best model was the perseverance model, which predicted a gradual reduction in amplitudes of reward prediction errors (RPEs) upon void magazine nosepokes. We confirmed the prediction experimentally with fiber photometry of extracellular dopamine (DA) dynamics in the ventral striatum of male mice using a fluorescent protein (genetically encoded GPCR activation-based DA sensor: GRAB). We verified application of the model by acute intraperitoneal injection of low-dose methamphetamine (METH) before a PR task, which increased the frequency of magazine nosepokes during the PR session without changing the breakpoint. The perseverance model captured behavioral modulation as a result of increased initial action values, which are customarily set to zero and disregarded in reinforcement learning analysis. Our findings suggest that the perseverance model reveals the effects of psychoactive drugs on choice behaviors during PR tasks.
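The abstract gives no equations, so the following is only a minimal sketch of how a perseverance model with choice traces is commonly formulated in the reinforcement learning literature. It is not the authors' fitted model. The update rules, the parameter names (`alpha`, `tau`, `beta`, `phi`), the action labels, and all numeric values are assumptions chosen for illustration. The sketch shows two ideas from the abstract: a choice trace that biases the agent toward recently chosen actions, and a non-zero initial action value that produces negative RPEs of gradually shrinking amplitude across repeated void magazine nosepokes.

```python
import numpy as np

# Hypothetical action indices for the three behavioral choices named in the abstract.
LEVER_REINFORCED, LEVER_NONREINFORCED, NOSEPOKE_VOID = 0, 1, 2


def softmax_policy(q, c, beta, phi):
    """Choice probabilities from action values q and choice traces c.

    beta scales the influence of values; phi scales the influence of
    choice traces (perseverance). Both names are assumptions.
    """
    logits = beta * q + phi * c
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def update(q, c, action, reward, alpha, tau):
    """Apply one trial of learning; returns new (q, c) and the RPE."""
    q = q.copy()
    c = c.copy()
    rpe = reward - q[action]      # reward prediction error for the chosen action
    q[action] += alpha * rpe      # value update with learning rate alpha
    chosen = np.zeros_like(c)
    chosen[action] = 1.0
    c += tau * (chosen - c)       # traces move toward 1 if chosen, decay toward 0 otherwise
    return q, c, rpe


if __name__ == "__main__":
    # Assumed non-zero initial value for the nosepoke action (illustrative only).
    q = np.array([0.0, 0.0, 0.5])
    c = np.zeros(3)
    for trial in range(5):
        q, c, rpe = update(q, c, NOSEPOKE_VOID, reward=0.0, alpha=0.2, tau=0.5)
        print(f"trial {trial}: RPE = {rpe:.3f}, nosepoke trace = {c[NOSEPOKE_VOID]:.3f}")
    print("choice probabilities:", softmax_policy(q, c, beta=2.0, phi=1.0))
```

Under these assumptions, each unrewarded nosepoke yields a negative RPE whose magnitude shrinks by a factor of `1 - alpha` per repetition, while the nosepoke choice trace grows toward 1. Setting the initial value to zero would make every RPE on void nosepokes exactly zero, which is why the initial action value matters in this kind of analysis.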

Citing Articles

Distinct sex differences in ethanol consumption and operant self-administration in C57BL/6J mice with uniform regulation by glutamate AMPAR activity.

Faccidomo S, Eastman V, Santanam T, Swaim K, Taylor S, Hodge C. Front Behav Neurosci. 2025; 18:1498201.

PMID: 39911242 PMC: 11794300. DOI: 10.3389/fnbeh.2024.1498201.

Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts.

Colas J, O'Doherty J, Grafton S. PLoS Comput Biol. 2024; 20(3):e1011950.

PMID: 38552190 PMC: 10980507. DOI: 10.1371/journal.pcbi.1011950.
