Minimal Cross-trial Generalization in Learning the Representation of an Odor-guided Choice Task

Overview

Journal PLoS Comput Biol

Specialty Biology

Date 2022 Mar 25

PMID 35333867

Authors

Mingyu Song

Yuji K Takahashi

Amanda C Burton

Matthew R Roesch

Geoffrey Schoenbaum

Yael Niv

Angela J Langdon


Abstract

There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward (forced-choice trials), whereas a third odor indicated free choice between the two responses. A parsimonious task representation would allow animals to learn from the forced-choice trials which option is better in the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals' task representations abide by the generative task structure that governs the experimental design.
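The abstract's central question, whether an animal carries what it learns on forced-choice trials over to free-choice trials, can be illustrated with a small reinforcement-learning sketch. This is not the authors' model. The trial encoding, the delta-rule learner, the softmax choice rule, and the generalization weight `omega` (together with the helpers `free_choice_nll` and `best_omega`) are all hypothetical. They show one way a single parameter could interpolate between an odor-specific representation (`omega = 0`, free-choice values learned only from free-choice trials) and the parsimonious, generalizing one (`omega = 1`). Fitting that parameter to an animal's choices then quantifies how much it generalizes.

```python
import math

# Hypothetical trial format: (odor, choice, reward)
#   odor   in {"forced_left", "forced_right", "free"}
#   choice in {"L", "R"}
#   reward is the outcome received on that trial
ODORS = ("forced_left", "forced_right", "free")


def p_choose_left(q_left: float, q_right: float, beta: float) -> float:
    """Softmax (logistic) probability of choosing left, inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))


def free_choice_nll(trials, alpha: float, beta: float, omega: float) -> float:
    """Negative log-likelihood of free-choice responses under a hypothetical
    model: each odor has its own action values, and forced-trial outcomes are
    also credited to the free-choice odor with weight omega
    (0 = no cross-trial generalization, 1 = full generalization).
    """
    q = {odor: {"L": 0.0, "R": 0.0} for odor in ODORS}
    nll = 0.0
    for odor, choice, reward in trials:
        if odor == "free":
            p_left = p_choose_left(q["free"]["L"], q["free"]["R"], beta)
            p = p_left if choice == "L" else 1.0 - p_left
            nll -= math.log(max(p, 1e-12))  # guard against log(0)
        # Delta-rule update for the odor that was actually presented
        q[odor][choice] += alpha * (reward - q[odor][choice])
        if odor != "free":
            # Cross-trial generalization: the forced-trial outcome also updates
            # the free-choice value of the same action, scaled by omega
            q["free"][choice] += omega * alpha * (reward - q["free"][choice])
    return nll


def best_omega(trials, alpha, beta, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid-search the generalization weight that best explains the choices."""
    return min(grid, key=lambda w: free_choice_nll(trials, alpha, beta, w))


# Hypothetical session: the left port pays 1.0 and the right port pays 0.5;
# 20 forced trials are followed by a single free choice of the left port.
session = [("forced_left", "L", 1.0), ("forced_right", "R", 0.5)] * 10
session.append(("free", "L", 1.0))

nll_none = free_choice_nll(session, alpha=0.3, beta=5.0, omega=0.0)  # log(2)
nll_full = free_choice_nll(session, alpha=0.3, beta=5.0, omega=1.0)
print(nll_none, nll_full, best_omega(session, alpha=0.3, beta=5.0))
```

Without generalization, the free-choice values are still at zero when the free trial arrives, so the model predicts chance (negative log-likelihood of log 2). With full generalization, the forced trials have already taught it that the left port is better, so the same choice is better explained. In this sketch only free-choice responses enter the likelihood, and `alpha` and `beta` are fixed by hand. A fuller treatment would also model forced-trial responses and fit all parameters jointly before comparing models.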

Citing Articles

Prior cocaine use diminishes encoding of latent information by orbitofrontal, but not medial, prefrontal ensembles.

Mueller L, Konya C, Sharpe M, Wikenheiser A, Schoenbaum G Curr Biol. 2024; 34(22):5223-5238.e3.

PMID: 39454572 PMC: 11576232. DOI: 10.1016/j.cub.2024.09.064.

Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system.

Stetsenko A, Koos T Proc Natl Acad Sci U S A. 2023; 120(45):e2309015120.

PMID: 37903252 PMC: 10636325. DOI: 10.1073/pnas.2309015120.
