Two Sides of the Same Coin: Beneficial and Detrimental Consequences of Range Adaptation in Human Reinforcement Learning

Overview

Journal Sci Adv

Specialties Biology, Science

Date 2021 Apr 3

PMID 33811071

Citations 13

Authors

Sophie Bavard

Aldo Rustichini

Stefano Palminteri


Abstract

Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly in reinforcement learning (RL) situations in which options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when task difficulty is decreased. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.
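The core mechanism described in the abstract, rescaling option values by the range of their learning context, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' range-adapting model: the outcomes, the context labels, the complete-feedback schedule, the zero-anchored running range, and the helper names (`range_normalize`, `learn`, `transfer_choice`) are all hypothetical. It shows how an option that was best in a narrow-range context can be preferred, once options are extrapolated to a new pairing, over an option with a higher absolute value that was worst in a wide-range context.

```python
from collections import defaultdict


def range_normalize(r, lo, hi):
    """Rescale outcome r to [0, 1] relative to the context range [lo, hi]."""
    return (r - lo) / (hi - lo) if hi > lo else 0.0


def learn(contexts, n_trials=200, alpha=0.3):
    """Learn absolute and range-adapted option values with a delta rule.

    contexts maps a context label to {option: deterministic outcome}.
    Hypothetical simplifications: complete feedback (every option's outcome
    is observed on every trial) and a per-context range that starts at
    [0, 0] and is tracked exactly as the running min/max of outcomes.
    """
    q_abs = defaultdict(float)  # values learned on the absolute outcome scale
    q_rel = defaultdict(float)  # values learned on the range-normalized scale
    for options in contexts.values():
        lo = hi = 0.0
        for _ in range(n_trials):
            for opt, r in options.items():
                lo, hi = min(lo, r), max(hi, r)
                q_abs[opt] += alpha * (r - q_abs[opt])
                q_rel[opt] += alpha * (range_normalize(r, lo, hi) - q_rel[opt])
    return dict(q_abs), dict(q_rel)


def transfer_choice(q, a, b):
    """Greedy choice between two options never paired during learning."""
    return a if q[a] > q[b] else b


contexts = {
    "narrow": {"A1": 1.0, "A2": 0.0},
    "wide": {"B1": 10.0, "B2": 2.0},
}
q_abs, q_rel = learn(contexts)
print(transfer_choice(q_abs, "A1", "B2"))  # B2: higher absolute value
print(transfer_choice(q_rel, "A1", "B2"))  # A1: best-in-context wins
```

The sketch uses full adaptation for clarity. The abstract's point is that the degree of adaptation varies (for example, with task difficulty), which this toy example does not model.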

Citing Articles

Comparing experience- and description-based economic preferences across 11 countries.

Anllo H, Bavard S, Benmarrakchi F, Bonagura D, Cerrotti F, Cicue M Nat Hum Behav. 2024; 8(8):1554-1567.

PMID: 38877287 DOI: 10.1038/s41562-024-01894-9.

Foraging in a non-foraging task: Fitness maximization explains human risk preference dynamics under changing environment.

Mochizuki Y, Harasawa N, Aggarwal M, Chen C, Fukuda H PLoS Comput Biol. 2024; 20(5):e1012080.

PMID: 38739672 PMC: 11115364. DOI: 10.1371/journal.pcbi.1012080.

Recent Opioid Use Impedes Range Adaptation in Reinforcement Learning in Human Addiction.

Gueguen M, Anllo H, Bonagura D, Kong J, Hafezi S, Palminteri S Biol Psychiatry. 2023; 95(10):974-984.

PMID: 38101503 PMC: 11065633. DOI: 10.1016/j.biopsych.2023.12.005.

Intrinsic rewards explain context-sensitive valuation in reinforcement learning.

Molinaro G, Collins A PLoS Biol. 2023; 21(7):e3002201.

PMID: 37459394 PMC: 10374061. DOI: 10.1371/journal.pbio.3002201.

The functional form of value normalization in human reinforcement learning.

Bavard S, Palminteri S Elife. 2023; 12.

PMID: 37428155 PMC: 10393293. DOI: 10.7554/eLife.83891.
