An Inductive Bias for Slowly Changing Features in Human Reinforcement Learning

Overview

Journal PLoS Comput Biol

Specialty Biology

Date 2024 Nov 25

PMID 39585903

Authors

Noa L Hedrich

Eric Schulz

Sam Hall-McMaster

Nicolas W Schuck

Abstract

Identifying goal-relevant features in novel environments is a central challenge for efficient behaviour. We asked whether humans address this challenge by relying on prior knowledge about common properties of reward-predicting features. One such property is the rate of change of features, given that behaviourally relevant processes tend to change on a slower timescale than noise. Hence, we asked whether humans are biased to learn more when task-relevant features are slow rather than fast. To test this idea, 295 human participants were asked to learn the rewards of two-dimensional bandits when either a slowly or quickly changing feature of the bandit predicted reward. Across two experiments and one preregistered replication, participants accrued more reward when a bandit's relevant feature changed slowly, and its irrelevant feature quickly, as compared to the opposite. We did not find a difference in the ability to generalise to unseen feature values between conditions. Testing how feature speed could affect learning with a set of four function approximation Kalman filter models revealed that participants had a higher learning rate for the slow feature, and adjusted their learning to both the relevance and the speed of feature changes. The larger the improvement in participants' performance for slow compared to fast bandits, the more strongly they adjusted their learning rates. These results provide evidence that human reinforcement learning favours slower features, suggesting a bias in how humans approach reward learning.
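The link between a Kalman filter and a per-feature learning rate can be made concrete with a minimal sketch. It is not the authors' implementation or any of their four models: it assumes a linear reward model over two features, and the function name `kalman_step`, the prior variances, and the noise level are hypothetical choices for illustration only. The Kalman gain plays the role of the learning rate, so a larger prior uncertainty on the slow feature's weight yields a larger update for that feature.

```python
import numpy as np

def kalman_step(w, P, x, reward, obs_noise=1.0):
    """One Kalman-filter update of linear reward weights, reward ~ w @ x."""
    s = x @ P @ x + obs_noise        # predicted reward variance
    gain = P @ x / s                 # per-feature Kalman gain (effective learning rate)
    w = w + gain * (reward - w @ x)  # move weights along the prediction error
    P = P - np.outer(gain, x) @ P    # shrink uncertainty after the observation
    return w, P, gain

# Hypothetical prior (an assumption, not taken from the paper):
# more initial uncertainty on the slow feature's weight.
w = np.zeros(2)                      # index 0 = slow feature, index 1 = fast feature
P = np.diag([4.0, 1.0])

# Gain on a first trial where both features take the value 1.
_, _, first_gain = kalman_step(w, P, np.array([1.0, 1.0]), 0.0)

# Learn from 200 simulated trials in which only feature 0 predicts reward.
rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=2)
    reward = 1.0 * x[0]
    w, P, _ = kalman_step(w, P, x, reward)
```

In this sketch the first-trial gain is larger for the slow feature purely because of the assumed prior; the paper's models instead estimate such differences from participants' choices.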
