
Model-based Reinforcement Learning Under Concurrent Schedules of Reinforcement in Rodents

Overview
Journal Learn Mem
Specialty Neurology
Date 2009 May 1
PMID 19403794
Citations 20
Abstract

Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial and error, whereas in model-based reinforcement learning algorithms they are updated according to the decision-maker's knowledge or model of the environment. To investigate how animals update value functions, we trained rats under two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas it increased over time since that target was last chosen in the other task. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former, task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through knowledge of their environments.
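The contrast the abstract draws can be made concrete. Below is a minimal illustrative sketch, not the authors' actual model: a simple (model-free) update nudges a value estimate toward each observed reward, while a model-based agent that knows the unchosen target "arms" with some per-trial probability can compute how that target's reward probability grows with consecutive alternative choices. The learning rate and arming probability here are arbitrary assumed values.

```python
ALPHA = 0.1   # learning rate for the simple RL update (illustrative value)
P_ARM = 0.2   # assumed per-trial arming probability of the unchosen target

def model_free_update(value, reward):
    # Simple RL: the value estimate changes only through trial and error,
    # moving a fraction ALPHA toward the reward actually observed.
    return value + ALPHA * (reward - value)

def model_based_value(p_arm, trials_unchosen):
    # Model-based RL: the agent's model of the task says the target arms
    # independently with probability p_arm on each trial it goes unchosen,
    # so its expected reward probability after n such trials is
    # 1 - (1 - p_arm)^n, which rises with consecutive alternative choices.
    return 1.0 - (1.0 - p_arm) ** trials_unchosen

# After 5 consecutive choices of the other target, the modeled reward
# probability of the neglected target has risen from 0.2 to about 0.67.
print(round(model_based_value(P_ARM, 5), 3))  # → 0.672
```

This captures the behavioral signature reported above: only an agent with a model of the time-dependent arming process would show choice probability increasing with the number of consecutive alternative choices.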

Citing Articles

Dual credit assignment processes underlie dopamine signals in a complex spatial environment.

Krausz T, Comrie A, Kahn A, Frank L, Daw N, Berke J. Neuron. 2023; 111(21):3465-3478.e7.

PMID: 37611585 PMC: 10841332. DOI: 10.1016/j.neuron.2023.07.017.


Dual credit assignment processes underlie dopamine signals in a complex spatial environment.

Krausz T, Comrie A, Frank L, Daw N, Berke J. bioRxiv. 2023.

PMID: 36993482 PMC: 10054934. DOI: 10.1101/2023.02.15.528738.


Undermatching Is a Consequence of Policy Compression.

Bari B, Gershman S. J Neurosci. 2023; 43(3):447-457.

PMID: 36639891 PMC: 9864556. DOI: 10.1523/JNEUROSCI.1003-22.2022.


Robust and distributed neural representation of action values.

Shin E, Jang Y, Kim S, Kim H, Cai X, Lee H. eLife. 2021; 10.

PMID: 33876728 PMC: 8104958. DOI: 10.7554/eLife.53045.


Time elapsed between choices in a probabilistic task correlates with repeating the same decision.

Jablonska J, Szumiec L, Zielinski P, Parkitna J. Eur J Neurosci. 2021; 53(8):2639-2654.

PMID: 33559232 PMC: 8248175. DOI: 10.1111/ejn.15144.