Reinforcement Learning: Computational Theory and Biological Mechanisms

Overview
Journal HFSP J
Specialty Biology
Date 2009 May 1
PMID 19404458
Citations 48
Abstract

Reinforcement learning is a computational framework in which an active agent learns behaviors on the basis of a scalar reward signal. The agent can be an animal, a human, or an artificial system such as a robot or a computer program. The reward can be food, water, money, or any other measure of the agent's performance. The theory of reinforcement learning, which was developed in the artificial intelligence community with intuitions from animal learning theory, now gives a coherent account of the function of the basal ganglia. It also serves as a "common language" in which biologists, engineers, and social scientists can exchange their problems and findings. This article reviews the basic theoretical framework of reinforcement learning and discusses its recent and future contributions toward the understanding of animal behaviors and human decision making.
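
As a rough illustration of learning from a scalar reward signal, the following minimal sketch simulates an agent choosing between two actions and updating its value estimates from a reward prediction error. It is not taken from the article: the function names (`update_value`, `run_bandit`), the two-action task, the reward probabilities, and the parameters `alpha` (learning rate) and `epsilon` (exploration rate) are all assumptions made for this example.

```python
import random


def update_value(value, reward, alpha):
    """Move a value estimate toward the observed reward.

    The difference (reward - value) is the reward prediction error.
    """
    return value + alpha * (reward - value)


def run_bandit(reward_probs, steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit task.

    reward_probs[a] is the assumed probability that action a yields reward 1.
    Returns the learned value estimate for each action.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = rng.randrange(len(values))
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(len(values)), key=lambda a: values[a])
        # Scalar reward: 1 with the action's probability, otherwise 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        values[action] = update_value(values[action], reward, alpha)
    return values


if __name__ == "__main__":
    # Illustrative reward probabilities, chosen only for this sketch.
    print(run_bandit([0.2, 0.8]))
```

With these assumed settings, the estimate for the more frequently rewarded action should end up higher than the other, so the agent comes to prefer it without ever being told which action is correct.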

Citing Articles

Distributed representations of temporally accumulated reward prediction errors in the mouse cortex.

Makino H, Suhaimi A Sci Adv. 2025; 11(4):eadi4782.

PMID: 39841828 PMC: 11753378. DOI: 10.1126/sciadv.adi4782.


Motor synergy and energy efficiency emerge in whole-body locomotion learning.

Li G, Hayashibe M Sci Rep. 2025; 15(1):712.

PMID: 39753645 PMC: 11698959. DOI: 10.1038/s41598-024-82472-x.


Dopamine transients encode reward prediction errors independent of learning rates.

Mah A, Golden C, Constantinople C Cell Rep. 2024; 43(10):114840.

PMID: 39395170 PMC: 11571066. DOI: 10.1016/j.celrep.2024.114840.


Dopamine transients encode reward prediction errors independent of learning rates.

Mah A, Golden C, Constantinople C bioRxiv. 2024.

PMID: 38659861 PMC: 11042285. DOI: 10.1101/2024.04.18.590090.


An opponent striatal circuit for distributional reinforcement learning.

Lowet A, Zheng Q, Meng M, Matias S, Drugowitsch J, Uchida N bioRxiv. 2024.

PMID: 38260354 PMC: 10802299. DOI: 10.1101/2024.01.02.573966.

