When Does Reward Maximization Lead to Matching Law?

Overview
Journal: PLoS One
Date: 2008 Nov 26
PMID: 19030101
Citations: 11
Abstract

Which strategy subjects follow in various behavioral circumstances has been a central issue in decision making. In particular, whether maximizing or matching is the more fundamental behavioral strategy in animal decision behavior has been a matter of debate. Here, we prove that any algorithm that achieves the stationary condition for maximizing the average reward leads to matching when it ignores the dependence of the expected outcome on the subject's past choices. We term this strategy of partial reward maximization the "matching strategy". We then apply this strategy to the case where the subject's decision system updates the information used for making a decision. Such information includes the subject's past actions or sensory stimuli, and its internal storage is often called the "state variables". We demonstrate that the matching strategy provides an easy way to maximize reward when combined with exploration of state variables that correctly represent the information crucial for reward maximization. Our results reveal for the first time how a strategy that achieves matching behavior is beneficial for reward maximization, providing a novel insight into the relationship between maximizing and matching.
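The abstract's central claim, that the stationarity condition for the average reward reduces to the matching law once the history dependence of outcomes is ignored, can be illustrated with a short derivation. The sketch below is a generic reconstruction, not the paper's own formalism; the symbols p_a (choice probability), <r|a> (expected reward given choice a), N_a (number of choices of a), and I_a (income from a) are labels introduced here for illustration.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Average reward per choice under a stationary policy p_a; on
% history-dependent schedules the conditional expectations themselves
% depend on the policy, i.e., on the subject's past choices.
\[
  \bar{r}(p) \;=\; \sum_{a} p_a \,\langle r \mid a \rangle(p),
  \qquad \sum_{a} p_a = 1 .
\]

% Full stationarity condition (Lagrange multiplier \lambda for the
% normalization constraint): for every alternative a with p_a > 0,
\[
  \langle r \mid a \rangle
  \;+\; \sum_{b} p_b \,
        \frac{\partial \langle r \mid b \rangle}{\partial p_a}
  \;=\; \lambda .
\]

% The "matching strategy" drops the second term, the dependence of
% expected outcomes on past choices. What remains equalizes the
% return per choice across all chosen alternatives:
\[
  \langle r \mid a \rangle \;=\; \lambda
  \quad \text{for all } a \text{ with } p_a > 0 .
\]

% With N_a choices of alternative a and total income
% I_a = N_a <r|a>, equal returns per choice are exactly
% Herrnstein's matching law:
\[
  \frac{N_a}{\sum_{b} N_b} \;=\; \frac{I_a}{\sum_{b} I_b} .
\]

\end{document}

When the expected outcomes do not depend on choice history, the dropped term vanishes and matching and maximizing coincide; on history-dependent schedules, matching is exactly the partial stationarity condition described in the abstract.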

Citing Articles

Undermatching Is a Consequence of Policy Compression.

Bari B, Gershman S. J Neurosci. 2023; 43(3):447-457.

PMID: 36639891 PMC: 9864556. DOI: 10.1523/JNEUROSCI.1003-22.2022.


Dynamic decision making and value computations in medial frontal cortex.

Bari B, Cohen J. Int Rev Neurobiol. 2021; 158:83-113.

PMID: 33785157 PMC: 8162729. DOI: 10.1016/bs.irn.2020.12.001.


The Relevance of Operant Behavior in Conceptualizing the Psychological Well-Being of Captive Animals.

Rasmussen E, Newland M, Hemmelman E. Perspect Behav Sci. 2020; 43(3):617-654.

PMID: 33029580 PMC: 7490306. DOI: 10.1007/s40614-020-00259-7.


Stable Representations of Decision Variables for Flexible Behavior.

Bari B, Grossman C, Lubin E, Rajagopalan A, Cressy J, Cohen J. Neuron. 2019; 103(5):922-933.e7.

PMID: 31280924 PMC: 7169950. DOI: 10.1016/j.neuron.2019.06.001.


A Free-Operant Reward-Tracking Paradigm to Study Neural Mechanisms and Neurochemical Modulation of Adaptive Behavior in Rats.

Stoilova V, Wette S, Stuttgen M. Int J Mol Sci. 2019; 20(12).

PMID: 31242610 PMC: 6627494. DOI: 10.3390/ijms20123098.

