
Undermatching Is a Consequence of Policy Compression

Overview
Journal J Neurosci
Specialty Neurology
Date 2023 Jan 14
PMID 36639891
Abstract

The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with an increased policy complexity.

The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of reward received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as much. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
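The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The first function assumes the standard generalized matching law, in which log(choice ratio) = s · log(reward ratio) + bias, so that a fitted sensitivity s below 1 indicates undermatching, s equal to 1 perfect matching, and s above 1 overmatching; the abstract itself does not spell out this form. The second function computes policy complexity as the mutual information I(S; A) between states and actions, as the abstract defines it. All function names and the toy numbers are hypothetical.

```python
import math


def matching_sensitivity(choice_ratios, reward_ratios):
    """Least-squares slope of log(choice ratio) on log(reward ratio).

    Assumes the generalized matching law (an assumption, not stated in
    the abstract): slope < 1 is undermatching, slope == 1 is matching.
    """
    xs = [math.log(r) for r in reward_ratios]
    ys = [math.log(c) for c in choice_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def policy_complexity(state_probs, policy):
    """Mutual information I(S; A) in bits.

    state_probs[s] is P(s); policy[s][a] is P(a | s).
    """
    n_actions = len(policy[0])
    # Marginal action distribution P(a) = sum_s P(s) * P(a | s).
    marginal = [
        sum(p_s * row[a] for p_s, row in zip(state_probs, policy))
        for a in range(n_actions)
    ]
    mi = 0.0
    for p_s, row in zip(state_probs, policy):
        for a, p_a_given_s in enumerate(row):
            if p_a_given_s > 0:
                mi += p_s * p_a_given_s * math.log2(p_a_given_s / marginal[a])
    return mi


# Hypothetical data: choices follow reward ratios raised to the power 0.7,
# i.e. an undermatching agent by construction.
reward_ratios = [0.25, 0.5, 2.0, 4.0]
choice_ratios = [r ** 0.7 for r in reward_ratios]
slope = matching_sensitivity(choice_ratios, reward_ratios)

# A state-independent policy carries no information about the state,
# while a deterministic state-dependent policy over two equiprobable
# states carries one full bit.
compressed = policy_complexity([0.5, 0.5], [[0.7, 0.3], [0.7, 0.3]])
state_dependent = policy_complexity([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the fitted sensitivity recovers the exponent used to generate the choices, and the two policies sit at the extremes of complexity for a two-state, two-action problem. What counts as a "state" in a real task is a modeling choice that this sketch leaves open.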

Citing Articles

Policy Complexity Suppresses Dopamine Responses.

Gershman S, Lak A. J Neurosci. 2025; 45(9).

PMID: 39788740 PMC: 11866995. DOI: 10.1523/JNEUROSCI.1756-24.2024.


Policy complexity suppresses dopamine responses.

Gershman S, Lak A. bioRxiv. 2024.

PMID: 39345642 PMC: 11429712. DOI: 10.1101/2024.09.15.613150.


Surprising sounds influence risky decision making.

Feng G, Rutledge R. Nat Commun. 2024; 15(1):8027.

PMID: 39271674 PMC: 11399252. DOI: 10.1038/s41467-024-51729-4.


Resource-rational psychopathology.

Bari B, Gershman S. Behav Neurosci. 2024; 138(4):221-234.

PMID: 38753400 PMC: 11423359. DOI: 10.1037/bne0000600.


Human decision making balances reward maximization and policy compression.

Lai L, Gershman S. PLoS Comput Biol. 2024; 20(4):e1012057.

PMID: 38669280 PMC: 11078408. DOI: 10.1371/journal.pcbi.1012057.

