
Theory of Choice in Bandit, Information Sampling and Foraging Tasks

Overview
Specialty: Biology
Date: 2015 Mar 28
PMID: 25815510
Citations: 49
Abstract

Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling and foraging tasks. These tasks go beyond those in which the choice in the current trial does not affect future expected rewards. We modeled these tasks using Markov decision processes (MDPs), which provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents maximize expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions that trade off immediate and future expected rewards. The tasks drive these trade-offs in distinct ways, however. For bandit and information sampling tasks, increasing uncertainty or the time horizon shifts value to actions that pay off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks the time horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.
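
The horizon effect described in the abstract can be illustrated with a small finite-horizon MDP. The sketch below is not the authors' model: it assumes a two-armed Bernoulli bandit in which one arm pays off with a known probability (`P_KNOWN`, an illustrative value) and the other has an unknown payoff probability with a Beta(a, b) belief, and it computes action values by backward induction. All names are hypothetical. Under these assumptions, with one trial left the known arm has the higher value (0.55 vs. 0.50), whereas with two trials left the uncertain arm is slightly better (about 1.108 vs. 1.100), because sampling it yields information that can be exploited on the final trial.

```python
from functools import lru_cache

# Assumed payoff probability of the familiar arm (illustrative value only).
P_KNOWN = 0.55


@lru_cache(maxsize=None)
def state_value(a, b, trials_left):
    """Expected future reward under optimal play, given a Beta(a, b) belief
    about the uncertain arm and the number of trials remaining."""
    if trials_left == 0:
        return 0.0
    return max(action_values(a, b, trials_left))


def action_values(a, b, trials_left):
    """Return (q_known, q_unknown): immediate plus future expected reward
    for choosing the known arm or the uncertain arm on this trial."""
    mean = a / (a + b)  # posterior mean payoff probability of the uncertain arm
    # Choosing the known arm leaves the belief about the other arm unchanged.
    q_known = P_KNOWN + state_value(a, b, trials_left - 1)
    # Choosing the uncertain arm updates the belief with the observed outcome.
    q_unknown = (
        mean * (1.0 + state_value(a + 1, b, trials_left - 1))
        + (1.0 - mean) * state_value(a, b + 1, trials_left - 1)
    )
    return q_known, q_unknown


if __name__ == "__main__":
    # Print the value advantage of the uncertain arm for several horizons,
    # starting from a uniform Beta(1, 1) belief.
    for horizon in (1, 2, 5, 20):
        q_known, q_unknown = action_values(1, 1, horizon)
        print(horizon, round(q_unknown - q_known, 4))
```

The state here is the belief (a, b) rather than a hidden payoff probability, which is what makes the current choice affect the information available for future choices.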

Citing Articles

Biased expectations about future choice options predict sequential economic decisions.

van de Wouw D, McKay R, Furl N Commun Psychol. 2024; 2(1):119.

PMID: 39695326 PMC: 11655870. DOI: 10.1038/s44271-024-00172-8.


A causal role of the right dorsolateral prefrontal cortex in random exploration.

Toghi A, Chizari M, Khosrowabadi R Sci Rep. 2024; 14(1):24796.

PMID: 39433838 PMC: 11493979. DOI: 10.1038/s41598-024-76025-5.


Complex behavior from intrinsic motivation to occupy future action-state path space.

Ramirez-Ruiz J, Grytskyy D, Mastrogiuseppe C, Habib Y, Moreno-Bote R Nat Commun. 2024; 15(1):6368.

PMID: 39075046 PMC: 11286966. DOI: 10.1038/s41467-024-49711-1.


Electrophysiological Markers of Aberrant Cue-Specific Exploration in Hazardous Drinkers.

Campbell E, Singh G, Claus E, Witkiewitz K, Costa V, Hogeveen J Comput Psychiatr. 2024; 7(1):47-59.

PMID: 38774639 PMC: 11104413. DOI: 10.5334/cpsy.96.


Variability and harshness shape flexible strategy-use in support of the constrained flexibility framework.

Pope-Caldwell S, Deffner D, Maurits L, Neumann T, Haun D Sci Rep. 2024; 14(1):7236.

PMID: 38538731 PMC: 10973413. DOI: 10.1038/s41598-024-57800-w.

