Meta-control of the Exploration-exploitation Dilemma Emerges from Probabilistic Inference over a Hierarchy of Time Scales

Overview

Journal: Cogn Affect Behav Neurosci

Publisher: Springer

Specialties: Neurology, Social Sciences

Date: 2020 Dec 29

PMID: 33372237

Citations: 5

Authors

Dimitrije Markovic

Thomas Goschke

Stefan J Kiebel


Abstract

Cognitive control is typically understood as a set of mechanisms that enable humans to reach goals that require integrating the consequences of actions over longer time scales. Relying on routine behaviour, or making choices that are beneficial only at short time scales, would prevent one from attaining these goals. During the past two decades, researchers have proposed various computational cognitive models that successfully account for behaviour related to cognitive control in a wide range of laboratory tasks. Because humans operate in a dynamic and uncertain environment, making elaborate plans and integrating experience over multiple time scales is computationally expensive. Yet it remains poorly understood how uncertain consequences at different time scales are integrated into adaptive decisions. Here, we pursue the idea that cognitive control can be cast as active inference over a hierarchy of time scales, where inference, i.e., planning, at higher levels of the hierarchy controls inference at lower levels. We introduce the novel concept of meta-control states, which link higher-level beliefs with lower-level policy inference. Specifically, we conceptualize cognitive control as inference over these meta-control states, where solutions to cognitive control dilemmas emerge through surprisal minimisation at different hierarchy levels. We illustrate this concept using the exploration-exploitation dilemma based on a variant of a restless multi-armed bandit task. We demonstrate that beliefs about contexts and meta-control states at a higher level dynamically modulate the balance of exploration and exploitation at the lower level of a single action. Finally, we discuss the generalisation of this meta-control concept to other control dilemmas.
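The idea that a higher-level belief modulates the balance of exploration and exploitation at the level of a single action can be illustrated with a toy simulation. The sketch below is not the authors' model: it replaces active inference with a simple Beta-Bernoulli learner, and every name and parameter in it (`TwoLevelAgent`, `p_explore`, `decay`, `meta_rate`, the drift step, the softmax precision) is a hypothetical choice made for illustration. The only parts taken from the abstract are the restless bandit setting and the two-level structure, in which a slowly changing higher-level quantity scales how strongly uncertainty drives choice at the lower level.

```python
import math
import random


def softmax(values, precision=1.0):
    """Turn scores into choice probabilities (numerically stable)."""
    m = max(values)
    exps = [math.exp(precision * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class TwoLevelAgent:
    """Hypothetical two-level agent, assumed for illustration only."""

    def __init__(self, n_arms, decay=0.95, meta_rate=0.2):
        self.alpha = [1.0] * n_arms  # Beta pseudo-counts for reward
        self.beta = [1.0] * n_arms   # Beta pseudo-counts for no reward
        self.decay = decay           # forgetting, since arms are restless
        self.meta_rate = meta_rate   # update speed of the higher level
        self.p_explore = 0.5         # belief in an "explore" meta-control state

    def arm_mean(self, k):
        return self.alpha[k] / (self.alpha[k] + self.beta[k])

    def arm_uncertainty(self, k):
        a, b = self.alpha[k], self.beta[k]
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

    def policy(self):
        # The higher-level belief scales the uncertainty bonus.
        scores = [
            self.arm_mean(k) + self.p_explore * self.arm_uncertainty(k)
            for k in range(len(self.alpha))
        ]
        return softmax(scores, precision=5.0)

    def update(self, arm, reward):
        # Surprisal of the outcome under the current lower-level belief.
        p = self.arm_mean(arm)
        surprisal = -math.log(p if reward == 1 else 1.0 - p)
        # Assumed heuristic: outcomes more surprising than a coin flip
        # count as evidence for the "explore" meta-control state.
        target = 1.0 if surprisal > math.log(2) else 0.0
        self.p_explore += self.meta_rate * (target - self.p_explore)
        # Lower level: decay all counts toward the prior, then add the outcome.
        for k in range(len(self.alpha)):
            self.alpha[k] = 1.0 + self.decay * (self.alpha[k] - 1.0)
            self.beta[k] = 1.0 + self.decay * (self.beta[k] - 1.0)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def drift(probs, rng, step=0.05):
    """Restless bandit: reward probabilities follow a bounded random walk."""
    return [min(0.95, max(0.05, p + rng.uniform(-step, step))) for p in probs]


def run(n_trials=200, seed=0):
    rng = random.Random(seed)
    probs = [0.2, 0.5, 0.8]
    agent = TwoLevelAgent(len(probs))
    history = []
    for _ in range(n_trials):
        pi = agent.policy()
        arm = rng.choices(range(len(probs)), weights=pi)[0]
        reward = 1 if rng.random() < probs[arm] else 0
        agent.update(arm, reward)
        history.append((arm, reward, agent.p_explore))
        probs = drift(probs, rng)
    return history
```

Calling `run()` returns one `(arm, reward, p_explore)` tuple per trial, so the trajectory of `p_explore` can be inspected to see how the higher-level quantity rises after surprising outcomes and falls when outcomes are predictable. A faithful implementation of the paper's approach would instead infer the meta-control state by probabilistic inference rather than by the threshold rule assumed here.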

Citing Articles

Post-injury pain and behaviour: a control theory perspective.

Seymour B, Crook R, Chen Z. Nat Rev Neurosci. 2023; 24(6):378-392.

PMID: 37165018 PMC: 10465160. DOI: 10.1038/s41583-023-00699-5.

Cognitive effort and active inference.

Parr T, Holmes E, Friston K, Pezzulo G. Neuropsychologia. 2023; 184:108562.

PMID: 37080424 PMC: 10636588. DOI: 10.1016/j.neuropsychologia.2023.108562.

The Willpower Paradox: Possible and Impossible Conceptions of Self-Control.

Goschke T, Job V. Perspect Psychol Sci. 2023; 18(6):1339-1367.

PMID: 36791675 PMC: 10623621. DOI: 10.1177/17456916221146158.

The exploration-exploitation trade-off in a foraging task is affected by mood-related arousal and valence.

van Dooren R, de Kleijn R, Hommel B, Sjoerds Z. Cogn Affect Behav Neurosci. 2021; 21(3):549-560.

PMID: 34086199 PMC: 8208924. DOI: 10.3758/s13415-021-00917-6.

Neural Dynamics under Active Inference: Plausibility and Efficiency of Information Processing.

Da Costa L, Parr T, Sengupta B, Friston K. Entropy (Basel). 2021; 23(4).

PMID: 33921298 PMC: 8069154. DOI: 10.3390/e23040454.
