Active Inference and the Two-step Task

Overview

Journal: Sci Rep

Specialty: Science

Date: 2022 Oct 22

PMID: 36271279

Authors

Sam Gijsen

Miro Grundei

Felix Blankenburg

Abstract

Sequential decision problems distill important challenges frequently faced by humans. Through repeated interactions with an uncertain world, unknown statistics need to be learned while balancing exploration and exploitation. Reinforcement learning is a prominent method for modeling such behaviour, with a prevalent application being the two-step task. However, recent studies indicate that the standard reinforcement learning model sometimes describes features of human task behaviour inaccurately and incompletely. We investigated whether active inference, a framework proposing a trade-off to the exploration-exploitation dilemma, could better describe human behaviour. Therefore, we re-analysed four publicly available datasets of the two-step task, performed Bayesian model selection, and compared behavioural model predictions. Two datasets, which revealed more model-based inference and behaviour indicative of directed exploration, were better described by active inference, while the models scored similarly for the remaining datasets. Learning using probability distributions appears to contribute to the improved model fits. Further, approximately half of all participants showed sensitivity to information gain as formulated under active inference, although behavioural exploration effects were not fully captured. These results contribute to the empirical validation of active inference as a model of human behaviour and the study of alternative models for the influential two-step task.
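As a rough illustration of what "learning using probability distributions" and "sensitivity to information gain" can mean, the sketch below tracks a Beta distribution over the reward probability of a single option and computes the expected information gain of sampling it once more. This is not the authors' model: the function names, the integer pseudo-counts `a` and `b`, and the use of exact mutual information between the next outcome and the reward probability are all assumptions made for illustration.

```python
import math


def harmonic(n):
    """n-th harmonic number; H_0 = 0. Assumes a non-negative integer n."""
    return sum(1.0 / k for k in range(1, n + 1))


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def update(a, b, rewarded):
    """Beta-Bernoulli update: add one pseudo-count for the observed outcome."""
    return (a + 1, b) if rewarded else (a, b + 1)


def expected_information_gain(a, b):
    """Mutual information between the next outcome and the reward probability
    under a Beta(a, b) belief with positive integer pseudo-counts.

    I = H[mean outcome] - E_p[H[outcome | p]], where the expected entropy
    has a closed form in harmonic numbers for integer a and b.
    """
    n = a + b
    mean = a / n
    expected_entropy = (
        (a / n) * (harmonic(n) - harmonic(a))
        + (b / n) * (harmonic(n) - harmonic(b))
    )
    return binary_entropy(mean) - expected_entropy


if __name__ == "__main__":
    a, b = 1, 1  # flat prior: nothing known about this option yet
    print(expected_information_gain(a, b))
    for rewarded in [True, False, True, True]:
        a, b = update(a, b, rewarded)
    print(a, b, expected_information_gain(a, b))
```

In a choice model of this kind, an agent could add a weighted version of this quantity to the expected reward of each option, so that rarely sampled options become more attractive; the weight would be a free parameter fitted per participant. How the paper's active inference model combines these terms is not specified in the abstract.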

Citing Articles

Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour.

Brands A, Mathar D, Peters J. Comput Psychiatr. 2025; 9(1):39-62.

PMID: 39959565 PMC: 11827566. DOI: 10.5334/cpsy.101.
