Heterarchical Reinforcement-learning Model for Integration of Multiple Cortico-striatal Loops: fMRI Examination in Stimulus-action-reward Association Learning

Overview

Journal: Neural Netw

Specialties: Biology, Neurology

Date: 2006 Sep 22

PMID: 16987637

Citations: 54

Authors

Masahiko Haruno

Mitsuo Kawato

Abstract

The brain's most difficult computation in learning to make decisions is searching vast multimodal inputs for the essential information related to rewards and then integrating it into beneficial behaviors. Contextual cues consisting of limbic, cognitive, visual, auditory, somatosensory, and motor signals need to be associated with both rewards and actions through internal representations such as reward prediction and reward prediction error. Previous studies have suggested that a suitable brain structure for such integration is the neural circuitry associated with multiple cortico-striatal loops. However, how the information in and around these multiple closed loops can be shared and transferred has yet to be explored computationally. Here, we propose a "heterarchical reinforcement learning" model, in which reward prediction made by the more limbic and cognitive loops is propagated to motor loops by spiral projections between the striatum and substantia nigra, assisted by cortical projections to the pedunculopontine tegmental nucleus, which sends excitatory input to the substantia nigra. The model makes several fMRI-testable predictions of brain activity during stimulus-action-reward association learning. Activity in the caudate nucleus and the cognitive cortical areas is predicted to correlate with reward prediction error, while activity in the putamen and motor-related areas is predicted to correlate with stimulus-action-dependent reward prediction. Furthermore, a heterogeneous activity pattern within the striatum is predicted depending on learning difficulty: the anterior medial caudate nucleus will be correlated more with reward prediction error when learning becomes difficult, while the posterior putamen will be correlated more with stimulus-action-dependent reward prediction in easy learning. Our fMRI results revealed that different cortico-striatal loops are operating, as suggested by the proposed model.
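
The two quantities named in the abstract, stimulus-action-dependent reward prediction and reward prediction error, can be illustrated with a minimal sketch. This is a generic single-table delta-rule learner, not the authors' heterarchical model: it has no multiple loops and no propagation between them. The function name `simulate`, the parameters `alpha` and `epsilon`, the epsilon-greedy choice rule, and the binary reward are all assumptions made for illustration.

```python
import random


def simulate(reward_prob, n_trials=200, alpha=0.1, epsilon=0.1, seed=0):
    """Generic delta-rule learner (illustrative only, not the paper's model).

    reward_prob maps every (stimulus, action) pair to a reward probability.
    Returns the learned predictions and a per-trial history.
    """
    rng = random.Random(seed)
    stimuli = sorted({s for s, _ in reward_prob})
    actions = sorted({a for _, a in reward_prob})
    # q[(s, a)] is the stimulus-action-dependent reward prediction.
    q = {(s, a): 0.0 for s in stimuli for a in actions}
    history = []
    for _ in range(n_trials):
        s = rng.choice(stimuli)
        # Assumed choice rule: explore with probability epsilon, else be greedy.
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda act: q[(s, act)])
        # Assumed binary reward drawn with the pair's probability.
        r = 1.0 if rng.random() < reward_prob[(s, a)] else 0.0
        prediction = q[(s, a)]
        # Reward prediction error: obtained reward minus predicted reward.
        rpe = r - prediction
        q[(s, a)] += alpha * rpe
        history.append((s, a, r, prediction, rpe))
    return q, history


if __name__ == "__main__":
    # Hypothetical two-stimulus, two-action task.
    task = {
        ("A", "left"): 0.9, ("A", "right"): 0.1,
        ("B", "left"): 0.1, ("B", "right"): 0.9,
    }
    q, history = simulate(task)
    for pair in sorted(q):
        print(pair, round(q[pair], 3))
```

In an fMRI analysis of this kind, per-trial series such as `prediction` and `rpe` would typically serve as regressors against measured activity; the sketch stops at producing those series.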
