Linking Confidence Biases to Reinforcement-learning Processes

Overview

Journal Psychol Rev

Publisher American Psychological Association

Specialty Psychology

Date 2023 May 8

PMID 37155268

Authors

Nahuel Salem-Garcia

Stefano Palminteri

Mael Lebreton


Abstract

We systematically misjudge our own performance in simple economic tasks. First, we generally overestimate our ability to make correct choices, a bias called overconfidence. Second, we are more confident in our choices when we seek gains than when we try to avoid losses, a bias we refer to as the valence-induced confidence bias. Strikingly, these two biases are also present in reinforcement-learning (RL) contexts, even though outcomes are provided trial-by-trial and could, in principle, be used to recalibrate confidence judgments online. How confidence biases emerge and are maintained in reinforcement-learning contexts is thus puzzling and still unaccounted for. To explain this paradox, we propose that confidence biases stem from learning biases, and we test this hypothesis using data from multiple experiments in which we concomitantly assessed instrumental choices and confidence judgments during learning and transfer phases. Our results first show that participants' choices in both tasks are best accounted for by a reinforcement-learning model featuring context-dependent learning and confirmatory updating. We then demonstrate that the complex, biased pattern of confidence judgments elicited during both tasks can be explained by an overweighting of the learned value of the chosen option in the computation of confidence judgments. Finally, we show that, consequently, the individual learning-model parameters responsible for the learning biases (confirmatory updating and outcome context-dependency) are predictive of the individual metacognitive biases. We conclude by suggesting that the metacognitive biases originate from fundamentally biased learning computations. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
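The abstract names the model ingredients (context-dependent learning, confirmatory updating, overweighting of the chosen option's value in confidence) but not their equations. The following is a hypothetical, simplified sketch under our own assumptions, not the authors' fitted model: delta-rule updating with complete feedback, a confirmatory rule that gives a larger learning rate to choice-confirming prediction errors, a single learned context value partially subtracted from outcomes with weight `omega`, and a logistic confidence read-out with an extra weight `w_chosen > 1` on the chosen option. All function and parameter names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def confirmatory_update(q_chosen, q_unchosen, r_chosen, r_unchosen,
                        alpha_conf, alpha_disconf):
    """Hypothetical confirmatory delta-rule update (complete feedback).

    Positive prediction errors on the chosen option and negative prediction
    errors on the unchosen option confirm the choice and use alpha_conf;
    all other prediction errors use alpha_disconf.
    """
    pe_c = r_chosen - q_chosen
    pe_u = r_unchosen - q_unchosen
    a_c = alpha_conf if pe_c > 0 else alpha_disconf
    a_u = alpha_conf if pe_u < 0 else alpha_disconf
    return q_chosen + a_c * pe_c, q_unchosen + a_u * pe_u


def relative_outcomes(r_chosen, r_unchosen, v_context, omega):
    # Assumed partial context-dependency: outcomes are re-centred on the
    # learned context value, scaled by omega (0 = absolute, 1 = fully relative).
    return r_chosen - omega * v_context, r_unchosen - omega * v_context


def update_context(v_context, r_chosen, r_unchosen, alpha_v):
    # The context value tracks the mean outcome available in the context.
    return v_context + alpha_v * ((r_chosen + r_unchosen) / 2 - v_context)


def confidence(q_chosen, q_unchosen, beta, w_chosen):
    # Assumed read-out: w_chosen > 1 overweights the chosen option's value.
    return sigmoid(beta * (w_chosen * q_chosen - q_unchosen))


def simulate(r_a, r_b, n_trials=30, alpha_conf=0.3, alpha_disconf=0.1,
             alpha_v=0.2, omega=0.5, beta=5.0, w_chosen=1.5):
    """Always choose the better option A; return the final confidence."""
    q_a = q_b = v = 0.0
    for _ in range(n_trials):
        rel_a, rel_b = relative_outcomes(r_a, r_b, v, omega)
        q_a, q_b = confirmatory_update(q_a, q_b, rel_a, rel_b,
                                       alpha_conf, alpha_disconf)
        v = update_context(v, r_a, r_b, alpha_v)
    return confidence(q_a, q_b, beta, w_chosen)


# Gain context (outcomes 1 vs 0) and loss context (outcomes 0 vs -1):
# the same one-point difference between options, shifted in valence.
gain_conf = simulate(r_a=1.0, r_b=0.0)
loss_conf = simulate(r_a=0.0, r_b=-1.0)
print(f"gain: {gain_conf:.3f}  loss: {loss_conf:.3f}")
```

In this toy setup the learned value difference between the two options is the same in both contexts, so the higher confidence in the gain context comes from the extra weight on the chosen option's value, which differs between contexts largely because outcomes are only partially re-centred (`omega < 1`).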
