The Computational Development of Reinforcement Learning During Adolescence

Overview

Journal PLoS Comput Biol

Specialty Biology

Date 2016 Jun 21

PMID 27322574

Citations 53

Authors

Stefano Palminteri

Emma J Kilford

Giorgio Coricelli

Sarah-Jayne Blakemore

Abstract

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
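The abstract names three computational modules but gives no equations. The following is a minimal sketch, assuming a standard delta-rule value learner for a single two-option context; it is not the authors' model specification. The function name `update`, the parameter names (`alpha_chosen`, `alpha_unchosen`, `alpha_context`) and the choice to track the context value as a running mean of the displayed outcomes are all illustrative assumptions. Setting `alpha_unchosen` and `alpha_context` to zero leaves only the basic reinforcement learning update.

```python
def update(q, v, chosen, r_chosen, r_unchosen=None,
           alpha_chosen=0.3, alpha_unchosen=0.0, alpha_context=0.0):
    """One learning step for a single two-option context (illustrative sketch).

    q          : [value of option 0, value of option 1]
    v          : context value (average outcome expected in this context)
    chosen     : index (0 or 1) of the chosen option
    r_chosen   : outcome of the chosen option
    r_unchosen : outcome of the unchosen option, or None under partial feedback
    Returns the updated (q, v).
    """
    q = list(q)
    unchosen = 1 - chosen

    # Value contextualisation (assumed form): track the mean displayed outcome
    # so that outcomes can be evaluated relative to the context.
    shown = [r_chosen] if r_unchosen is None else [r_chosen, r_unchosen]
    v = v + alpha_context * (sum(shown) / len(shown) - v)

    # Basic reinforcement learning: delta-rule update of the chosen option,
    # using the outcome relative to the context value.
    q[chosen] += alpha_chosen * ((r_chosen - v) - q[chosen])

    # Counterfactual learning: also update the unchosen option, which is only
    # possible when complete feedback is displayed.
    if r_unchosen is not None:
        q[unchosen] += alpha_unchosen * ((r_unchosen - v) - q[unchosen])

    return q, v


# Punishment context, complete feedback: the chosen option avoids a loss (0)
# while the unchosen option would have lost a point (-1).
basic_q, basic_v = update([0.0, 0.0], 0.0, chosen=0, r_chosen=0.0,
                          r_unchosen=-1.0, alpha_chosen=0.5)
full_q, full_v = update([0.0, 0.0], 0.0, chosen=0, r_chosen=0.0,
                        r_unchosen=-1.0, alpha_chosen=0.5,
                        alpha_unchosen=0.5, alpha_context=0.5)
print(basic_q, basic_v)
print(full_q, full_v)
```

In this sketch the basic learner gains nothing from an avoided punishment, because an outcome of 0 produces no prediction error from a starting value of 0. With the contextual and counterfactual terms switched on, the avoided loss acquires a positive relative value and the unchosen option a negative one, which is one way to picture the symmetrical reward and punishment learning the abstract attributes to adults.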

Citing Articles

Electrical brain activations in preadolescents during a probabilistic reward-learning task reflect cognitive processes and behavior strategies.

Chung Y, van den Berg B, Roberts K, Bagdasarov A, Woldorff M, Gaffrey M. Front Hum Neurosci. 2025; 19:1460584.

PMID: 39949988 PMC: 11821623. DOI: 10.3389/fnhum.2025.1460584.

Interpretation of individual differences in computational neuroscience using a latent input approach.

Schaaf J, Miletic S, van Duijvenvoorde A, Huizenga H. Dev Cogn Neurosci. 2025; 72:101512.

PMID: 39854872 PMC: 11804603. DOI: 10.1016/j.dcn.2025.101512.

The preference for surprise in reinforcement learning underlies the differences in developmental changes in risk preference between autistic and neurotypical youth.

Sumiya M, Katahira K, Akechi H, Senju A. Mol Autism. 2025; 16(1):3.

PMID: 39819491 PMC: 11740557. DOI: 10.1186/s13229-025-00637-5.

The connecting brain in context: How adolescent plasticity supports learning and development.

Baker A, Galvan A, Fuligni A. Dev Cogn Neurosci. 2024; 71:101486.

PMID: 39631105 PMC: 11653146. DOI: 10.1016/j.dcn.2024.101486.

Decrease in decision noise from adolescence into adulthood mediates an increase in more sophisticated choice behaviors and performance gain.

Scholz V, Waltmann M, Herzog N, Horstmann A, Deserno L. PLoS Biol. 2024; 22(11):e3002877.

PMID: 39541313 PMC: 11563475. DOI: 10.1371/journal.pbio.3002877.
