
Synthesizing Theories of Human Language with Bayesian Program Induction

Overview
Journal: Nat Commun
Specialty: Biology
Date: 2022 Aug 30
PMID: 36042196
Abstract

Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sounds. We integrate Bayesian inference with program synthesis and representations inspired by linguistic theory and cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language's morpho-phonology, sometimes approaching models posited by human linguists. Joint inference across all 70 datasets automatically synthesizes a meta-model encoding interpretable cross-language typological tendencies. Finally, the same algorithm captures few-shot learning dynamics, acquiring new morphophonological rules from just one or a few examples. These results suggest routes to more powerful machine-enabled discovery of interpretable models in linguistics and other scientific domains.
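The core idea in the abstract, scoring candidate rule systems by a Bayesian trade-off between rule simplicity (prior) and fit to observed word forms (likelihood), can be illustrated with a minimal sketch. This is not the authors' system: the candidate rules, the toy plural data, and the noise parameter below are all hypothetical, and a description-length prior stands in for the paper's full program-synthesis machinery.

```python
import math

# Toy data: (stem, plural) pairs resembling English voicing-sensitive plurals.
data = [("dog", "dogz"), ("cat", "cats"), ("bird", "birdz"), ("cup", "cups")]

VOICED = set("bdgvzmnlrw") | set("aeiou")

# Hypothetical candidate rules, each paired with a description length (cost).
def always_s(stem):
    return stem + "s"

def always_z(stem):
    return stem + "z"

def voicing_rule(stem):
    # Append 'z' after voiced final segments, 's' after voiceless ones.
    return stem + ("z" if stem[-1] in VOICED else "s")

candidates = [("always-s", always_s, 1),
              ("always-z", always_z, 1),
              ("voicing",  voicing_rule, 3)]

def log_posterior(rule, cost, pairs, noise=0.05):
    """Log prior (penalizing longer rules, MDL-style) plus log likelihood."""
    log_prior = -cost * math.log(2)
    log_lik = 0.0
    for stem, surface in pairs:
        p = (1 - noise) if rule(stem) == surface else noise
        log_lik += math.log(p)
    return log_prior + log_lik

best = max(candidates, key=lambda c: log_posterior(c[1], c[2], data))
print(best[0])  # → voicing
```

Even though the voicing-sensitive rule has a longer description, its likelihood gain from explaining all four forms outweighs the prior penalty, so it wins; this mirrors, in miniature, how the paper's framework selects interpretable rule systems that balance parsimony against coverage of the data.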

Citing Articles

Symbolic metaprogram search improves learning efficiency and explains rule learning in humans.

Rule J, Piantadosi S, Cropper A, Ellis K, Nye M, Tenenbaum J Nat Commun. 2024; 15(1):6847.

PMID: 39127796 PMC: 11316799. DOI: 10.1038/s41467-024-50966-x.


Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias.

Dentella V, Gunther F, Leivada E Proc Natl Acad Sci U S A. 2023; 120(51):e2309583120.

PMID: 38091290 PMC: 10743380. DOI: 10.1073/pnas.2309583120.


Emotion prediction as computation over a generative theory of mind.

Houlihan S, Kleiman-Weiner M, Hewitt L, Tenenbaum J, Saxe R Philos Trans A Math Phys Eng Sci. 2023; 381(2251):20220047.

PMID: 37271174 PMC: 10239682. DOI: 10.1098/rsta.2022.0047.
