Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora

Overview

Journal: Cogn Sci

Specialty: Psychology

Date: 2022 Feb 11

PMID: 35146779

Authors

Marius Catalin Iordan

Tyler Giallanza

Cameron T Ellis

Nicole M Beckage

Jonathan D Cohen


Abstract

Applying machine learning algorithms to automatically infer relationships between concepts from large-scale collections of documents presents a unique opportunity to investigate at scale how human semantic knowledge is organized, how people use it to make fundamental judgments ("How similar are cats and bears?"), and how these judgments depend on the features that describe concepts (e.g., size, furriness). However, efforts to date have exhibited a substantial discrepancy between algorithm predictions and human empirical judgments. Here, we introduce a novel approach to generating embeddings for this purpose motivated by the idea that semantic context plays a critical role in human judgment. We leverage this idea by constraining the topic or domain from which documents used for generating embeddings are drawn (e.g., referring to the natural world vs. transportation apparatus). Specifically, we trained state-of-the-art machine learning algorithms using contextually-constrained text corpora (domain-specific subsets of Wikipedia articles, 50+ million words each) and showed that this procedure greatly improved predictions of empirical similarity judgments and feature ratings of contextually relevant concepts. Furthermore, we describe a novel, computationally tractable method for improving predictions of contextually-unconstrained embedding models based on dimensionality reduction of their internal representation to a small number of contextually relevant semantic features. By improving the correspondence between predictions derived automatically by machine learning methods using vast amounts of data and more limited, but direct empirical measurements of human judgments, our approach may help leverage the availability of online corpora to better understand the structure of human semantic representations and how people make judgments based on those.
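The evaluation described in the abstract, comparing embedding-derived similarity with human similarity judgments, can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors, the human ratings, and the helper names `cosine_similarity` and `pearson` are all hypothetical, chosen only to show the shape of the comparison. In practice the vectors would come from an embedding model trained on a domain-specific corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy embeddings (made-up values, not from any trained model).
embeddings = {
    "cat": [0.9, 0.1, 0.3],
    "bear": [0.7, 0.2, 0.6],
    "car": [0.1, 0.9, 0.2],
}

# Hypothetical human similarity ratings on a 0-1 scale (made-up values).
human_ratings = {
    ("cat", "bear"): 0.70,
    ("cat", "car"): 0.10,
    ("bear", "car"): 0.15,
}

# Score each word pair with the model, then correlate with the ratings.
pairs = sorted(human_ratings)
model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
ratings = [human_ratings[p] for p in pairs]
agreement = pearson(model_scores, ratings)

print(f"model-human correlation: {agreement:.3f}")
```

Under this framing, a contextually constrained model would be judged better than an unconstrained one if its `agreement` value with the same human ratings were higher; a rank correlation could be substituted for Pearson without changing the structure of the comparison.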

Citing Articles

THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior.

Hebart M, Contier O, Teichmann L, Rockter A, Zheng C, Kidder A. eLife. 2023; 12.

PMID: 36847339 PMC: 10038662. DOI: 10.7554/eLife.82580.

Beyond the Benchmarks: Toward Human-Like Lexical Representations.

Stevenson S, Merlo P. Front Artif Intell. 2022; 5:796741.

PMID: 35685444 PMC: 9170951. DOI: 10.3389/frai.2022.796741.

Semantic projection recovers rich human knowledge of multiple object features from word embeddings.

Grand G, Blank I, Pereira F, Fedorenko E. Nat Hum Behav. 2022; 6(7):975-987.

PMID: 35422527 PMC: 10349641. DOI: 10.1038/s41562-022-01316-8.

Behavioral correlates of cortical semantic representations modeled by word vectors.

Nishida S, Blanc A, Maeda N, Kado M, Nishimoto S. PLoS Comput Biol. 2021; 17(6):e1009138.

PMID: 34161315 PMC: 8260002. DOI: 10.1371/journal.pcbi.1009138.

Revealing the multidimensional mental representations of natural objects underlying human similarity judgements.

Hebart M, Zheng C, Pereira F, Baker C. Nat Hum Behav. 2020; 4(11):1173-1185.

PMID: 33046861 PMC: 7666026. DOI: 10.1038/s41562-020-00951-3.
