
Theoretical Foundations and Limits of Word Embeddings: What Types of Meaning Can They Capture?

Overview
Publisher Sage Publications
Date 2024 Nov 18
PMID 39554804
Abstract

Measuring meaning is a central problem in cultural sociology, and word embeddings may offer powerful new tools for doing so. But like any tool, they build on and impose theoretical assumptions. In this paper I theorize the ways in which word embeddings model three core premises of a structural linguistic theory of meaning: that meaning is coherent, relational, and may be analyzed as a static system. In certain ways, word embeddings are vulnerable to the enduring critiques of these premises. In other ways, word embeddings offer novel solutions to these critiques. More broadly, formalizing the study of meaning with word embeddings offers theoretical opportunities to clarify core concepts and debates in cultural sociology, such as the coherence of meaning. Just as network analysis specified the once vague notion of social relations, formalizing meaning with embeddings can push us to specify and reimagine meaning itself.

Citing Articles

Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J J Chem Inf Model. 2025; 65(5):2191-2213.

PMID: 39993834 PMC: 11898065. DOI: 10.1021/acs.jcim.4c01907.


Schizophrenia more employable than depression? Language-based artificial intelligence model ratings for employability of psychiatric diagnoses and somatic and healthy controls.

Lange M, Koliousis A, Fayez F, Gogarty E, Twumasi R PLoS One. 2025; 20(1):e0315768.

PMID: 39774560 PMC: 11709238. DOI: 10.1371/journal.pone.0315768.


Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J ArXiv. 2024.

PMID: 39483353 PMC: 11527106.
