Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora

Overview

Journal: Proc Conf Empir Methods Nat Lang Process

Date: 2017 Jun 30

PMID: 28660257

Citations: 15

Authors

William L Hamilton

Kevin Clark

Jure Leskovec

Dan Jurafsky

Abstract

A word's sentiment depends on the domain in which it is used. Computational social science research thus requires sentiment lexicons that are specific to the domains being studied. We combine domain-specific word embeddings with a label propagation framework to induce accurate domain-specific sentiment lexicons using small sets of seed words. We show that our approach achieves state-of-the-art performance on inducing sentiment lexicons from domain-specific corpora and that our purely corpus-based approach outperforms methods that rely on hand-curated resources (e.g., WordNet). Using our framework, we induce and release historical sentiment lexicons for 150 years of English and community-specific sentiment lexicons for 250 online communities from the social media forum Reddit. The historical lexicons we induce show that more than 5% of sentiment-bearing (non-neutral) English words completely switched polarity during the last 150 years, and the community-specific lexicons highlight how sentiment varies drastically between different communities.
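The abstract describes propagating sentiment from a small set of seed words over a graph built from word embeddings. The sketch below illustrates that general idea with a personalized random walk (random walk with restart) over a k-nearest-neighbor similarity graph. It is not the authors' released implementation. The toy two-dimensional vectors, the seed words, the edge weight arccos(-cos), the ratio used as the final score, and the parameters `k` and `beta` are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy embeddings (assumption); real use would load vectors
# trained on the domain corpus being studied.
VECTORS = {
    "good": (1.0, 0.1), "great": (0.9, 0.2), "nice": (0.8, 0.3),
    "bad": (-1.0, 0.1), "awful": (-0.9, 0.2), "nasty": (-0.8, 0.3),
    "okay": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_graph(vectors, k):
    """Symmetric k-nearest-neighbor graph.

    Edge weight arccos(-cos) is an assumed choice: it lies in [0, pi] and
    grows as the two words become more similar.
    """
    graph = {w: {} for w in vectors}
    for w in vectors:
        nearest = sorted(
            ((cosine(vectors[w], vectors[o]), o) for o in vectors if o != w),
            reverse=True,
        )[:k]
        for sim, o in nearest:
            weight = math.acos(-max(-1.0, min(1.0, sim)))  # clamp for float safety
            graph[w][o] = weight
            graph[o][w] = weight
    return graph


def random_walk(graph, seeds, beta=0.9, iters=200):
    """Random walk that restarts at the seed set with probability 1 - beta."""
    restart = {w: (1.0 / len(seeds) if w in seeds else 0.0) for w in graph}
    p = dict(restart)
    for _ in range(iters):
        nxt = {w: (1.0 - beta) * restart[w] for w in graph}
        for w, nbrs in graph.items():
            total = sum(nbrs.values())
            for o, weight in nbrs.items():
                # Move from w to neighbor o in proportion to edge weight.
                nxt[o] += beta * p[w] * weight / total
        p = nxt
    return p


def propagate(vectors, pos_seeds, neg_seeds, k=2, beta=0.9):
    """Polarity in [0, 1]: share of positive-walk mass landing on each word."""
    graph = build_graph(vectors, k)
    pos = random_walk(graph, pos_seeds, beta)
    neg = random_walk(graph, neg_seeds, beta)
    return {
        w: pos[w] / (pos[w] + neg[w]) if pos[w] + neg[w] > 0 else 0.5
        for w in graph
    }


scores = propagate(VECTORS, {"good"}, {"bad"})
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:6s} {score:.3f}")
```

Because each score compares how much of the positive walk versus the negative walk reaches a word, words clustered near positive seeds score close to 1, words near negative seeds score close to 0, and a word linked equally to both sides, like the hypothetical `okay` here, lands near 0.5.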

Citing Articles

The linguistic and emotional effects of weather on UK social media users.

Young J, Arthur R, Williams H. Sci Rep. 2025; 15(1):8009.

PMID: 40055332 PMC: 11889188. DOI: 10.1038/s41598-024-82384-w.

Moral Association Graph: A Cognitive Model for Automated Moral Inference.

Ramezani A, Xu Y. Top Cogn Sci. 2024; 17(1):120-138.

PMID: 39585761 PMC: 11792775. DOI: 10.1111/tops.12774.

CIDER: Context-sensitive polarity measurement for short-form text.

Young J, Arthur R, Williams H. PLoS One. 2024; 19(4):e0299490.

PMID: 38635650 PMC: 11025856. DOI: 10.1371/journal.pone.0299490.

Evaluating criminal justice reform during COVID-19: The need for a novel sentiment analysis package.

Ramjee D, Smith L, Doanvo A, Charpignon M, McNulty-Nebel A, Lett E. PLOS Digit Health. 2023; 1(7):e0000063.

PMID: 36812565 PMC: 9931240. DOI: 10.1371/journal.pdig.0000063.

Text Mining Oral Histories in Historical Archaeology.

Brown M, Shackel P. Int J Hist Archaeol. 2023; :1-17.

PMID: 36686603 PMC: 9838340. DOI: 10.1007/s10761-022-00680-5.
