
Finding Scientific Topics

Overview
Specialty: Science
Date: 2004 Feb 12
PMID: 14872004
Citations: 307
Abstract

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
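The generative process and the inference step described in the abstract can be illustrated with a small sketch. The abstract only says that a Markov chain Monte Carlo algorithm is used; the code below assumes collapsed Gibbs sampling, a common MCMC scheme for this model, and should not be read as the authors' exact implementation. The function names (`generate_corpus`, `gibbs_lda`), the symmetric hyperparameters `alpha` and `beta`, and the toy sizes in the usage note are all illustrative assumptions.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, phi, alpha, rng):
    """Sample documents from the generative model (illustrative helper).

    For each document: draw a distribution over topics, then for each
    word position draw a topic from it and a word from that topic.
    phi is an (n_topics, vocab_size) array of topic-word probabilities.
    """
    n_topics, vocab_size = phi.shape
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(n_topics, alpha))   # topic mixture
        z = rng.choice(n_topics, size=doc_len, p=theta)   # topic per word
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
    return docs

def gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_iters, rng):
    """Collapsed Gibbs sampling for the topic model (assumed MCMC scheme).

    Returns point estimates of the topic-word distributions (phi_hat)
    and the document-topic distributions (theta_hat) from the final
    state of the chain.
    """
    ndk = np.zeros((len(docs), n_topics))    # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))   # word counts per topic
    nk = np.zeros(n_topics)                  # total words per topic
    z = []
    # Random initial topic assignment for every word token.
    for d, words in enumerate(docs):
        zd = rng.integers(n_topics, size=len(words))
        z.append(zd)
        for w, k in zip(words, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                # Remove the current token from the counts.
                k = z[d][i]
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Conditional probability of each topic for this token,
                # given all other assignments (unnormalized).
                p = (nkw[:, w] + beta) / (nk + vocab_size * beta) * (ndk[d] + alpha)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    phi_hat = (nkw + beta) / (nk[:, None] + vocab_size * beta)
    theta_hat = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi_hat, theta_hat
```

A possible usage, with made-up toy values: build `phi` for two topics over a four-word vocabulary, call `generate_corpus(5, 20, phi, 0.5, np.random.default_rng(0))`, and pass the resulting documents to `gibbs_lda` with the same number of topics. In practice the number of topics is unknown; the abstract states that it is chosen by Bayesian model selection, which this sketch does not cover.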

Citing Articles

Depression and Anxiety in Patients with Psoriasis: A Comprehensive Analysis Combining Bibliometrics, Latent Dirichlet Allocation, and HJ-Biplot.

Siteneski A, Montes-Escobar K, de la Hoz-M J, Lapo-Talledo G, Gutierrez Moreno G, Carlin Chavez E. Healthcare (Basel). 2025; 13(5).

PMID: 40077004 PMC: 11899133. DOI: 10.3390/healthcare13050441.


Trends and Challenges in Plant Cryopreservation Research: A Meta-Analysis of Cryoprotective Agent Development and Research Focus.

Kang P, Kim S, Park H, Han S, Kim I, Lee H. Plants (Basel). 2025; 14(3).

PMID: 39943009 PMC: 11821117. DOI: 10.3390/plants14030447.


Users' experiences of park accessibility and attractiveness based on online review analytics.

Mohamed A, Kronenberg J. Sci Rep. 2025; 15(1):4268.

PMID: 39905208 PMC: 11794715. DOI: 10.1038/s41598-025-88500-8.


An overview of the literature on assistance dogs using text mining and topic analysis.

Bassan E, Mair A, De Santis M, Bugianelli M, Loretti E, Capecci A. Front Vet Sci. 2024; 11:1463332.

PMID: 39723180 PMC: 11669006. DOI: 10.3389/fvets.2024.1463332.


LDAPrototype: a model selection algorithm to improve reliability of latent Dirichlet allocation.

Rieger J, Jentsch C, Rahnenfuhrer J. PeerJ Comput Sci. 2024; 10:e2279.

PMID: 39678270 PMC: 11639148. DOI: 10.7717/peerj-cs.2279.

