Simple and Effective Embedding Model for Single-cell Biology Built from ChatGPT

Overview
Journal: Nat Biomed Eng
Publisher: Springer Nature
Date: 2024 Dec 6
PMID: 39643729
Abstract

Large-scale gene-expression data are being leveraged to pretrain models that implicitly learn gene and cellular functions. However, such models require extensive data curation and training. Here we explore a much simpler alternative: leveraging ChatGPT embeddings of genes based on the literature. We used GPT-3.5 to generate gene embeddings from text descriptions of individual genes and then to generate single-cell embeddings by averaging the gene embeddings weighted by each gene's expression level. We also created a sentence embedding for each cell by using only the gene names ordered by their expression level. On many downstream tasks used to evaluate pretrained single-cell embedding models (particularly gene-property and cell-type classification tasks), our model, which we named GenePT, achieved performance comparable to or better than that of models pretrained on the gene-expression profiles of millions of cells. GenePT shows that large-language-model embeddings of the literature provide a simple and effective path to encoding single-cell biological knowledge.
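The two cell-level constructions described in the abstract can be sketched as follows. This is a minimal illustration, not GenePT's released code: the function names are hypothetical, the gene vectors are toy two-dimensional placeholders standing in for text-embedding vectors of gene descriptions, skipping genes with zero expression or no embedding is an assumption, and the step that would send the ordered gene-name sentence to an embedding model is omitted.

```python
from typing import Dict, List, Sequence


def cell_embedding_weighted(
    expression: Dict[str, float],
    gene_embeddings: Dict[str, Sequence[float]],
) -> List[float]:
    """Average the gene embeddings of one cell, weighted by expression level.

    Returns sum_g(w_g * e_g) / sum_g(w_g) over genes that have an embedding
    and positive expression; returns a zero vector if no gene qualifies.
    """
    if not gene_embeddings:
        raise ValueError("gene_embeddings must not be empty")
    dim = len(next(iter(gene_embeddings.values())))
    acc = [0.0] * dim
    total_weight = 0.0
    for gene, level in expression.items():
        vec = gene_embeddings.get(gene)
        # Assumption: skip unexpressed genes and genes lacking a text embedding.
        if vec is None or level <= 0:
            continue
        total_weight += level
        for i, x in enumerate(vec):
            acc[i] += level * x
    if total_weight == 0.0:
        return acc
    return [x / total_weight for x in acc]


def cell_sentence(expression: Dict[str, float]) -> str:
    """Join expressed gene names in order of decreasing expression.

    Ties are broken alphabetically so the output is deterministic (an assumption).
    """
    expressed = [(gene, level) for gene, level in expression.items() if level > 0]
    expressed.sort(key=lambda pair: (-pair[1], pair[0]))
    return " ".join(gene for gene, _ in expressed)


if __name__ == "__main__":
    # Hypothetical 2-D vectors standing in for text-embedding vectors of gene descriptions.
    genes = {"CD3E": [1.0, 0.0], "MS4A1": [0.0, 1.0], "ACTB": [0.5, 0.5]}
    cell = {"CD3E": 3.0, "MS4A1": 1.0, "ACTB": 0.0}
    print(cell_embedding_weighted(cell, genes))  # [0.75, 0.25]
    print(cell_sentence(cell))  # CD3E MS4A1
```

The weighted variant needs no further model call once gene embeddings exist, whereas the sentence variant would pass the returned string to the same kind of text-embedding model used for the gene descriptions.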
