
A Large Language Model Framework for Literature-based Disease-gene Association Prediction

Overview
Journal Brief Bioinform
Specialty Biology
Date 2025 Feb 25
PMID 39998433
Abstract

With the exponential growth of biomedical literature, leveraging Large Language Models (LLMs) for automated medical knowledge understanding has become increasingly critical for advancing precision medicine. However, current LLM-based approaches face significant challenges in reliability, verifiability, and scalability when extracting complex biological relationships from scientific literature. To overcome these obstacles in biomedical literature understanding, we propose LORE, a novel unsupervised two-stage reading methodology that uses LLMs to model literature as a knowledge graph of verifiable factual statements and, in turn, as semantic embeddings in Euclidean space. Applied to PubMed abstracts for large-scale understanding of disease-gene relationships, LORE captured essential gene pathogenicity information. We demonstrated that modeling a latent pathogenic flow in the semantic embedding, with supervision from the ClinVar database, led to a 90% mean average precision in identifying relevant genes across 2097 diseases. This work provides a scalable and reproducible approach for leveraging LLMs in biomedical literature analysis, offering new opportunities for researchers to identify therapeutic targets efficiently.
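For readers unfamiliar with the headline metric, mean average precision (MAP) over a set of diseases can be sketched as follows. This is a minimal illustration of the standard ranking metric, not the authors' evaluation code: the function names, the toy gene identifiers, and the assumption that each disease comes with a ranked gene list and a set of reference-labelled relevant genes (for example, labels derived from a curated database such as ClinVar) are all hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked_genes: List[str], relevant: Set[str]) -> float:
    """Average precision for one disease.

    Averages precision@k over every rank k at which a relevant gene appears.
    Relevant genes missing from the ranking count as misses because the sum
    is divided by the total number of relevant genes.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, gene in enumerate(ranked_genes, start=1):
        if gene in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], labels: Dict[str, Set[str]]
) -> float:
    """Mean of per-disease average precision, skipping diseases without labels."""
    scores = [
        average_precision(ranked, labels[disease])
        for disease, ranked in rankings.items()
        if labels.get(disease)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: placeholder identifiers, not real associations.
toy_rankings = {
    "disease_1": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "disease_2": ["GENE_E", "GENE_F", "GENE_G"],
}
toy_labels = {
    "disease_1": {"GENE_A", "GENE_C"},
    "disease_2": {"GENE_F"},
}

print(round(mean_average_precision(toy_rankings, toy_labels), 4))
```

In the toy data, disease_1 scores (1/1 + 2/3) / 2 and disease_2 scores 1/2, so the mean is 2/3. A MAP of 90% therefore means that, averaged over diseases, the relevant genes sit very near the top of each ranked list.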
