Building a PubMed Knowledge Graph

Overview
Journal: Sci Data
Specialty: Science
Date: 2020 Jun 28
PMID: 32591513
Citations: 38
Abstract

PubMed is an essential resource for the medical domain, but useful concepts in it are either difficult to extract or ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting the affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. By integrating these credible multi-source data, we created connections among bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method for bio-entity extraction significantly outperformed state-of-the-art models in F1 score (by 0.51%), and that author name disambiguation (AND) achieved an F1 score of 98.09%. PKG can enable broader innovations: it not only allows us to measure scholarly impact, knowledge usage, and knowledge transfer, but also helps us profile authors and organizations based on their connections with bio-entities.
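To make the idea of "profiling authors and organizations through their connections with bio-entities" concrete, here is a minimal sketch on a toy in-memory graph. It is not PKG's actual schema, storage, or query method: the node types, the relation names (`AUTHORED`, `MENTIONS`, `HAS_MEMBER`, `FUNDED`), the ID prefixes, the helpers `profile_author` and `profile_organization`, and all of the data are hypothetical and invented for illustration. It also assumes author name disambiguation has already been done, so each author ID stands for exactly one person.

```python
from collections import Counter, defaultdict


class KnowledgeGraph:
    """Toy typed, directed graph. Hypothetical schema, not PKG's real one."""

    def __init__(self):
        self.node_type = {}                # node id -> type label
        self.out_edges = defaultdict(set)  # (source id, relation) -> target ids

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, relation, dst):
        # Reject edges to unknown nodes so every endpoint carries a type.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError(f"unknown node in edge {src} -{relation}-> {dst}")
        self.out_edges[(src, relation)].add(dst)

    def neighbors(self, src, relation):
        # .get avoids inserting empty sets into the defaultdict on lookups.
        return self.out_edges.get((src, relation), set())


def profile_author(kg, author_id):
    """Count bio-entity mentions across one (disambiguated) author's articles."""
    profile = Counter()
    for article in kg.neighbors(author_id, "AUTHORED"):
        profile.update(kg.neighbors(article, "MENTIONS"))
    return profile


def profile_organization(kg, org_id):
    """Count bio-entity mentions across the distinct articles of an organization's members."""
    articles = set()
    for author in kg.neighbors(org_id, "HAS_MEMBER"):
        articles |= kg.neighbors(author, "AUTHORED")  # union: co-authored articles counted once
    profile = Counter()
    for article in articles:
        profile.update(kg.neighbors(article, "MENTIONS"))
    return profile


def build_demo_graph():
    """Invented toy data (hypothetical IDs) for illustration only."""
    kg = KnowledgeGraph()
    nodes = [
        ("author:A1", "Author"), ("author:A2", "Author"),
        ("article:P1", "Article"), ("article:P2", "Article"),
        ("entity:metformin", "Chemical"), ("entity:aspirin", "Chemical"),
        ("entity:pancreatic_cancer", "Disease"),
        ("org:U1", "Affiliation"), ("grant:G1", "Funding"),
    ]
    for node_id, node_type in nodes:
        kg.add_node(node_id, node_type)
    edges = [
        ("author:A1", "AUTHORED", "article:P1"),
        ("author:A1", "AUTHORED", "article:P2"),
        ("author:A2", "AUTHORED", "article:P2"),
        ("article:P1", "MENTIONS", "entity:metformin"),
        ("article:P1", "MENTIONS", "entity:pancreatic_cancer"),
        ("article:P2", "MENTIONS", "entity:metformin"),
        ("article:P2", "MENTIONS", "entity:aspirin"),
        ("org:U1", "HAS_MEMBER", "author:A1"),
        ("org:U1", "HAS_MEMBER", "author:A2"),
        ("grant:G1", "FUNDED", "article:P1"),
    ]
    for src, relation, dst in edges:
        kg.add_edge(src, relation, dst)
    return kg


kg = build_demo_graph()
print(profile_author(kg, "author:A1").most_common(1))  # [('entity:metformin', 2)]
```

The organization profile first collects the distinct articles of its members, so an article co-authored by two members of the same organization is counted once; summing the per-author profiles instead would count `entity:metformin` three times in this toy data. Funding links follow the same pattern, e.g. `kg.neighbors("grant:G1", "FUNDED")` returns the articles attributed to that (hypothetical) grant.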

Citing Articles

A knowledge graph for crop diseases and pests in China.

Yan R, An P, Meng X, Li Y, Li D, Xu F. Sci Data. 2025; 12(1):222.

PMID: 39915513 PMC: 11802884. DOI: 10.1038/s41597-025-04492-0.


Historiography of Scientific Publishing across Cultures and Disciplines.

Hosur B, Tripathi M, Vyas S, Shaikh S, Ahuja C. Indian J Radiol Imaging. 2025; 35(Suppl 1):S2-S8.

PMID: 39802709 PMC: 11717451. DOI: 10.1055/s-0044-1800865.


Construction, Deployment, and Usage of the Human Reference Atlas Knowledge Graph for Linked Open Data.

Bueckle A, Herr 2nd B, Herr B, Hardi J, Quardokus E, Musen M. bioRxiv. 2025.

PMID: 39764040 PMC: 11703146. DOI: 10.1101/2024.12.22.630006.


Lack of diffusion of popular scientific ideas marks the presence of epistemic 'bubbles'.

Nat Hum Behav. 2025; 9(2):250-251.

PMID: 39747406 DOI: 10.1038/s41562-024-02042-z.


Limited diffusion of scientific knowledge forecasts collapse.

Kang D, Danziger R, Rehman J, Evans J. Nat Hum Behav. 2024; 9(2):268-276.

PMID: 39622978 DOI: 10.1038/s41562-024-02041-0.

