Knowledge-based Biomedical Data Science
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data"; the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in knowledge representation and reasoning, and computational processing of knowledge has a role to play in Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic or pass an exam; it might use existing biomedical knowledge to rank or evaluate hypotheses; or it might explain or interpret data in light of prior knowledge, within a Bayesian or some other framework. These are all examples of automated reasoning that acts on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based Data Science, this position paper argues that such research is ripe for expansion and for broader application.
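To make one of these forms of automated reasoning concrete, using existing knowledge to rank hypotheses, the following is a minimal sketch in Python. It assumes a toy knowledge graph stored as (subject, predicate, object) triples and scores each hypothetical "drug treats disease" hypothesis by counting two-hop drug -> gene -> disease paths. The entity names (drugA, geneX, diseaseD, ...), the predicates `inhibits` and `associated_with`, the helper functions, and the path-counting score are all hypothetical illustrations. They are not real biomedical facts and not the method described in the paper.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
# Entity names are placeholders, not real biomedical facts.
TRIPLES = [
    ("drugA", "inhibits", "geneX"),
    ("drugA", "inhibits", "geneY"),
    ("drugB", "inhibits", "geneY"),
    ("drugC", "inhibits", "geneZ"),
    ("geneX", "associated_with", "diseaseD"),
    ("geneY", "associated_with", "diseaseD"),
    ("geneZ", "associated_with", "diseaseE"),
]


def build_index(triples):
    """Index triples by (subject, predicate) so objects can be looked up quickly."""
    index = defaultdict(set)
    for subj, pred, obj in triples:
        index[(subj, pred)].add(obj)
    return index


def supporting_genes(index, drug, disease):
    """Return genes that link drug -> gene -> disease (two-hop paths), sorted by name."""
    targets = index.get((drug, "inhibits"), set())
    return sorted(
        gene for gene in targets
        if disease in index.get((gene, "associated_with"), set())
    )


def rank_hypotheses(triples, drugs, disease):
    """Rank hypothetical 'drug treats disease' claims by number of supporting paths."""
    index = build_index(triples)
    scored = [(drug, supporting_genes(index, drug, disease)) for drug in drugs]
    # Sort by descending support count, then by drug name for a stable order.
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored


if __name__ == "__main__":
    for drug, genes in rank_hypotheses(TRIPLES, ["drugA", "drugB", "drugC"], "diseaseD"):
        print(f"{drug} treats diseaseD? support={len(genes)} via {genes}")
```

With these toy triples, drugA ranks first (two supporting genes), drugB second (one), and drugC last (none). Path counting is only the simplest possible scoring rule; a real system might weight edges by evidence quality or combine paths in a probabilistic framework.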
Knowledge-based approaches to drug discovery for rare diseases.
Alves V, Korn D, Pervitsky V, Thieme A, Capuzzi S, Baker N. Drug Discov Today. 2021; 27(2):490-502.
PMID: 34718207 PMC: 9124594. DOI: 10.1016/j.drudis.2021.10.014.
Knowledge-Based Biomedical Data Science.
Callahan T, Tripodi I, Pielke-Lombardo H, Hunter L. Annu Rev Biomed Data Sci. 2021; 3:23-41.
PMID: 33954284 PMC: 8095730. DOI: 10.1146/annurev-biodatasci-010820-091627.
Korn D, Pervitsky V, Bobrowski T, Alves V, Schmitt C, Bizon C. ChemRxiv. 2020.
PMID: 33269341 PMC: 7709174. DOI: 10.26434/chemrxiv.13289222.
Pathway information extracted from 25 years of pathway figures.
Hanspers K, Riutta A, Summer-Kutmon M, Pico A. Genome Biol. 2020; 21(1):273.
PMID: 33168034 PMC: 7649569. DOI: 10.1186/s13059-020-02181-2.