Incorporating Biological Structure into Machine Learning Models in Biomedicine
In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful. This area of research would benefit from performant open-source implementations and independent benchmarking efforts.