Incorporating Biological Structure into Machine Learning Models in Biomedicine

Overview

Journal Curr Opin Biotechnol

Publisher Elsevier

Specialty Biotechnology

Date 2020 Jan 22

PMID 31962244

Citations 13

Authors

Jake Crawford

Casey S Greene

Abstract

In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful. This area of research would benefit from performant open-source implementations and independent benchmarking efforts.
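As a minimal sketch of one way structured data can enter model training, assume a gene interaction network is given as a symmetric adjacency matrix A with graph Laplacian L = D - A. A linear model can then be fit with a network penalty λβᵀLβ, which equals λ times the sum of (β_i - β_j)² over network edges, so genes connected in the network are pushed toward similar coefficients. The function names, the small ridge term, and the toy data below are illustrative assumptions, not the method of any specific paper covered by this review.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def network_regularized_fit(X: Matrix, y: Vector, adj: Matrix,
                            lam: float, ridge: float = 1e-6) -> Vector:
    """Minimize ||y - X b||^2 + lam * b^T L b + ridge * ||b||^2 in closed form."""
    p = len(X[0])
    L = laplacian(adj)
    # Setting the gradient to zero gives (X^T X + lam * L + ridge * I) b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + lam * L[i][j]
          + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, rhs)


# Toy example: genes 0 and 1 interact, gene 2 is isolated in the network.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
y = [2, 0, 1, 3]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(network_regularized_fit(X, y, adj, lam=0.0))     # approximately [2.0, 0.0, 1.0]
print(network_regularized_fit(X, y, adj, lam=1000.0))  # approximately [1.0, 1.0, 1.0]
```

Without the penalty the fit recovers the unconstrained coefficients; with a large λ the two connected genes receive nearly equal weights while the isolated gene is unaffected. The ridge term only keeps the system solvable when XᵀX is singular, since L itself is never full rank.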

Citing Articles

Semisupervised Contrastive Learning for Bioactivity Prediction Using Cell Painting Image Data.

Bushiri Pwesombo D, Beese C, Schmied C, Sun H. J Chem Inf Model. 2025; 65(2):528-543.

PMID: 39761993 PMC: 11776044. DOI: 10.1021/acs.jcim.4c00835.

Enhancing chemotherapy response prediction via matched colorectal tumor-organoid gene expression analysis and network-based biomarker selection.

Zhang W, Wu C, Huang H, Bleu P, Zambare W, Alvarez J. Transl Oncol. 2025; 52:102238.

PMID: 39754813 PMC: 11754497. DOI: 10.1016/j.tranon.2024.102238.

DMOIT: denoised multi-omics integration approach based on transformer multi-head self-attention mechanism.

Liu Z, Park T. Front Genet. 2024; 15:1488683.

PMID: 39720180 PMC: 11666520. DOI: 10.3389/fgene.2024.1488683.

Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection application for gene expression data.

Cantor E, Guauque-Olarte S, Leon R, Chabert S, Salas R. BioData Min. 2024; 17(1):34.

PMID: 39256872 PMC: 11389072. DOI: 10.1186/s13040-024-00388-8.

Predicting gene-level sensitivity to JAK-STAT signaling perturbation using a mechanistic-to-machine learning framework.

Cheemalavagu N, Shoger K, Cao Y, Michalides B, Botta S, Faeder J. Cell Syst. 2024; 15(1):37-48.e4.

PMID: 38198893 PMC: 10812086. DOI: 10.1016/j.cels.2023.12.006.
