
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

Overview
Journal: Inf Fusion
Publisher: Elsevier
Date: 2018 Nov 24
PMID: 30467459
Citations: 154
Abstract

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

Citing Articles

Unifying fragmented perspectives with additive deep learning for high-dimensional models from partial faceted datasets.

Wu Y, Wu P, Chambliss A, Wirtz D, Sun S. NPJ Biol Phys Mech. 2025; 2(1):5.

PMID: 40012561 PMC: 11850287. DOI: 10.1038/s44341-025-00009-3.


Identification of WDR74 and TNFRSF12A as biomarkers for early osteoarthritis using machine learning and immunohistochemistry.

Chen Y, Lin J, Shi D, Miao Y, Xue F, Liu K. Front Immunol. 2025; 16:1517646.

PMID: 39935469 PMC: 11810735. DOI: 10.3389/fimmu.2025.1517646.


Comparison of Machine Learning Models in Predicting Mental Health Sequelae Following Concussion in Youth.

Peng J, Chen J, Yin C, Zhang P, Yang J. medRxiv. 2025.

PMID: 39802784 PMC: 11722470. DOI: 10.1101/2025.01.02.24319733.


Prediction of Composite Clinical Outcomes for Childhood Neuroblastoma Using Multi-Omics Data and Machine Learning.

Wang P, Zhang J. Int J Mol Sci. 2025; 26(1.

PMID: 39795994 PMC: 11720239. DOI: 10.3390/ijms26010136.


Predicting postoperative adhesive small bowel obstruction in infants under 3 months with intestinal malrotation: a random forest approach.

Chen P, Xiong H, Cao J, Cui M, Hou J, Guo Z. J Pediatr (Rio J). 2025; 101(2):282-289.

PMID: 39765335 PMC: 11889664. DOI: 10.1016/j.jped.2024.11.011.

