Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data
The amount of metagenomic data is growing rapidly, while computational methods for metagenome analysis are still in their infancy. Novel statistical learning tools are needed to predict associations between bacterial communities and disease phenotypes and to detect differentially abundant features. In this study, we present a statistical learning method that performs association prediction and feature selection simultaneously on count data from metagenomic samples drawn from two or more treatment populations. We developed a linear programming based support vector machine with L(1) and joint L(1,∞) penalties for binary and multiclass classification of metagenomic count data (metalinprog). We evaluated the performance of the method on several real and simulated datasets. The proposed method can simultaneously identify features and predict classes from metagenomic count data.
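The abstract does not give the optimization problem itself, so the following is only a minimal sketch of the general idea for the binary case: a standard L(1)-penalized hinge-loss SVM written as a linear program and solved with SciPy's `linprog`. It is not the metalinprog implementation, and the joint L(1,∞) multiclass variant is not shown. The function name `fit_l1_svm_lp`, the penalty weight `lam`, the `log1p` transform of the counts, and the synthetic Poisson data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def fit_l1_svm_lp(X, y, lam=0.1):
    """Fit a binary L1-penalized SVM by linear programming (illustrative sketch).

    Solves  min  lam * sum(u + v) + sum(xi)
            s.t. y_i * ((u - v) @ x_i + b) >= 1 - xi_i,  u, v, xi >= 0,
    where w = u - v, so that sum(u + v) equals the L1 norm of w at the optimum.
    """
    n, p = X.shape
    yx = y[:, None] * X
    # Variable order: [u (p), v (p), b (1), xi (n)]
    c = np.concatenate([lam * np.ones(2 * p), [0.0], np.ones(n)])
    # Margin constraints rewritten in the "A_ub @ z <= b_ub" form
    A_ub = np.hstack([-yx, yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * p) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p]
    return w, b


# Synthetic count data (assumption): 60 samples, 20 features,
# of which only features 0 and 1 differ in abundance between classes.
rng = np.random.default_rng(0)
n, p = 60, 20
y = np.repeat([-1.0, 1.0], n // 2)
counts = rng.poisson(5, size=(n, p)).astype(float)
counts[y == 1, :2] = rng.poisson(50, size=(n // 2, 2))

# log1p is one common way to stabilize counts; the source does not specify a transform.
X = np.log1p(counts)

w, b = fit_l1_svm_lp(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # features with nonzero weight
acc = np.mean(np.sign(X @ w + b) == y)       # training accuracy
print("selected features:", selected)
print("training accuracy:", acc)
```

Because the L(1) penalty drives most weights to exactly zero, the nonzero entries of `w` serve as the selected features while the same `w` and `b` give the class prediction, which is the sense in which a single fit covers both tasks.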