Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data

Overview

Journal PLoS One

Specialties General Medicine, Science

Date 2013 Apr 5

PMID 23555553

Citations 4

Authors

Zhenqiu Liu

Dechang Chen

Li Sheng

Amy Y Liu

Abstract

The amount of metagenomic data is growing rapidly, while computational methods for metagenome analysis are still in their infancy. It is important to develop novel statistical learning tools for predicting associations between bacterial communities and disease phenotypes and for detecting differentially abundant features. In this study, we present a novel statistical learning method for simultaneous association prediction and feature selection using metagenomic count data from samples drawn from two or more treatment populations. We developed a linear-programming-based support vector machine with L1 and joint L1,∞ penalties for binary and multiclass classification of metagenomic count data (metalinprog). We evaluated the performance of our method on several real and simulated datasets. The proposed method can simultaneously identify features and predict classes from metagenomic count data.
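For the binary case, the idea of an SVM with an L1 penalty solved by linear programming can be illustrated with a generic L1-norm linear SVM written as an LP and handed to SciPy's `linprog`. This is a minimal sketch, not the authors' metalinprog code: the log(1 + count) preprocessing, the slack weight `C`, and the helper names `fit_l1_svm` and `predict_l1_svm` are illustrative assumptions, and the joint L1,∞ multiclass penalty is not covered.

```python
import numpy as np
from scipy.optimize import linprog


def fit_l1_svm(X, y, C=1.0):
    """Hypothetical helper: L1-norm linear SVM solved as a linear program.

    Minimizes ||w||_1 + C * sum(xi) subject to y_i (w . x_i + b) >= 1 - xi_i, xi_i >= 0.
    X is an (n, p) feature matrix and y holds labels in {-1, +1}. Returns (w, b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Decision vector z = [u (p), v (p), b (1), xi (n)] with w = u - v and u, v >= 0,
    # so sum(u + v) equals ||w||_1 at the optimum.
    c = np.concatenate([np.ones(2 * p), [0.0], C * np.ones(n)])
    yX = y[:, None] * X
    # Each margin constraint rewritten in linprog's A_ub @ z <= b_ub form:
    # -y_i x_i . u + y_i x_i . v - y_i b - xi_i <= -1
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    # u, v and xi are nonnegative; the intercept b is unbounded.
    bounds = [(0, None)] * (2 * p) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p]
    return w, b


def predict_l1_svm(X, w, b):
    """Hypothetical helper: sign of the linear decision function as labels in {-1, +1}."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)


if __name__ == "__main__":
    # Toy counts (samples x taxa); only taxon 0 differs between the two groups.
    counts = np.array([[20, 3, 5, 1], [25, 4, 2, 0], [30, 2, 4, 2],
                       [1, 3, 4, 1], [0, 4, 2, 2], [2, 2, 5, 0]])
    labels = np.array([1, 1, 1, -1, -1, -1])
    # Assumed preprocessing: log(1 + count) to damp the skew of raw counts.
    X = np.log1p(counts)
    w, b = fit_l1_svm(X, labels, C=10.0)
    print("weights:", np.round(w, 3), "intercept:", round(b, 3))
    print("training predictions:", predict_l1_svm(X, w, b))
```

Because the objective charges sum(u + v), which equals ||w||1 at the optimum, the LP solution tends to set many coordinates of w exactly to zero, so the nonzero weights double as the selected features while the sign of w . x + b gives the predicted class.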

Citing Articles

Bacterial clade-specific analysis identifies distinct epithelial responses in inflammatory bowel disease.

D'Adamo G, Chonwerawong M, Gearing L, Marcelino V, Gould J, Rutten E Cell Rep Med. 2023; 4(7):101124.

PMID: 37467722 PMC: 10394256. DOI: 10.1016/j.xcrm.2023.101124.

Non-invasive monitoring of multiple wildlife health factors by fecal microbiome analysis.

Pannoni S, Proffitt K, Holben W Ecol Evol. 2022; 12(2):e8564.

PMID: 35154651 PMC: 8826075. DOI: 10.1002/ece3.8564.

Sparse support vector machines with L0 approximation for ultra-high dimensional omics data.

Liu Z, Elashoff D, Piantadosi S Artif Intell Med. 2019; 96:134-141.

PMID: 31164207 PMC: 6553498. DOI: 10.1016/j.artmed.2019.04.004.

Opportunities and obstacles for deep learning in biology and medicine.

Ching T, Himmelstein D, Beaulieu-Jones B, Kalinin A, Do B, Way G J R Soc Interface. 2018; 15(141).

PMID: 29618526 PMC: 5938574. DOI: 10.1098/rsif.2017.0387.

References

1. Liu Z, Hsiao W, Cantarel B, Drabek E, Fraser-Liggett C. Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data. Bioinformatics. 2011; 27(23):3242-9. PMC: 3223360. DOI: 10.1093/bioinformatics/btr547.

2. White J, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009; 5(4):e1000352. PMC: 2661018. DOI: 10.1371/journal.pcbi.1000352.

3. Schloss P, Westcott S, Ryabin T, Hall J, Hartmann M, Hollister E. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009; 75(23):7537-41. PMC: 2786419. DOI: 10.1128/AEM.01541-09.

4. Fierer N, Lauber C, Zhou N, McDonald D, Costello E, Knight R. Forensic identification using skin bacterial communities. Proc Natl Acad Sci U S A. 2010; 107(14):6477-81. PMC: 2852011. DOI: 10.1073/pnas.1000162107.

5. Liu Z, Lin S, Tan M. Sparse support vector machines with Lp penalty for biomarker identification. IEEE/ACM Trans Comput Biol Bioinform. 2010; 7(1):100-7. DOI: 10.1109/TCBB.2008.17.

6. Wooley J, Godzik A, Friedberg I. A primer on metagenomics. PLoS Comput Biol. 2010; 6(2):e1000667. PMC: 2829047. DOI: 10.1371/journal.pcbi.1000667.

7. Rosen G, Reichenberger E, Rosenfeld A. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics. 2010; 27(1):127-9. PMC: 3008645. DOI: 10.1093/bioinformatics/btq619.

8. Brady A, Salzberg S. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009; 6(9):673-6. PMC: 2762791. DOI: 10.1038/nmeth.1358.

9. Gerlach W, Stoye J. Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res. 2011; 39(14):e91. PMC: 3152360. DOI: 10.1093/nar/gkr225.

10. Turnbaugh P, Ley R, Hamady M, Fraser-Liggett C, Knight R, Gordon J. The human microbiome project. Nature. 2007; 449(7164):804-10. PMC: 3709439. DOI: 10.1038/nature06244.

11. Huson D, Mitra S, Ruscheweyh H, Weber N, Schuster S. Integrative analysis of environmental sequences using MEGAN4. Genome Res. 2011; 21(9):1552-60. PMC: 3166839. DOI: 10.1101/gr.120618.111.

12. Lozupone C, Lladser M, Knights D, Stombaugh J, Knight R. UniFrac: an effective distance metric for microbial community comparison. ISME J. 2010; 5(2):169-72. PMC: 3105689. DOI: 10.1038/ismej.2010.133.

13. Stark M, Berger S, Stamatakis A, von Mering C. MLTreeMap--accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics. 2010; 11:461. PMC: 3091657. DOI: 10.1186/1471-2164-11-461.

14. Zheng H, Wu H. Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol. 2010; 8(6):995-1011. DOI: 10.1142/s0219720010005051.

15. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990; 215(3):403-10. DOI: 10.1016/S0022-2836(05)80360-2.

16. Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner F. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics. 2004; 5:163. PMC: 529438. DOI: 10.1186/1471-2105-5-163.

17. Antonov A, Tetko I, Prokopenko V, Kosykh D, Mewes H. A web portal for classification of expression data using maximal margin linear programming. Bioinformatics. 2004; 20(17):3284-5. DOI: 10.1093/bioinformatics/bth376.

18. Knights D, Costello E, Knight R. Supervised classification of human microbiota. FEMS Microbiol Rev. 2010; 35(2):343-59. DOI: 10.1111/j.1574-6976.2010.00251.x.

19. Mohammed M, Ghosh T, Reddy R, Reddy C, Singh N, Mande S. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences. BMC Genomics. 2012; 12 Suppl 3:S4. PMC: 3333187. DOI: 10.1186/1471-2164-12-S3-S4.

20. Parks D, MacDonald N, Beiko R. Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics. 2011; 12:328. PMC: 3173459. DOI: 10.1186/1471-2105-12-328.