
Sparse Distance-based Learning for Simultaneous Multiclass Classification and Feature Selection of Metagenomic Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2011 Oct 11
PMID 21984758
Citations 23
Abstract

Motivation: Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single-genome cultivation and sequencing as a way to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the complexity of the data has outgrown available computational methods. While several existing machine learning methods have recently been adapted for analyzing microbiome data, there is not yet an efficient, dedicated algorithm for multiclass classification of human microbiota.

Results: By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature selection (we use feature, variable and taxon interchangeably) from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with far fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance-stabilizing transformation in MetaDistance. We validate the method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data.
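
The general idea of classifying normalized, variance-stabilized count data by distance can be sketched as follows. This is a minimal illustration in Python, not the MetaDistance algorithm itself: the total-sum scaling, the square-root transform, the nearest-centroid rule, the hand-set sparse weights, and all function names and toy counts are assumptions made for this example. The published method learns its parameters by optimizing intraclass and interclass distances, which this sketch does not attempt.

```python
import math


def normalize_counts(counts):
    """Scale one sample's taxon counts to relative abundances, then take
    square roots (one common variance-stabilizing choice for count data;
    assumed here, not taken from the paper)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.sqrt(c / total) for c in counts]


def class_centroids(samples, labels):
    """Return the mean feature vector of each class."""
    sums, sizes = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            sizes[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        sizes[y] += 1
    return {y: [s / sizes[y] for s in sums[y]] for y in sums}


def weighted_distance(x, centroid, weights):
    """Weighted Euclidean distance; a zero weight drops that taxon entirely,
    which is how a sparse weight vector acts as feature selection."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, centroid))
    )


def predict(x, centroids, weights):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda y: weighted_distance(x, centroids[y], weights))


# Hypothetical toy data: 5 samples, 4 taxa, 3 unbalanced classes.
raw_counts = [
    [90, 5, 5, 12],
    [80, 10, 10, 3],
    [5, 90, 5, 7],
    [10, 85, 5, 15],
    [5, 5, 90, 9],
]
labels = ["A", "A", "B", "B", "C"]

samples = [normalize_counts(row) for row in raw_counts]
centroids = class_centroids(samples, labels)

# Hand-set sparse weights for illustration only: the fourth taxon is ignored.
weights = [1.0, 1.0, 1.0, 0.0]

new_sample = normalize_counts([70, 20, 10, 5])
print(predict(new_sample, centroids, weights))
```

In this sketch the weight vector stands in for the selected features: taxa with zero weight contribute nothing to any distance, so class prediction and feature selection are expressed through the same set of parameters.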

Availability: The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/.

Contact: zliu@umm.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

A toolbox of machine learning software to support microbiome analysis.

Marcos-Zambrano L, Lopez-Molina V, Bakir-Gungor B, Frohme M, Karaduzovic-Hadziabdic K, Klammsteiner T. Front Microbiol. 2023; 14:1250806.

PMID: 38075858 PMC: 10704913. DOI: 10.3389/fmicb.2023.1250806.


Overview of data preprocessing for machine learning applications in human microbiome research.

Ibrahimi E, Lopes M, Dhamo X, Simeon A, Shigdel R, Hron K. Front Microbiol. 2023; 14:1250909.

PMID: 37869650 PMC: 10588656. DOI: 10.3389/fmicb.2023.1250909.


Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment.

Marcos-Zambrano L, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O. Front Microbiol. 2021; 12:634511.

PMID: 33737920 PMC: 7962872. DOI: 10.3389/fmicb.2021.634511.


DCMD: Distance-based classification using mixture distributions on microbiome data.

Shestopaloff K, Dong M, Gao F, Xu W. PLoS Comput Biol. 2021; 17(3):e1008799.

PMID: 33711013 PMC: 7990174. DOI: 10.1371/journal.pcbi.1008799.


TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction.

Sharma D, Paterson A, Xu W. Bioinformatics. 2020; 36(17):4544-4550.

PMID: 32449747 PMC: 7750934. DOI: 10.1093/bioinformatics/btaa542.

