
A t-SNE Based Classification Approach to Compositional Microbiome Data

Overview
Journal Front Genet
Date 2020 Dec 31
PMID 33381156
Citations 16
Abstract

As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighbor embedding (t-SNE) has been successfully applied in a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presents a t-SNE based classification approach for compositional microbiome data, which makes it possible to build classifiers and classify new samples in the reduced-dimensional space produced by t-SNE. The Aitchison distance is used to modify the conditional probabilities in t-SNE to account for the compositionality of microbiome data. To classify a new sample, its low-dimensional features are obtained as the weighted mean vector of its nearest neighbors in the training set. Using the low-dimensional features as input, three commonly used machine learning algorithms, logistic regression (LR), support vector machine (SVM), and decision tree (DT), were considered for classification tasks. The proposed approach was applied to two disease-associated microbiome datasets and achieved better classification performance than classifiers built in the original high-dimensional space. The results also showed that t-SNE with the Aitchison distance improved classification accuracy in both datasets. In conclusion, we have developed a t-SNE based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models.
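
The two building blocks named in the abstract, the Aitchison distance and the placement of a new sample at the weighted mean of its nearest training neighbors, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the pseudocount used to handle zeros, the choice of `k`, and the inverse-distance weights are all assumptions made for illustration; the abstract does not specify how the weights are computed. The Aitchison distance is computed here in its standard form, as the Euclidean distance between centered log-ratio (CLR) transformed compositions.

```python
import math

def clr(x, pseudocount=1e-6):
    """Centered log-ratio transform of one composition.

    The vector is closed to proportions first; the pseudocount (an
    assumption of this sketch) keeps zero counts out of the logarithm.
    """
    total = sum(x)
    props = [v / total + pseudocount for v in x]
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(a, b, pseudocount=1e-6):
    """Euclidean distance between the CLR transforms of a and b."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ca, cb)))

def embed_new_sample(x_new, X_train, Y_train, k=3):
    """Place a new sample in an existing low-dimensional embedding.

    X_train: training compositions (list of lists).
    Y_train: their low-dimensional coordinates, e.g. from t-SNE.
    Returns the weighted mean of the k nearest neighbors' coordinates.
    Inverse-distance weights are an assumption of this sketch.
    """
    dists = [aitchison_distance(x_new, x) for x in X_train]
    nearest = sorted(range(len(X_train)), key=dists.__getitem__)[:k]
    weights = [1.0 / (dists[i] + 1e-12) for i in nearest]
    total = sum(weights)
    dim = len(Y_train[0])
    return [
        sum(w * Y_train[i][j] for w, i in zip(weights, nearest)) / total
        for j in range(dim)
    ]
```

The resulting coordinates could then be passed to any standard classifier, such as the LR, SVM, or DT models mentioned above. Because the CLR transform is applied after closing each sample to proportions, the distance is unchanged when a sample is rescaled, which is the property that makes it suitable for compositional data.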

Citing Articles

Combination ATR-FTIR with Multiple Classification Algorithms for Authentication of the Four Medicinal Plants from L. in Rhizomes and Tuberous Roots.

Wen Q, Wei W, Li Y, Chen D, Zhang J, Li Z Sensors (Basel). 2025; 25(1).

PMID: 39796841 PMC: 11722871. DOI: 10.3390/s25010050.


Peripheral Single-Cell Immune Characteristics Contribute to the Diagnosis of Alzheimer's Disease and Dementia With Lewy Bodies.

Qiu C, Zhang D, Wang M, Mei X, Chen W, Yu H CNS Neurosci Ther. 2025; 31(1):e70204.

PMID: 39754303 PMC: 11702477. DOI: 10.1111/cns.70204.


Exploring and exploiting the rice phytobiome to tackle climate change challenges.

Hosseiniyan Khatibi S, Dimaano N, Veliz E, Sundaresan V, Ali J Plant Commun. 2024; 5(12):101078.

PMID: 39233440 PMC: 11671768. DOI: 10.1016/j.xplc.2024.101078.


AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides.

Darabi A, Sobhani S, Aghdam R, Eslahchi C BMC Bioinformatics. 2024; 25(1):241.

PMID: 39014300 PMC: 11253361. DOI: 10.1186/s12859-024-05859-7.


Effect of Altitude Gradients on the Spatial Distribution Mechanism of Soil Bacteria in Temperate Deciduous Broad-Leaved Forests.

Liu W, Guo S, Zhang H, Chen Y, Shao Y, Yuan Z Microorganisms. 2024; 12(6).

PMID: 38930416 PMC: 11206066. DOI: 10.3390/microorganisms12061034.

