
Assessment of Statistical Methods from Single Cell, Bulk RNA-seq, and Metagenomics Applied to Microbiome Data

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2020 Aug 5
PMID: 32746888
Citations: 57
Abstract

Background: The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has produced methods to deal with the high sparsity and compositionality characteristic of microbiome data, but independent benchmarks comparing these to alternatives developed for RNA-seq data analysis are lacking.
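To make the two data properties named above concrete, here is a minimal Python sketch on an invented toy count table. The taxon names, sample names, and counts are hypothetical and are not taken from the study. It computes the fraction of zero counts (sparsity) and shows how converting counts to relative abundances couples the taxa: doubling one taxon lowers the proportions of the others even though their counts are unchanged (compositionality).

```python
# Toy illustration of sparsity and compositionality; all values are invented.

def sparsity(table):
    """Fraction of zero entries in a samples-by-taxa count table."""
    cells = [count for row in table.values() for count in row.values()]
    return sum(1 for count in cells if count == 0) / len(cells)


def relative_abundance(counts):
    """Scale one sample's counts so that they sum to 1."""
    total = sum(counts.values())
    return {taxon: count / total for taxon, count in counts.items()}


# Hypothetical table: only taxon_a differs between the two samples.
table = {
    "sample_1": {"taxon_a": 100, "taxon_b": 50, "taxon_c": 50, "taxon_d": 0},
    "sample_2": {"taxon_a": 200, "taxon_b": 50, "taxon_c": 50, "taxon_d": 0},
}

zero_fraction = sparsity(table)
rel_1 = relative_abundance(table["sample_1"])
rel_2 = relative_abundance(table["sample_2"])

print("zero fraction:", zero_fraction)
print("taxon_b share in sample_1:", rel_1["taxon_b"])
print("taxon_b share in sample_2:", rel_2["taxon_b"])
```

The count of `taxon_b` is identical in both samples, yet its share falls in `sample_2` because the total grew. This is the kind of apparent change that a method working on proportions has to account for.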

Results: We compare methods developed for single-cell and bulk RNA-seq, and specifically for microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, concordance, power, and correct identification of differentially abundant genera. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing.
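One common way to probe whether a test controls false discoveries is a mock comparison: split samples that share the same condition into two arbitrary groups, test every taxon, and count how often the test calls a difference. The Python sketch below illustrates that idea only. The abstract does not describe the study's actual procedure, so the permutation test, the simulated counts, and every function name and parameter here are assumptions made for illustration.

```python
import random


def permutation_pvalue(group_a, group_b, n_perm, rng):
    """Two-sided permutation p-value for a difference in group means."""
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # Adding 1 to both terms keeps the p-value strictly positive and valid.
    return (hits + 1) / (n_perm + 1)


def mock_false_positive_rate(n_taxa=40, n_per_group=10, n_perm=200,
                             alpha=0.05, seed=1):
    """Share of taxa called significant when no taxon truly differs."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_taxa):
        # Both mock groups come from the same distribution (hypothetical counts).
        values = [rng.randint(0, 20) for _ in range(2 * n_per_group)]
        p_value = permutation_pvalue(values[:n_per_group],
                                     values[n_per_group:], n_perm, rng)
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_taxa


rate = mock_false_positive_rate()
print("share of taxa called significant under the null:", rate)
```

A well-calibrated test should give a rate near or below `alpha` here. A rate far above it on mock groups would suggest that the test's assumptions do not suit the data.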

Conclusions: The multivariate and compositional methods developed specifically for microbiome analysis did not outperform univariate methods developed for differential expression analysis of RNA-seq data. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner.

Citing Articles

Computational Study Protocol: Leveraging Synthetic Data to Validate a Benchmark Study for Differential Abundance Tests for 16S Microbiome Sequencing Data.

Kohnert E, Kreutz C. F1000Res. 2025; 13:1180.

PMID: 39866725 PMC: 11757917. DOI: 10.12688/f1000research.155230.2.


PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts.

Chen Y, Su Y, Chu T, Wu M, Huang C, Lin C. NPJ Biofilms Microbiomes. 2025; 11(1):3.

PMID: 39753565 PMC: 11698977. DOI: 10.1038/s41522-024-00598-2.


PhyImpute and UniFracImpute: two imputation approaches incorporating phylogeny information for microbial count data.

Luo Q, Zhang S, Butt H, Chen Y, Jiang H, An L. Brief Bioinform. 2024; 26(1).

PMID: 39708838 PMC: 11663024. DOI: 10.1093/bib/bbae653.


Species specificity and specificity diversity (SSD) framework: a novel method for detecting the unique and enriched species associated with disease by leveraging the microbiome heterogeneity.

Ma Z. BMC Biol. 2024; 22(1):283.

PMID: 39639304 PMC: 11619696. DOI: 10.1186/s12915-024-02024-7.


lefser: implementation of metagenomic biomarker discovery tool, LEfSe, in R.

Khleborodova A, Gamboa-Tuz S, Ramos M, Segata N, Waldron L, Oh S. Bioinformatics. 2024; 40(12).

PMID: 39585730 PMC: 11665633. DOI: 10.1093/bioinformatics/btae707.

