
Overview of Data Preprocessing for Machine Learning Applications in Human Microbiome Research

Overview
Journal: Front Microbiol
Specialty: Microbiology
Date: 2023 Oct 23
PMID: 37869650
Abstract

Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data is challenging, primarily because of the statistical properties of the data (e.g., sparsity, over-dispersion, compositionality, and inter-variable dependency). This mini review explores the preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. Information on the preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, which raises concerns about reproducibility, comparability, and the validity of results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and help them choose the most suitable transformation method based on their research questions, objectives, and data characteristics.
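To make the distinction in the abstract concrete, the following minimal sketch contrasts a relative-abundance transformation with a log-ratio transformation designed for compositional data. The centered log-ratio (CLR) is used here only as one common example of the latter; the abstract does not name it. The function names, the pseudocount of 0.5 used to handle zeros, and the toy read counts are all assumptions made for illustration.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: divide each taxon count by the sample's total reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean log of the sample.

    The pseudocount (an assumed value) is added to every count so that
    zeros, which are frequent in sparse microbiome data, do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in one sample (illustrative only).
sample = [120, 0, 30, 850]

rel = relative_abundance(sample)   # values are constrained to sum to 1
clr_values = clr(sample)           # values are centered, so they sum to 0
```

Relative abundances remain constrained to a constant sum, so the compositional dependency between taxa is still present in the transformed values, whereas CLR values are expressed relative to the sample's geometric mean. The choice of pseudocount influences the result for sparse samples and would need to be reported for reproducibility.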

Citing Articles

Deep learning in microbiome analysis: a comprehensive review of neural network models.

Przymus P, Rykaczewski K, Martin-Segura A, Truu J, Carrillo De Santa Pau E, Kolev M. Front Microbiol. 2025; 15:1516667.

PMID: 39911715 PMC: 11794229. DOI: 10.3389/fmicb.2024.1516667.


Effects of data transformation and model selection on feature importance in microbiome classification data.

Karwowska Z, Aasmets O, Kosciolek T, Org E. Microbiome. 2025; 13(1):2.

PMID: 39754220 PMC: 11699698. DOI: 10.1186/s40168-024-01996-6.


Domain adaptation in small-scale and heterogeneous biological datasets.

Orouji S, Liu M, Korem T, Peters M. Sci Adv. 2024; 10(51):eadp6040.

PMID: 39705361 PMC: 11661433. DOI: 10.1126/sciadv.adp6040.


MetaBakery: a Singularity implementation of bioBakery tools as a skeleton application for efficient HPC deconvolution of microbiome metagenomic sequencing data to machine learning ready information.

Murovec B, Deutsch L, Osredkar D, Stres B. Front Microbiol. 2024; 15:1426465.

PMID: 39139377 PMC: 11321593. DOI: 10.3389/fmicb.2024.1426465.


Explainable artificial intelligence and microbiome data for food geographical origin: the Mozzarella di Bufala Campana PDO Case of Study.

Magarelli M, Novielli P, De Filippis F, Magliulo R, Di Bitonto P, Diacono D. Front Microbiol. 2024; 15:1393243.

PMID: 38887708 PMC: 11180736. DOI: 10.3389/fmicb.2024.1393243.

