
Machine Learning Methods for Metabolic Pathway Prediction

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2010 Jan 13
PMID: 20064214
Citations: 69
Abstract

Background: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.

Results: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. Unlike PathoLogic, the ML-based methods output a probability for each predicted pathway, which provides more information to the user and facilitates filtering of predicted pathways.
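The two reported metrics, and the idea of filtering predictions by probability, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pathway identifiers, probabilities, and labels below are invented toy values, the names `Prediction`, `confusion_counts`, `accuracy`, and `f_measure` are hypothetical, and the F-measure is assumed to be the balanced (F1) form, 2TP / (2TP + FP + FN).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    pathway: str        # hypothetical pathway identifier (not from the paper)
    probability: float  # classifier's estimated probability that the pathway is present
    present: bool       # gold-standard label: is the pathway really present?


def confusion_counts(preds, threshold=0.5):
    """Count (TP, FP, TN, FN) when pathways with probability >= threshold are called present."""
    tp = fp = tn = fn = 0
    for p in preds:
        called = p.probability >= threshold
        if called and p.present:
            tp += 1
        elif called and not p.present:
            fp += 1
        elif not called and p.present:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all pathway instances that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """Balanced F-measure (assumed F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy data, for illustration only.
TOY = [
    Prediction("PWY-A", 0.95, True),
    Prediction("PWY-B", 0.80, True),
    Prediction("PWY-C", 0.60, False),
    Prediction("PWY-D", 0.40, True),
    Prediction("PWY-E", 0.10, False),
    Prediction("PWY-F", 0.05, False),
]

if __name__ == "__main__":
    # Raising the threshold keeps only the more confident predictions.
    for threshold in (0.5, 0.7):
        tp, fp, tn, fn = confusion_counts(TOY, threshold)
        print(threshold, round(accuracy(tp, fp, tn, fn), 3), round(f_measure(tp, fp, fn), 3))
```

Because each prediction carries a probability, a user can move the threshold to trade false positives against false negatives, which is not possible when a method outputs only a present/absent call.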

Conclusions: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.

Citing Articles

Screening of genes co-associated with osteoporosis and chronic HBV infection based on bioinformatics analysis and machine learning.

Yang J, Yang W, Hu Y, Tong L, Liu R, Liu L Front Immunol. 2024; 15:1472354.

PMID: 39351238 PMC: 11439653. DOI: 10.3389/fimmu.2024.1472354.


Multi-label classification with XGBoost for metabolic pathway prediction.

Joe H, Kim H BMC Bioinformatics. 2024; 25(1):52.

PMID: 38297220 PMC: 10832249. DOI: 10.1186/s12859-024-05666-0.


Machine learning for metabolic pathway optimization: A review.

Cheng Y, Bi X, Xu Y, Liu Y, Li J, Du G Comput Struct Biotechnol J. 2024; 21:2381-2393.

PMID: 38213889 PMC: 10781721. DOI: 10.1016/j.csbj.2023.03.045.


Predicting pathways for old and new metabolites through clustering.

Siddharth T, Lewis N J Theor Biol. 2023; 578:111684.

PMID: 38048983 PMC: 11139542. DOI: 10.1016/j.jtbi.2023.111684.


Predicting metabolic fluxes from omics data via machine learning: Moving from knowledge-driven towards data-driven approaches.

Goncalves D, Henriques R, Costa R Comput Struct Biotechnol J. 2023; 21:4960-4973.

PMID: 37876626 PMC: 10590844. DOI: 10.1016/j.csbj.2023.10.002.

