Leveraging Heterogeneous Network Embedding for Metabolic Pathway Prediction
Motivation: Metabolic pathway reconstruction from genomic sequence information is a key step in predicting the regulatory and functional potential of cells at the individual, population and community levels of organization. Although the most common methods for metabolic pathway reconstruction are gene-centric (e.g. mapping annotated proteins onto known pathways using a reference database), pathway-centric methods that use heuristics or machine learning to infer pathway presence provide a powerful engine for hypothesis generation in biological systems. However, such methods rely on rule sets or rich feature information that may not be known or readily accessible.
Results: Here, we present pathway2vec, a software package consisting of six representational learning modules used to automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes and pathways, where nodes within a layer manifest inter-interactions and nodes between layers manifest betweenness interactions. This layered architecture captures relevant relationships used to learn a neural embedding-based low-dimensional space of metabolic features. We benchmark pathway2vec performance based on node-clustering, embedding visualization and pathway prediction using MetaCyc as a trusted source. In the pathway prediction task, results indicate that it is possible to leverage embeddings to improve prediction outcomes.
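The layered network described above can be illustrated with a minimal sketch. This is not the pathway2vec API: the node identifiers, edges and function names below are hypothetical toy values, and uniform random walks are used only as one common way to turn a graph into node sequences that a skip-gram style model could later embed.

```python
import random
from collections import defaultdict

# Hypothetical toy network: identifiers and layer labels are illustrative only.
NODE_LAYER = {
    "c1": "compound", "c2": "compound",
    "e1": "enzyme", "e2": "enzyme",
    "p1": "pathway", "p2": "pathway",
}

EDGES = [
    ("c1", "c2"),  # within-layer link (compound-compound)
    ("e1", "e2"),  # within-layer link (enzyme-enzyme)
    ("p1", "p2"),  # within-layer link (pathway-pathway)
    ("c1", "e1"),  # between-layer link (compound-enzyme)
    ("c2", "e2"),  # between-layer link (compound-enzyme)
    ("e1", "p1"),  # between-layer link (enzyme-pathway)
    ("e2", "p2"),  # between-layer link (enzyme-pathway)
]


def build_adjacency(edges):
    """Return an undirected adjacency map with sorted neighbor lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break  # dead end: stop the walk early
        walk.append(rng.choice(nbrs))
    return walk


def generate_walks(adj, walks_per_node, length, seed=0):
    """Generate `walks_per_node` walks from every node, reproducibly."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(adj):
            walks.append(random_walk(adj, node, length, rng))
    return walks


ADJ = build_adjacency(EDGES)
WALKS = generate_walks(ADJ, walks_per_node=2, length=5)

# Show each walk as a sequence of layer labels to see how walks cross layers.
for walk in WALKS[:3]:
    print([NODE_LAYER[node] for node in walk])
```

Each walk is a list of node identifiers that can be treated like a sentence. Feeding such sequences to a skip-gram model would yield one low-dimensional vector per compound, enzyme and pathway, though the walk strategy and model choice here are assumptions rather than details stated in the abstract.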
Availability and Implementation: The software package and installation instructions are available at http://github.com/pathway2vec.
Supplementary Information: Supplementary data are available at Bioinformatics online.