
Leveraging Heterogeneous Network Embedding for Metabolic Pathway Prediction

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2020 Dec 11
PMID: 33305310
Citations: 3
Abstract

Motivation: Metabolic pathway reconstruction from genomic sequence information is a key step in predicting the regulatory and functional potential of cells at the individual, population and community levels of organization. The most common methods for metabolic pathway reconstruction are gene-centric, e.g. mapping annotated proteins onto known pathways using a reference database. Pathway-centric methods, which use heuristics or machine learning to infer pathway presence, provide a powerful engine for hypothesis generation in biological systems. However, such methods rely on rule sets or rich feature information that may not be known or readily accessible.

Results: Here, we present pathway2vec, a software package consisting of six representational learning modules used to automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes and pathways, where nodes within a layer manifest inter-interactions and nodes between layers manifest betweenness interactions. This layered architecture captures relevant relationships used to learn a neural embedding-based low-dimensional space of metabolic features. We benchmark pathway2vec performance based on node-clustering, embedding visualization and pathway prediction using MetaCyc as a trusted source. In the pathway prediction task, results indicate that it is possible to leverage embeddings to improve prediction outcomes.
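
The layered network described above can be illustrated with a minimal Python sketch. It is not pathway2vec's implementation or API: the node names, the `C:`/`E:`/`P:` type prefixes, the edge list and every function name are hypothetical, and uniform random walks are used only as one common way to turn a graph into sequences that a skip-gram style model could later embed.

```python
import random
from collections import defaultdict

# Hypothetical toy network; names are illustrative, not taken from MetaCyc.
# Prefix encodes the layer: C = compound, E = enzyme, P = pathway.
EDGES = [
    ("C:glucose", "C:g6p"),            # compound-compound (within a layer)
    ("E:hexokinase", "E:pgi"),         # enzyme-enzyme (within a layer)
    ("C:glucose", "E:hexokinase"),     # compound-enzyme (between layers)
    ("C:g6p", "E:pgi"),                # compound-enzyme (between layers)
    ("E:hexokinase", "P:glycolysis"),  # enzyme-pathway (between layers)
    ("E:pgi", "P:glycolysis"),         # enzyme-pathway (between layers)
]


def node_type(node):
    """Return the layer label ('C', 'E' or 'P') encoded in the node name."""
    return node.split(":", 1)[0]


def build_graph(edges):
    """Build an undirected adjacency map with sorted neighbour lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def generate_walks(adj, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in a fixed order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(adj):
            walks.append(random_walk(adj, node, length, rng))
    return walks


graph = build_graph(EDGES)
walks = generate_walks(graph, walks_per_node=2, length=5)
```

Each walk mixes compound, enzyme and pathway nodes, so a model trained on these sequences would see both within-layer and between-layer context. Restricting transitions by `node_type` would be one way to bias walks toward particular layer patterns, again as an assumption rather than a description of the package.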

Availability And Implementation: The software package and installation instructions are published on http://github.com/pathway2vec.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Product Manifold Representations for Learning on Biological Pathways.

McNeela D, Sala F, Gitter A ArXiv. 2025.

PMID: 39975438 PMC: 11838783.


Multi-label classification with XGBoost for metabolic pathway prediction.

Joe H, Kim H BMC Bioinformatics. 2024; 25(1):52.

PMID: 38297220 PMC: 10832249. DOI: 10.1186/s12859-024-05666-0.


Graph embedding on mass spectrometry- and sequencing-based biomedical data.

Alvarez-Mamani E, Dechant R, Beltran-Castanon C, Ibanez A BMC Bioinformatics. 2024; 25(1):1.

PMID: 38166530 PMC: 10763173. DOI: 10.1186/s12859-023-05612-6.


A compendium of bacterial and archaeal single-cell amplified genomes from oxygen deficient marine waters.

Anstett J, Plominsky A, DeLong E, Kiesser A, Jurgens K, Morgan-Lang C Sci Data. 2023; 10(1):332.

PMID: 37244914 PMC: 10224968. DOI: 10.1038/s41597-023-02222-y.
