MINEs: Open Access Databases of Computationally Predicted Enzyme Promiscuity Products for Untargeted Metabolomics

Overview

Journal J Cheminform

Publisher BioMed Central

Specialty Chemistry

Date 2015 Sep 1

PMID 26322134

Citations 80

Authors

James G Jeffryes

Ricardo L Colasanti

Mona Elbadawi-Sidhu

Tobias Kind

Thomas D Niehaus

Linda J Broadbelt

Andrew D Hanson

Oliver Fiehn

Keith E J Tyo

Christopher S Henry

Abstract

Background: In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.

Description: Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.

Conclusions: MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.

Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.
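The abstract describes annotating LC-MS accurate mass data by proposing candidate compounds from a database. The general idea of accurate-mass lookup can be sketched as follows. This is a minimal illustration, not the MINE implementation or its API: the `Compound` class, the `TOY_DB` table, the function names, the [M+H]+ adduct choice, and the 10 ppm tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass

# Mass shift (Da) for a singly protonated [M+H]+ adduct.
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Compound:
    name: str
    formula: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


# Hypothetical toy table standing in for a compound database.
TOY_DB = [
    Compound("D-glucose", "C6H12O6", 180.063388),
    Compound("D-fructose", "C6H12O6", 180.063388),
    Compound("L-alanine", "C3H7NO2", 89.047678),
    Compound("pyruvic acid", "C3H4O3", 88.016044),
]


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_for_mz(mz, db=TOY_DB, adduct_shift=PROTON_MASS, tol_ppm=10.0):
    """Return (compound, ppm error) pairs whose neutral mass matches mz.

    The adduct shift is subtracted from the observed m/z to estimate the
    neutral mass, which is then compared against every compound in db.
    Results are sorted by absolute ppm error, best match first.
    """
    neutral = mz - adduct_shift
    scored = [(c, ppm_error(neutral, c.monoisotopic_mass)) for c in db]
    hits = [(c, err) for c, err in scored if abs(err) <= tol_ppm]
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    for compound, err in candidates_for_mz(181.0707):
        print(f"{compound.name}\t{compound.formula}\t{err:+.2f} ppm")
```

In this toy table a query at m/z 181.0707 matches both hexose isomers, since accurate mass alone cannot separate compounds that share a formula. That is one reason the number of candidates returned per spectrum matters when comparing databases.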

Citing Articles

Predicting Collision-Induced-Dissociation Tandem Mass Spectra (CID-MS/MS) Using Ab Initio Molecular Dynamics.

Lee J, Tantillo D, Wang L, Fiehn O. J Chem Inf Model. 2024; 64(19):7470-7487.

PMID: 39329407 PMC: 11492810. DOI: 10.1021/acs.jcim.4c00760.

Knowledge-based in silico fragmentation and annotation of mass spectra for natural products with MassKG.

Zhu B, Li Z, Jin Z, Zhong Y, Lv T, Ge Z. Comput Struct Biotechnol J. 2024; 23:3327-3341.

PMID: 39310281 PMC: 11415640. DOI: 10.1016/j.csbj.2024.09.001.

Introducing 'identification probability' for automated and transferable assessment of metabolite identification confidence in metabolomics and related studies.

Metz T, Chang C, Gautam V, Anjum A, Tian S, Wang F. bioRxiv. 2024.

PMID: 39131324 PMC: 11312557. DOI: 10.1101/2024.07.30.605945.

Extending PROXIMAL to predict degradation pathways of phenolic compounds in the human gut microbiota.

Balzerani F, Blasco T, Perez-Burillo S, Valcarcel L, Hassoun S, Planes F. NPJ Syst Biol Appl. 2024; 10(1):56.

PMID: 38802371 PMC: 11130242. DOI: 10.1038/s41540-024-00381-1.

MetaboAnalystR 4.0: a unified LC-MS workflow for global metabolomics.

Pang Z, Xu L, Viau C, Lu Y, Salavati R, Basu N. Nat Commun. 2024; 15(1):3675.

PMID: 38693118 PMC: 11063062. DOI: 10.1038/s41467-024-48009-6.
