MolFind: a Software Package Enabling HPLC/MS-based Identification of Unknown Chemical Structures

Overview

Journal: Anal Chem

Specialty: Chemistry

Date: 2012 Oct 9

PMID: 23039714

Citations: 24

Authors

Lochana C Menikarachchi

Shannon Cawley

Dennis W Hill

L Mark Hall

Lowell Hall

Steven Lai

Janine Wilder

David F Grant


Abstract

In this paper, we present MolFind, a highly multithreaded, pipeline-type software package that aids in identifying chemical structures in complex biofluids and mixtures. MolFind is specifically designed for the high-performance liquid chromatography/mass spectrometry (HPLC/MS) data inputs typical of metabolomics studies in which structure identification is the ultimate goal. MolFind enables compound identification by matching HPLC/MS-based experimental data obtained for an unknown compound with computationally derived HPLC/MS values for candidate compounds downloaded from chemical databases such as PubChem. Each downloaded "bin" consists of all compounds matching the monoisotopic molecular weight of the unknown. The predicted HPLC/MS values include retention index (RI), ECOM(50) (the energy required to fragment 50% of a selected precursor ion), drift time, and collision-induced dissociation (CID) spectrum. The RI, ECOM(50), and drift-time models are used to filter the compounds downloaded from PubChem. The remaining candidates are then ranked by CID spectra matching. The current RI and ECOM(50) models allow for the removal of about 28% of compounds from PubChem bins. Our estimates suggest that including additional chemical structures in the computational models could raise this to as much as 87%. Quantitative structure-property relationship-based modeling of drift times showed a better correlation with experimentally determined drift times than did Mobcal cross-sectional areas. In 23 of 35 example cases, filtering PubChem bins with the RI and ECOM(50) predictive models improved the ranking of the unknown compounds compared to previous studies that used CID spectra matching alone. In 19 of 35 examples, the correct candidate was ranked within the top 20 compounds in bins containing an average of 1635 compounds.
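The filter-then-rank workflow described in the abstract can be illustrated with a minimal Python sketch. This is not MolFind's code or API: the `Candidate` class, the `filter_and_rank` function, the tolerance windows, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes each candidate in a bin already carries a predicted RI, a predicted ECOM(50), and a precomputed CID match score; a drift-time filter could be added in the same way.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_ri: float      # predicted retention index (hypothetical value)
    predicted_ecom50: float  # predicted ECOM(50), arbitrary units (hypothetical value)
    cid_score: float         # hypothetical CID spectrum match score, higher is better


def within(predicted: float, observed: float, tolerance: float) -> bool:
    """Return True if the predicted value lies inside the tolerance window."""
    return abs(predicted - observed) <= tolerance


def filter_and_rank(candidates, observed_ri, observed_ecom50, ri_tol, ecom50_tol):
    """Drop candidates outside the RI or ECOM(50) window, then rank by CID score."""
    kept = [
        c for c in candidates
        if within(c.predicted_ri, observed_ri, ri_tol)
        and within(c.predicted_ecom50, observed_ecom50, ecom50_tol)
    ]
    return sorted(kept, key=lambda c: c.cid_score, reverse=True)


# A toy "bin": in practice, all candidates sharing the unknown's monoisotopic mass.
candidates = [
    Candidate("A", 410.0, 3.1, 0.62),
    Candidate("B", 655.0, 3.0, 0.90),  # predicted RI too far from observed: filtered out
    Candidate("C", 420.0, 3.3, 0.75),
    Candidate("D", 405.0, 5.9, 0.80),  # predicted ECOM(50) too far from observed: filtered out
]

ranked = filter_and_rank(
    candidates,
    observed_ri=415.0,
    observed_ecom50=3.2,
    ri_tol=50.0,       # assumed window, not a value from the paper
    ecom50_tol=0.5,    # assumed window, not a value from the paper
)
names = [c.name for c in ranked]
print(names)
```

In this toy bin, candidate B has the highest CID score but is removed by the RI filter before ranking, which mirrors how filtering can improve the rank of the correct structure relative to CID matching alone.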

Citing Articles

Recent advances in proteomics and metabolomics in plants.

Yan S, Bhawal R, Yin Z, Thannhauser T, Zhang S. Mol Hortic. 2023; 2(1):17.

PMID: 37789425 PMC: 10514990. DOI: 10.1186/s43897-022-00038-9.

Highly accurate and large-scale collision cross sections prediction with graph neural networks.

Guo R, Zhang Y, Liao Y, Yang Q, Xie T, Fan X. Commun Chem. 2023; 6(1):139.

PMID: 37402835 PMC: 10319785. DOI: 10.1038/s42004-023-00939-w.

MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem.

Hoffmann M, Kretschmer F, Ludwig M, Bocker S. Metabolites. 2023; 13(3).

PMID: 36984753 PMC: 10053663. DOI: 10.3390/metabo13030314.

Machine learning for identification of silylated derivatives from mass spectra.

Ljoncheva M, Stepisnik T, Kosjek T, Dzeroski S. J Cheminform. 2022; 14(1):62.

PMID: 36109826 PMC: 9476372. DOI: 10.1186/s13321-022-00636-1.

Plants Metabolome Study: Emerging Tools and Techniques.

Patel M, Pandey S, Kumar M, Haque M, Pal S, Yadav N. Plants (Basel). 2021; 10(11).

PMID: 34834772 PMC: 8621461. DOI: 10.3390/plants10112409.
