
The METLIN Small Molecule Dataset for Machine Learning-based Retention Time Prediction

Overview
Journal Nat Commun
Specialty Biology
Date 2019 Dec 22
PMID 31862874
Citations 54
Abstract

Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
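
The top-3 figure reported in the abstract can be pictured with a small sketch. The abstract does not describe the evaluation protocol, so the following is only an assumed illustration: candidates for one feature are ordered by the absolute difference between their predicted retention time and the observed one, and a case counts as a hit when the true identity falls within the first k candidates. The function names, the candidate labels, and the retention times (taken here to be in seconds) are all hypothetical and do not come from the SMRT dataset or the authors' model.

```python
from typing import Dict, List, Tuple

# One evaluation case: (observed RT, {candidate id: predicted RT}, true candidate id).
# This structure is an assumption made for illustration only.
Case = Tuple[float, Dict[str, float], str]


def rank_candidates(observed_rt: float, predicted_rt: Dict[str, float]) -> List[str]:
    """Order candidate ids by |predicted RT - observed RT|, closest first."""
    return sorted(predicted_rt, key=lambda cid: abs(predicted_rt[cid] - observed_rt))


def top_k_hit(observed_rt: float, predicted_rt: Dict[str, float],
              true_id: str, k: int = 3) -> bool:
    """True when the correct identity is among the k closest candidates."""
    return true_id in rank_candidates(observed_rt, predicted_rt)[:k]


def top_k_rate(cases: List[Case], k: int = 3) -> float:
    """Fraction of cases whose true identity is ranked within the top k."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(rt, preds, true_id, k) for rt, preds, true_id in cases)
    return hits / len(cases)


if __name__ == "__main__":
    # Hypothetical toy data: four candidates sharing one mass, two observed features.
    toy_predictions = {"cand_a": 310.0, "cand_b": 295.0, "cand_c": 500.0, "cand_d": 302.0}
    toy_cases: List[Case] = [
        (300.0, toy_predictions, "cand_a"),  # third closest, so a top-3 hit
        (300.0, toy_predictions, "cand_c"),  # furthest away, so a miss
    ]
    print(rank_candidates(300.0, toy_predictions))
    print(top_k_rate(toy_cases, k=3))
```

Ranking by absolute retention time error is the simplest possible choice here; a real annotation workflow would typically combine retention time with mass and fragmentation evidence rather than rely on it alone.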

Citing Articles

FIORA: Local neighborhood-based prediction of compound mass spectra from single fragmentation events.

Nowatzky Y, Russo F, Lisec J, Kister A, Reinert K, Muth T Nat Commun. 2025; 16(1):2298.

PMID: 40055306 PMC: 11889238. DOI: 10.1038/s41467-025-57422-4.


ROASMI: accelerating small molecule identification by repurposing retention data.

Sun F, Yin Y, Liu H, Shen L, Kang X, Xin G J Cheminform. 2025; 17(1):20.

PMID: 39953609 PMC: 11829455. DOI: 10.1186/s13321-025-00968-8.


Development of an Efficient and Generalized MTSCAM Model to Predict Liquid Chromatography Retention Times of Organic Compounds.

Fan M, Sang C, Li H, Wei Y, Zhang B, Xing Y Research (Wash D C). 2025; 8:0607.

PMID: 39925484 PMC: 11803058. DOI: 10.34133/research.0607.


Application of artificial intelligence to quantitative structure-retention relationship calculations in chromatography.

Xie J, Chen S, Zhao L, Dong X J Pharm Anal. 2025; 15(1):101155.

PMID: 39896319 PMC: 11782803. DOI: 10.1016/j.jpha.2024.101155.


Explicit relation between thin film chromatography and column chromatography conditions from statistics and machine learning.

Xu H, Wu W, Chen Y, Zhang D, Mo F Nat Commun. 2025; 16(1):832.

PMID: 39828717 PMC: 11743788. DOI: 10.1038/s41467-025-56136-x.

