
Utilization of a Natural Language Processing-based Approach to Determine the Composition of Artifact Residues

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2024 Sep 28
PMID: 39333884
Abstract

Background: Determining the composition of artifact residues is a central problem in ancient residue metabolomics. The standard method compares the mass spectral features that an experimental artifact and an ancient artifact have in common. While this method is simple and straightforward, we sought to increase the accuracy of predicting which plant species had been used in which artifacts.
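As a rough illustration of the shared-feature comparison described in the background, the sketch below scores each experimental (reference) sample by how many features it has in common with an artifact and picks the highest count. The abstract does not give implementation details, so the representation of a feature as a plain string label, the function names, and the example data are all hypothetical.

```python
from typing import Dict, Set

# Assumption: each mass spectral feature is reduced to a hashable label
# (for example a binned m/z and retention time). The labels below are made up.
Feature = str


def shared_feature_count(reference: Set[Feature], artifact: Set[Feature]) -> int:
    """Count the features present in both the reference and the artifact."""
    return len(reference & artifact)


def predict_species(references: Dict[str, Set[Feature]], artifact: Set[Feature]) -> str:
    """Return the reference species that shares the most features with the artifact."""
    return max(
        references,
        key=lambda name: shared_feature_count(references[name], artifact),
    )


# Illustrative data only; not taken from the study.
references = {
    "species_A": {"f1", "f2", "f3", "f4"},
    "species_B": {"f2", "f3", "f5"},
}
unknown_pipe = {"f1", "f2", "f4", "f9"}

predicted = predict_species(references, unknown_pipe)
print(predicted)
```

Because every shared feature counts equally here, features common to many species contribute as much as features specific to one species.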

Results: We introduce an algorithm (the new method), based on ideas from the field of natural language processing (NLP), to address this problem. We tested our strategy on a set of modern clay pipes. To limit bias, we were not told which plant species had been smoked in which clay pipe. The new method performed 12.5% better than the standard method in predicting the plant species smoked in each artifact.
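The abstract does not say which NLP ideas the algorithm draws on. Purely as an illustration of how such ideas can transfer to this setting, the sketch below treats each reference sample as a "document" and each mass spectral feature as a "word", weights features by a smoothed inverse document frequency, and ranks references by cosine similarity. TF-IDF with cosine similarity is an assumption chosen for this example and should not be read as the authors' algorithm; all names and data are hypothetical.

```python
import math
from typing import Dict, Set


def idf_weights(references: Dict[str, Set[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: rarer features get larger weights."""
    n = len(references)
    all_features = set().union(*references.values())
    weights = {}
    for feature in all_features:
        df = sum(feature in feats for feats in references.values())
        weights[feature] = math.log((1 + n) / (1 + df)) + 1.0
    return weights


def cosine_score(reference: Set[str], artifact: Set[str], idf: Dict[str, float]) -> float:
    """Cosine similarity between IDF-weighted presence vectors.

    Features never seen in any reference get weight 0 and are ignored.
    """
    dot = sum(idf.get(f, 0.0) ** 2 for f in reference & artifact)
    norm_ref = math.sqrt(sum(idf.get(f, 0.0) ** 2 for f in reference))
    norm_art = math.sqrt(sum(idf.get(f, 0.0) ** 2 for f in artifact))
    if norm_ref == 0.0 or norm_art == 0.0:
        return 0.0
    return dot / (norm_ref * norm_art)


# Illustrative data only; not taken from the study.
references = {
    "species_A": {"c1", "c2", "a1"},
    "species_B": {"c1", "b1", "b2"},
    "species_C": {"c1", "c2", "x1"},
}
unknown_pipe = {"c1", "c2", "b1"}

idf = idf_weights(references)
scores = {name: cosine_score(feats, unknown_pipe, idf) for name, feats in references.items()}
predicted = max(scores, key=scores.get)
print(predicted)
```

In this toy data every reference shares exactly two features with the artifact, so a plain shared-feature count cannot separate them. The weighted score favors the reference whose match includes a feature found in no other reference.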

Conclusions: Using an NLP-based approach, we developed a robust algorithm for characterizing the composition of artifact residues. This work also discusses other applications of the algorithm in metabolomics, such as datasets with a limited number of replicates.
