
Information Retrieval and Text Mining Technologies for Chemistry

Overview
Journal Chem Rev
Specialty Chemistry
Date 2017 May 6
PMID 28475312
Citations 60
Abstract

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

Citing Articles

Application of Transformers to Chemical Synthesis.

Jin D, Liang Y, Xiong Z, Yang X, Wang H, Zeng J Molecules. 2025; 30(3).

PMID: 39942600 PMC: 11821105. DOI: 10.3390/molecules30030493.


XenoMet: A Corpus of Texts to Extract Data on Metabolites of Xenobiotics.

Biziukova N, Rudik A, Dmitriev A, Tarasova O, Filimonov D, Poroikov V ACS Omega. 2025; 10(3):2459-2471.

PMID: 39895765 PMC: 11780559. DOI: 10.1021/acsomega.4c05723.


Site-specific prediction of O-GlcNAc modification in proteins using evolutionary scale model.

Khalid A, Kaleem A, Qazi W, Abdullah R, Iqtedar M, Naz S PLoS One. 2024; 19(12):e0316215.

PMID: 39739642 PMC: 11687694. DOI: 10.1371/journal.pone.0316215.


cidalsDB: an AI-empowered platform for anti-pathogen therapeutics research.

Harigua-Souiai E, Masmoudi O, Makni S, Oualha R, Abdelkrim Y, Hamdi S J Cheminform. 2024; 16(1):134.

PMID: 39609715 PMC: 11605991. DOI: 10.1186/s13321-024-00929-7.


Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents.

Blakey M, Pearman-Kanza S, Frey J J Cheminform. 2024; 16(1):42.

PMID: 38622746 PMC: 11017645. DOI: 10.1186/s13321-024-00831-2.