
Transcription Between Human-readable Synthetic Descriptions and Machine-executable Instructions: an Application of the Latest Pre-training Technology

Overview
Journal Chem Sci
Specialty Chemistry
Date 2023 Sep 15
PMID 37712039
Abstract

AI has been widely applied in scientific scenarios, such as robots that perform chemical synthesis actions to free researchers from monotonous experimental procedures. However, a gap remains between human-readable natural language descriptions and machine-executable instructions: the former appear throughout the chemical literature, while the latter are currently compiled manually by experts. We apply the latest pre-trained model technology to achieve automatic transcription between descriptions and instructions. We design a concise and comprehensive instruction schema and construct an open-source, human-annotated dataset of 3950 description-instruction pairs, with an average of 9.2 operations per instruction. We further propose knowledgeable pre-trained transcription models enhanced by multi-grained chemical knowledge. We also explore the performance of recent popular models and products that show great capability in automatic writing (e.g., ChatGPT). Experiments show that our system improves researchers' instruction compilation efficiency by at least 42% and can generate fluent academic paragraphs of synthetic descriptions when given instructions, demonstrating the great potential of pre-trained models for improving human productivity.
