
DECIMER-hand-drawn Molecule Images Dataset

Overview
Journal: J Cheminform
Publisher: BioMed Central
Specialty: Chemistry
Date: 2022 Jun 10
PMID: 35681226
Abstract

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). The field has made considerable progress over the last three decades, but the development of systems that recognise complex hand-drawn structure depictions is still in its early stages. Currently, no data are available for the systematic evaluation of OCSR methods on hand-drawn structures. Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of a diverse selection of chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is published under the CC-BY 4.0 licence, which imposes very few restrictions. We hope that it will contribute to the further development of the field.

Citing Articles

Digitize-HCD: A dataset for digitization of handwritten circuit diagrams.

Ahmed N, Adnan M, Shafiullah A, Parash H, Rahman M, Akib I. Data Brief. 2025; 59:111315.

PMID: 39931092 PMC: 11808513. DOI: 10.1016/j.dib.2025.111315.


Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture.

Rajan K, Brinkhaus H, Zielesny A, Steinbeck C. J Cheminform. 2024; 16(1):78.

PMID: 38970120 PMC: 11227129. DOI: 10.1186/s13321-024-00872-7.


MMSSC-Net: multi-stage sequence cognitive networks for drug molecule recognition.

Zhang D, Zhao D, Wang Z, Li J, Li J. RSC Adv. 2024; 14(26):18182-18191.

PMID: 38854833 PMC: 11155551. DOI: 10.1039/d4ra02442g.


HD_BPMDS: a curated binary pattern multitarget dataset of Huntington's disease-targeting agents.

Stefan S, Pahnke J, Namasivayam V. J Cheminform. 2023; 15(1):109.

PMID: 37978560 PMC: 10655317. DOI: 10.1186/s13321-023-00775-z.


DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications.

Rajan K, Brinkhaus H, Agea M, Zielesny A, Steinbeck C. Nat Commun. 2023; 14(1):5045.

PMID: 37598180 PMC: 10439916. DOI: 10.1038/s41467-023-40782-0.

