DECIMER - Hand-drawn molecule images dataset
The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). Although considerable progress has been made in this field over the last three decades, the development of systems that recognise complex hand-drawn structure depictions is still at an early stage. Currently, no data are available for the systematic evaluation of OCSR methods on hand-drawn structures. Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of diversely picked chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is published under the CC-BY 4.0 licence, which imposes very few restrictions on reuse. We hope that it will contribute to the further development of the field.
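Because every depiction is paired with a machine-readable structure, the dataset can be used to score an OCSR system automatically. The following is a minimal sketch of one simple metric, exact-match accuracy, under stated assumptions: the CSV layout, the `image_id`/`smiles` column names, the file names and the helper functions are all hypothetical and are not taken from the published dataset, which may be organised differently. It also assumes that reference and predicted SMILES have already been canonicalised with the same toolkit; it is an illustration, not the authors' evaluation procedure.

```python
import csv
import io


def load_reference(csv_text):
    """Return {image_id: reference_smiles} from CSV text.

    Hypothetical layout: a header row 'image_id,smiles' followed by one row per
    hand-drawn image. Adapt the column names to the real dataset files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["image_id"]: row["smiles"].strip() for row in reader}


def exact_match_accuracy(reference, predictions):
    """Fraction of reference images whose predicted SMILES equals the reference string.

    Assumes both sides were canonicalised with the same toolkit beforehand,
    because one molecule can be written as many different valid SMILES strings.
    Images without a prediction count as failures.
    """
    if not reference:
        return 0.0
    hits = sum(
        1
        for image_id, smiles in reference.items()
        if predictions.get(image_id, "").strip() == smiles
    )
    return hits / len(reference)


if __name__ == "__main__":
    # Toy data standing in for the real image-to-structure mapping (hypothetical).
    ref = load_reference(
        "image_id,smiles\n"
        "img_0001.png,CCO\n"
        "img_0002.png,c1ccccc1\n"
        "img_0003.png,CC(=O)O\n"
    )
    preds = {"img_0001.png": "CCO", "img_0002.png": "C1=CC=CC=C1"}
    print(f"{exact_match_accuracy(ref, preds):.3f}")  # prints 0.333
```

In the demo, the second prediction is benzene written in Kekulé form, so it describes the same molecule as the reference yet still counts as a miss. This is why a real evaluation would first canonicalise both sides with a cheminformatics toolkit (for example RDKit) before comparing strings.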