TCMEval-SDT: A Benchmark Dataset for Syndrome Differentiation Thought of Traditional Chinese Medicine
This paper presents TCMEval-SDT, a large, publicly available benchmark dataset for the thought process involved in syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical texts, and hospital medical records, with metadata adhering to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Each case has been annotated and curated by TCM experts and includes a medical record ID, clinical data, an explanatory summary, the TCM syndrome, clinical information, and the TCM pathogenesis, to support algorithms and models that emulate the diagnostic process of TCM clinicians. To describe the TCM syndrome diagnosis process comprehensively, we divide diagnosis into four steps: (1) clinical information extraction, (2) TCM pathogenesis reasoning, (3) TCM syndrome reasoning, and (4) explanatory summary. We have also established validation criteria for evaluating the ability of such algorithms and models to perform TCM clinical diagnosis on this dataset. To facilitate research and evaluation in TCM syndrome diagnosis, the TCMEval-SDT dataset is publicly available under the CC-BY 4.0 license.
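The record structure and the four diagnostic steps described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the dataset's actual schema: the class name `SyndromeCase`, the attribute names, the choice of lists versus strings, and the helper `missing_steps` are all hypothetical, and the example values are placeholders rather than real case content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyndromeCase:
    """One annotated case. Field names and types are assumed, not taken from the dataset."""
    record_id: str                      # medical record ID
    clinical_data: str                  # raw case text
    clinical_information: List[str] = field(default_factory=list)  # step 1 output
    pathogenesis: List[str] = field(default_factory=list)          # step 2 output
    syndrome: List[str] = field(default_factory=list)              # step 3 output
    explanatory_summary: str = ""                                  # step 4 output


# The four steps named in the abstract, paired with the (assumed) field each one fills.
DIAGNOSIS_STEPS = [
    ("clinical information extraction", "clinical_information"),
    ("TCM pathogenesis reasoning", "pathogenesis"),
    ("TCM syndrome reasoning", "syndrome"),
    ("explanatory summary", "explanatory_summary"),
]


def missing_steps(case: SyndromeCase) -> List[str]:
    """Return the names of the steps whose field is still empty for this case."""
    return [step for step, attr in DIAGNOSIS_STEPS if not getattr(case, attr)]


# Placeholder values only; no real case content is reproduced here.
example = SyndromeCase(
    record_id="case-001",
    clinical_data="<raw case text>",
    clinical_information=["<extracted finding>"],
)
print(missing_steps(example))
```

Keeping the step-to-field mapping in one list makes it easy to check, for a model's output on a given case, which of the four steps it has actually produced before any scoring is applied.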