
A Machine Learning Driven Automated System for Safety Data Sheet Indexing

Overview
Journal: Sci Rep
Specialty: Science
Date: 2024 Feb 22
PMID: 38388768
Abstract

Safety Data Sheets (SDS) are foundational to chemical management systems and are used in a wide variety of applications, including green chemistry, industrial hygiene, and regulatory compliance, within the Environment, Health, and Safety (EHS) and Environmental, Social, and Governance (ESG) domains. Companies usually prefer to have key pieces of information extracted from these datasheets and stored in an easy-to-access structured repository, a process referred to as SDS "indexing". Historically, SDS indexing has been done manually, which is labor-intensive, time-consuming, and costly. In this paper, we present an automated system that indexes the composition information of chemical products from SDS documents using a multi-stage ensemble method in which machine learning models and rule-based systems are stacked one after the other. The system indexes the ingredient names, their corresponding Chemical Abstracts Service (CAS) numbers, and their weight percentages. It takes an SDS document in PDF format as input and outputs a table listing each ingredient name with its CAS number and weight percentage. The system achieves a precision of 0.93 at the document level when evaluated on 20,000 SDS documents annotated for this purpose.
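The abstract does not describe the individual stages, so the following is only an illustrative sketch of what one rule-based step in such a pipeline might look like, not the authors' method. It assumes the composition section has already been converted to plain-text lines, and it uses the publicly documented CAS Registry Number check digit to discard mis-read numbers before building the output table. The names `IngredientRow`, `is_valid_cas`, and `rows_from_lines` are hypothetical.

```python
import re
from dataclasses import dataclass

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS layout and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit;
    # the weighted sum modulo 10 must equal the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


@dataclass
class IngredientRow:
    name: str
    cas: str
    weight_percent: str  # kept as text, since SDSs often report ranges


def rows_from_lines(lines):
    """Hypothetical helper: build table rows from 'name CAS weight' lines.

    Assumes each line holds the ingredient name, then the CAS number,
    then the weight percentage. Lines without a valid CAS are skipped.
    """
    rows = []
    for line in lines:
        match = CAS_PATTERN.search(line)
        if match is None or not is_valid_cas(match.group(0)):
            continue
        rows.append(
            IngredientRow(
                name=line[: match.start()].strip(),
                cas=match.group(0),
                weight_percent=line[match.end():].strip(),
            )
        )
    return rows


if __name__ == "__main__":
    sample = ["Ethanol 64-17-5 30-60", "Water 7732-18-5 40-70"]
    for row in rows_from_lines(sample):
        print(row.name, row.cas, row.weight_percent, sep="\t")
```

A checksum filter of this kind only catches corrupted digits; it cannot tell whether a valid CAS number is paired with the right ingredient name, which is presumably where learned models would be needed.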

Citing Articles

A machine learning driven automated system to extract multiple information fields from safety data sheet documents.

Khan M, Penfield J, Suman A, Crowell S. Heliyon. 2025; 11(4):e42215.

PMID: 40028543 PMC: 11872447. DOI: 10.1016/j.heliyon.2025.e42215.

References
1. Cai Z, Vasconcelos N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans Pattern Anal Mach Intell. 2019; 43(5):1483-1498. DOI: 10.1109/TPAMI.2019.2956516.

2. Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans Pattern Anal Mach Intell. 2020; 43(10):3349-3364. DOI: 10.1109/TPAMI.2020.2983686.

3. Jacobs A, Williams D, Hickey K, Patrick N, Williams A, Chalk S. CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community. J Chem Inf Model. 2022; 62(11):2737-2743. PMC: 9199008. DOI: 10.1021/acs.jcim.2c00268.