
Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications

Overview
Journal: Molecules
Publisher: MDPI
Specialty: Biology
Date: 2021 Sep 10
PMID: 34500724
Citations: 4
Abstract

Analogue series play a key role in drug discovery. They arise naturally in lead optimization, where analogues are explored around one or a few core structures. Identifying and extracting pairs or series of analogues from large compound databases is much harder, because no core structures are predefined. This review outlines the most common and the most recent methods for automatically identifying analogue series in large libraries. Initial approaches used predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later, the matched molecular pair concept led to efficient algorithms that identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. These ideas were developed further in two directions: hierarchical scaffold decomposition, and the extraction of analogue series based on single-site modifications (so-called matched molecular series), in which potential scaffold structures are explored through systematic molecule fragmentation. Subsequent work produced methods for extracting analogue series defined by a single core structure with several substitution sites, which allow convenient representations such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
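The core-indexing idea behind matched molecular pairs and series can be illustrated with a minimal sketch. It is not the implementation of any method covered by the review. It assumes each compound has already been fragmented at single bonds into (core, substituent) pairs. The compound IDs, the fragment strings, and the function names `index_by_core` and `matched_series` are hypothetical and were chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, precomputed single-cut fragmentations:
# compound ID -> list of (core, substituent) pairs, with [*] marking the
# attachment point. A real workflow would generate these with a
# cheminformatics toolkit; here they are hand-written illustrative strings.
FRAGMENTATIONS = {
    "cpd1": [("CC(=O)Nc1ccc([*])cc1", "[*]C"),
             ("[*]C(=O)Nc1ccc(C)cc1", "[*]C")],
    "cpd2": [("CC(=O)Nc1ccc([*])cc1", "[*]Cl"),
             ("[*]C(=O)Nc1ccc(Cl)cc1", "[*]C")],
    "cpd3": [("CCC(=O)Nc1ccc([*])cc1", "[*]C"),
             ("[*]C(=O)Nc1ccc(C)cc1", "[*]CC")],
    "cpd4": [("CC(=O)Nc1ccc([*])cc1", "[*]F"),
             ("[*]C(=O)Nc1ccc(F)cc1", "[*]C")],
}


def index_by_core(fragmentations):
    """Map every putative core to the compounds that contain it."""
    index = defaultdict(dict)
    for compound_id, pairs in fragmentations.items():
        for core, substituent in pairs:
            index[core][compound_id] = substituent
    return index


def matched_series(fragmentations, min_size=2):
    """Keep cores shared by at least `min_size` compounds.

    A core with exactly two compounds corresponds to a matched molecular
    pair; three or more compounds form a matched molecular series.
    """
    index = index_by_core(fragmentations)
    return {core: members for core, members in index.items()
            if len(members) >= min_size}


if __name__ == "__main__":
    for core, members in matched_series(FRAGMENTATIONS).items():
        print(core, members)
```

Because every compound contributes all of its putative cores to the index, no core has to be chosen in advance, and a single pass over the fragmentations is enough to group the analogues.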

Citing Articles

Cheminformatics and artificial intelligence for accelerating agrochemical discovery.

Djoumbou-Feunang Y, Wilmot J, Kinney J, Chanda P, Yu P, Sader A. Front Chem. 2023; 11:1292027.

PMID: 38093816 PMC: 10716421. DOI: 10.3389/fchem.2023.1292027.


ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery.

Tingle B, Tang K, Castanon M, Gutierrez J, Khurelbaatar M, Dandarchuluun C. J Chem Inf Model. 2023; 63(4):1166-1176.

PMID: 36790087 PMC: 9976280. DOI: 10.1021/acs.jcim.2c01253.


Data-Driven Approaches Used for Compound Library Design for the Treatment of Parkinson's Disease.

Barrera-Vazquez O, Santiago-de-la-Cruz J, Rivero-Segura N, Estrella-Parra E, Morales-Paoli G, Flores-Soto E. Int J Mol Sci. 2023; 24(2).

PMID: 36674652 PMC: 9867512. DOI: 10.3390/ijms24021134.


Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK).

Schaub J, Zander J, Zielesny A, Steinbeck C. J Cheminform. 2022; 14(1):79.

PMID: 36357931 PMC: 9650898. DOI: 10.1186/s13321-022-00656-x.
