A Maximum Common Substructure-based Algorithm for Searching and Predicting Drug-like Compounds

Overview

Journal: Bioinformatics

Publisher: Oxford University Press

Specialty: Biology

Date: 2008 Jul 1

PMID: 18586736

Citations: 56

Authors

Yiqun Cao

Tao Jiang

Thomas Girke


Abstract

Motivation: The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds.

Results: In this article, a new backtracking algorithm for MCS is proposed and compared with global similarity measures. Our algorithm provides high flexibility in the matching process, and it is very efficient in identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, the concept of basis compounds is proposed, which enables researchers to easily combine the MCS-based and traditional similarity measures with modern machine learning techniques. Support vector machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity.
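To make an MCS-based similarity concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes heavy-atom graphs with element labels only (bond orders and hydrogens ignored), searches for a maximum common induced substructure by plain backtracking with a simple size bound (the common part is not required to be connected), and turns the MCS size into a Tanimoto-style score, |MCS| / (|A| + |B| - |MCS|). The scoring formula and the helper names `mcs_size` and `mcs_tanimoto` are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Toy molecular graph (assumption): element labels plus undirected bonds, bond orders ignored.
Graph = Tuple[List[str], Set[Tuple[int, int]]]


def _adjacent(edges: Set[Tuple[int, int]], i: int, j: int) -> bool:
    return (i, j) in edges or (j, i) in edges


def mcs_size(g1: Graph, g2: Graph) -> int:
    """Atom count of a maximum common induced substructure, found by plain backtracking."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    n1 = len(labels1)
    best = 0
    mapping: Dict[int, int] = {}  # atom index in g1 -> atom index in g2
    used: Set[int] = set()        # atoms of g2 that are already matched

    def consistent(a: int, b: int) -> bool:
        # Same element, and bonding to every already-mapped atom must agree in both graphs.
        if labels1[a] != labels2[b]:
            return False
        return all(_adjacent(edges1, a, x) == _adjacent(edges2, b, y)
                   for x, y in mapping.items())

    def extend(a: int) -> None:
        nonlocal best
        # Bound: even if every remaining g1 atom were matched, best could not be beaten.
        if len(mapping) + (n1 - a) <= best:
            return
        if a == n1:
            best = len(mapping)  # strictly larger than best, guaranteed by the bound above
            return
        for b in range(len(labels2)):
            if b not in used and consistent(a, b):
                mapping[a] = b
                used.add(b)
                extend(a + 1)
                del mapping[a]
                used.discard(b)
        extend(a + 1)  # also try leaving atom a unmatched

    extend(0)
    return best


def mcs_tanimoto(g1: Graph, g2: Graph) -> float:
    """Tanimoto-style similarity from MCS size (an assumed scoring choice)."""
    m = mcs_size(g1, g2)
    denom = len(g1[0]) + len(g2[0]) - m
    return m / denom if denom else 1.0
```

For ethanol (C-C-O) and the heavy atoms of acetic acid (C-C(=O)-O), the search maps all three ethanol atoms, giving 3 / (3 + 4 - 3) = 0.75. Exhaustive backtracking is exponential in the worst case, so practical MCS tools rely on much stronger pruning and heuristics.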

Supplementary Information: Supplementary data are available at Bioinformatics online.
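The basis-compound idea from the Results paragraph can be sketched in a few lines: each compound is described by its similarities to a fixed set of basis compounds, and vectors from different similarity measures are joined so that a standard learner such as an SVM can use both. This Python sketch rests on stated assumptions: compounds are toy feature sets, `tanimoto` and `overlap` are hypothetical stand-ins for the atom pair and MCS-based measures, and concatenation is one simple way to combine them, not necessarily the paper's exact scheme.

```python
from typing import Callable, FrozenSet, List, Sequence

Compound = FrozenSet[str]  # toy representation (assumption): a set of descriptor strings
Similarity = Callable[[Compound, Compound], float]


def tanimoto(a: Compound, b: Compound) -> float:
    """Shared features over all features (stand-in for an atom pair similarity)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def overlap(a: Compound, b: Compound) -> float:
    """Shared features over the smaller set (hypothetical stand-in for an MCS-based score)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 1.0


def basis_vector(compound: Compound, basis: Sequence[Compound],
                 measures: Sequence[Similarity]) -> List[float]:
    """One similarity per (measure, basis compound) pair, grouped measure by measure."""
    return [sim(compound, b) for sim in measures for b in basis]


if __name__ == "__main__":
    basis = [frozenset({"a", "b"}), frozenset({"b", "c", "d"})]
    query = frozenset({"a", "b", "c"})
    print(basis_vector(query, basis, [tanimoto, overlap]))
```

With k basis compounds and two measures, every compound becomes a vector of length 2k. Such vectors could be stacked as rows of a feature matrix and passed to an SVM implementation such as scikit-learn's `SVC`.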

Citing Articles

Molecular design of hydroxamic acid-based derivatives as urease inhibitors of Helicobacter pylori.

Wang N, Wu X, Liang J, Liu B, Wang B Mol Divers. 2024; 28(4):2229-2244.

PMID: 39020133 DOI: 10.1007/s11030-024-10914-9.

Chemical species recognition in an adaptive radiation of Hawaiian spiders (Araneae: Tetragnathidae).

Adams S, Gurajapu A, Qiang A, Gerbaulet M, Schulz S, Tsutsui N Proc Biol Sci. 2024; 291(2020):20232340.

PMID: 38593845 PMC: 11003775. DOI: 10.1098/rspb.2023.2340.

Decomposition of an odorant in olfactory perception and neural representation.

Ye Y, Wang Y, Zhuang Y, Tan H, Zuo Z, Yun H Nat Hum Behav. 2024; 8(6):1150-1162.

PMID: 38499771 DOI: 10.1038/s41562-024-01849-0.

Synergistic acceleration of machine learning and molecular docking for prostate-specific antigen ligand design.

Lin S, Chen Y, Liu R, Zhu M, Zhu T, Wang M RSC Adv. 2024; 14(12):8240-8250.

PMID: 38482069 PMC: 10936200. DOI: 10.1039/d3ra08550c.

Automated de Novo Design of Olefin Metathesis Catalysts: Computational and Experimental Analysis of a Simple Thermodynamic Design Criterion.

Foscato M, Occhipinti G, Hopen Eliasson S, Jensen V J Chem Inf Model. 2024; 64(2):412-424.

PMID: 38247361 PMC: 10806812. DOI: 10.1021/acs.jcim.3c01649.
