
ccbmlib - a Python Package for Modeling Tanimoto Similarity Value Distributions

Overview
Journal F1000Res
Date 2020 Mar 18
PMID 32161645
Citations 5
Authors Vogt M, Bajorath J
Abstract

The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and to evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
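The quantities named in the abstract can be illustrated with a minimal sketch in plain Python. It does not use the ccbmlib or RDKit APIs, and it is not the authors' method: ccbmlib models the distributions statistically from feature frequencies and correlations, whereas this sketch simply counts over a small, made-up reference set. The function names (`tanimoto`, `empirical_p_value`, `conditional_p_value`) and the toy fingerprints are assumptions introduced for illustration only.

```python
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def empirical_p_value(tc: float, reference_tcs: Sequence[float]) -> float:
    """Fraction of reference similarity values that are at least as large as tc."""
    if not reference_tcs:
        raise ValueError("reference_tcs must not be empty")
    return sum(1 for value in reference_tcs if value >= tc) / len(reference_tcs)


def conditional_p_value(
    reference_fp: Set[int], test_fp: Set[int], database: Sequence[Set[int]]
) -> float:
    """Fraction of database fingerprints at least as similar to reference_fp as test_fp is.

    This mimics the idea of a conditional score: it estimates how far up the
    ranking the test compound would land in a similarity search that uses
    reference_fp as the query.
    """
    tc = tanimoto(reference_fp, test_fp)
    return empirical_p_value(tc, [tanimoto(reference_fp, fp) for fp in database])


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
query = {1, 2, 3, 4}
test = {1, 2, 3, 5}
database = [{1, 2}, {3, 4, 5, 6}, {1, 2, 3, 4, 5}, {7, 8}, {2, 3, 4, 9}]

tc = tanimoto(query, test)  # 3 shared bits out of 5 distinct bits
p = conditional_p_value(query, test, database)
print(f"Tc = {tc:.2f}, conditional p = {p:.2f}")
```

A small p-value means that few reference compounds are as similar to the query as the test compound is. Because a p-value is expressed on the same 0 to 1 scale regardless of fingerprint type, it allows the kind of cross-fingerprint comparison described above, which raw Tanimoto values with different typical ranges do not.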

Citing Articles

The anti-inflammatory activity of probiotic to activate Sirtuin-1 in inhibiting diabetic nephropathy progression.

Amelia R, Said F, Yasmin F, Harun H, Tofrizal T. J Diabetes Metab Disord. 2023; 22(2):1425-1442.

PMID: 37975108 PMC: 10638242. DOI: 10.1007/s40200-023-01265-7.


Repurposing Drugs for Inhibition against ALDH2 via a 2D/3D Ligand-Based Similarity Search and Molecular Simulation.

Jiang W, Chen J, Zhang P, Zheng N, Ma L, Zhang Y. Molecules. 2023; 28(21).

PMID: 37959744 PMC: 10650273. DOI: 10.3390/molecules28217325.


Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments.

Ucak U, Ashyrmamatov I, Ko J, Lee J. Nat Commun. 2022; 13(1):1186.

PMID: 35246540 PMC: 8897428. DOI: 10.1038/s41467-022-28857-w.


Pharmacological targeting of Sam68 functions in colorectal cancer stem cells.

Masibag A, Bergin C, Haebe J, Zouggar A, Shah M, Sandouka T. iScience. 2021; 24(12):103442.

PMID: 34877499 PMC: 8633986. DOI: 10.1016/j.isci.2021.103442.


ccbmlib - a Python package for modeling Tanimoto similarity value distributions.

Vogt M, Bajorath J. F1000Res. 2020; 9.

PMID: 32161645 PMC: 7050271. DOI: 10.12688/f1000research.22292.2.
