Introduction of the Conditional Correlated Bernoulli Model of Similarity Value Distributions and Its Application to the Prospective Prediction of Fingerprint Search Performance
Medical Informatics
A statistical approach, the conditional correlated Bernoulli model, is introduced for modeling similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are modeled as dependent Bernoulli variables, and the conditional distributions of Tanimoto similarity values of database compounds, given a reference molecule, are assessed. In the context of virtual screening, the model is used to estimate the position in a database ranking of a compound that obtains a certain similarity value. By generating receiver operating characteristic (ROC) curves from the cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search might be. Comparing the curves for different fingerprints makes it possible to identify the fingerprints most likely to detect new active molecules in a database search, given a set of known reference molecules.
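The step from similarity value distributions to an ROC curve can be illustrated with a minimal sketch. It assumes empirical samples of Tanimoto values stand in for the model-derived conditional distributions described above; it does not implement the conditional correlated Bernoulli model itself. The function names (`tanimoto`, `roc_from_similarities`, `auc`) and the feature-set representation of fingerprints are hypothetical choices made for illustration.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints.

    Assumption: each fingerprint is given as a collection of the
    indices of its 'on' features.
    """
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def roc_from_similarities(active_sims, random_sims):
    """ROC points from two samples of similarity values.

    At threshold t, the true-positive rate is the fraction of active
    compounds with similarity >= t (one minus the CDF just below t),
    and the false-positive rate is the same fraction for random
    database compounds.
    """
    active = np.asarray(active_sims, dtype=float)
    random_ = np.asarray(random_sims, dtype=float)
    # Sweep thresholds from the highest observed similarity downwards.
    thresholds = np.unique(np.concatenate([active, random_]))[::-1]
    tpr = np.array([(active >= t).mean() for t in thresholds])
    fpr = np.array([(random_ >= t).mean() for t in thresholds])
    # Start the curve at the origin (threshold above every value).
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Under these assumptions, computing `auc` for the curves obtained with different fingerprints on the same reference molecule gives one possible way to compare their expected search performance, in the spirit of the curve comparison described in the abstract.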