
A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication

Overview
Date 2005 Sep 27
PMID 16180913
Citations 55
Abstract

We have performed virtual screening using some very simple features, by employing the number of atoms per element as molecular descriptors but without regard to any structural information whatsoever. Surprisingly, these atom counts are able to outperform virtual-affinity-based fingerprints and Unity fingerprints in some activity classes. Although molecular weight and other biases were known in target-based virtual screening settings (docking), we report the effect of using very simple descriptors for ligand-based virtual screening, by using clearly defined biological targets and employing a large data set (>100,000 compounds) containing multiple (11) activity classes. Structure-unaware atom count vectors as descriptors in combination with the Euclidean distance measure are able to achieve "enrichment factors" over random selection of around 4 (depending on the particular class of active compounds), putting the enrichment factors reported for more sophisticated virtual screening methods in a different light. They are also able to retrieve active compounds with novel scaffolds instead of merely the expected structural analogues. The added value of many currently used virtual screening methods (calculated as enrichment factors) drops down to a factor of between 1 and 2, instead of often reported double-digit figures. The observed effect is much less profound for simple descriptors such as molecular weight and is only present in cases of atypical (larger) ligands. The current state of virtual screening is not as sophisticated as might be expected, which is due to descriptors still not being able to capture structural properties relevant to binding. This fact can partly be explained by highly nonlinear structure-activity relationships, which represent a severe limitation of the "similar property principle" in the context of bioactivity.
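The approach described in the abstract (per-element atom counts as descriptors, Euclidean distance for ranking, and an enrichment factor over random selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the element order in `ELEMENTS`, the helper names, the compound identifiers, and the activity labels in the toy library are all hypothetical and chosen only for illustration. The enrichment factor is taken here as the hit rate among the top-ranked compounds divided by the hit rate in the whole library, which is one common definition and is assumed rather than quoted from the paper.

```python
import math
import re

# Assumed element order for the count vector; the abstract does not list the elements used.
ELEMENTS = ["C", "H", "N", "O", "S", "F", "Cl", "Br"]


def atom_counts(formula):
    """Turn a molecular formula such as 'C9H8O4' into per-element atom counts.

    No structural information is used: only how many atoms of each element occur.
    Elements outside ELEMENTS are ignored in this sketch.
    """
    counts = dict.fromkeys(ELEMENTS, 0)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol in counts:
            counts[symbol] += int(number) if number else 1
    return [counts[element] for element in ELEMENTS]


def euclidean(u, v):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(query_formula, library):
    """Sort (name, formula, is_active) entries by distance to the query, nearest first."""
    query = atom_counts(query_formula)
    return sorted(library, key=lambda entry: euclidean(query, atom_counts(entry[1])))


def enrichment_factor(ranked, top_n):
    """Hit rate in the top_n ranked compounds divided by the hit rate in the whole list."""
    actives_total = sum(1 for entry in ranked if entry[2])
    actives_top = sum(1 for entry in ranked[:top_n] if entry[2])
    return (actives_top / top_n) / (actives_total / len(ranked))


# Hypothetical toy library: the names and activity labels are invented for illustration.
library = [
    ("cpd_a", "C7H6O3", True),
    ("cpd_b", "C13H18O2", False),
    ("cpd_c", "C8H10N4O2", False),
    ("cpd_d", "C6H12O6", False),
    ("cpd_e", "C9H10O4", True),
    ("cpd_f", "C6H6", False),
    ("cpd_g", "C16H13ClN2O", False),
    ("cpd_h", "C8H9NO2", False),
]

ranked = rank_by_distance("C9H8O4", library)
print([name for name, _, _ in ranked])
print(enrichment_factor(ranked, top_n=2))  # 2.0 for this toy library
```

In this toy case one of the two top-ranked compounds is active while a quarter of the library is active, giving an enrichment factor of 2. The point made in the abstract is that a baseline this simple already yields factors of around 4 on real data, so the figures reported for more elaborate methods are more informative when read relative to such a baseline than relative to random selection.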

Citing Articles

HDBind: encoding of molecular structure with hyperdimensional binary representations.

Jones D, Zhang X, Bennion B, Pinge S, Xu W, Kang J. Sci Rep. 2024; 14(1):29025.

PMID: 39578580 PMC: 11584749. DOI: 10.1038/s41598-024-80009-w.


Identifying compound-protein interactions with knowledge graph embedding of perturbation transcriptomics.

Ni S, Kong X, Zhang Y, Chen Z, Wang Z, Fu Z. Cell Genom. 2024; 4(10):100655.

PMID: 39303708 PMC: 11602590. DOI: 10.1016/j.xgen.2024.100655.


Gram matrix: an efficient representation of molecular conformation and learning objective for molecular pretraining.

Xiang W, Zhong F, Ni L, Zheng M, Li X, Shi Q. Brief Bioinform. 2024; 25(4).

PMID: 38990515 PMC: 11238115. DOI: 10.1093/bib/bbae340.


Discovery of novel cholesteryl ester transfer protein (CETP) inhibitors by a multi-stage virtual screening.

Liu Y, Deng L, Ding F, Wang Q, Zhang S, Mi N. BMC Chem. 2024; 18(1):95.

PMID: 38702788 PMC: 11069292. DOI: 10.1186/s13065-024-01192-5.


Hit discovery of potential CDK8 inhibitors and analysis of amino acid mutations for cancer therapy through computer-aided drug discovery.

Aghahasani R, Shiri F, Kamaladiny H, Haddadi F, Pirhadi S. BMC Chem. 2024; 18(1):73.

PMID: 38615023 PMC: 11016228. DOI: 10.1186/s13065-024-01175-6.