PMID: 23617227

CSAR Data Set Release 2012: Ligands, Affinities, Complexes, and Docking Decoys

Abstract

A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has currently obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets, with a few representative crystal structures for each series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3-4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and, for the most soluble compounds, isothermal titration calorimetry. This allows direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. However, the relative rankings within the methods agree much better, which fits with the observation that predicting relative ranking is a more tractable problem computationally. For those in-house compounds, we have also measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target, consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures.
The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
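
The contrast the abstract draws between absolute affinities and relative rankings can be illustrated with a small sketch. It assumes two assays report affinities for the same compounds on a log scale (for example pKd); the function names and all numbers are hypothetical and are not taken from the CSAR data. It computes the mean absolute difference between the two assays alongside the Spearman rank correlation, which depends only on the ordering of the compounds.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_abs_diff(x, y):
    """Mean absolute difference between paired measurements."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Hypothetical pKd values for five compounds measured by two assays.
# These numbers are illustrative only and do not come from the CSAR release.
assay_a = [5.1, 6.0, 6.8, 7.4, 8.2]
assay_b = [6.0, 6.7, 7.9, 8.1, 9.3]

print("mean absolute difference (log units):", mean_abs_diff(assay_a, assay_b))
print("Spearman rank correlation:", spearman(assay_a, assay_b))
```

In this made-up example the two assays disagree by close to one log unit on average, yet they order the compounds identically, which is the situation in which ranking is well defined even though the absolute value is not.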

Citing Articles

Machine learning approaches for predicting protein-ligand binding sites from sequence data.

Vural O, Jololian L Front Bioinform. 2025; 5:1520382.

PMID: 39963299 PMC: 11830693. DOI: 10.3389/fbinf.2025.1520382.


Rationalizing protein-ligand interactions via the effective fragment potential method and structural data from classical molecular dynamics.

Urbina A, Slipchenko L J Chem Phys. 2025; 162(4).

PMID: 39868918 PMC: 11774556. DOI: 10.1063/5.0247878.


A 4D tensor-enhanced multi-dimensional convolutional neural network for accurate prediction of protein-ligand binding affinity.

Huang D, Wang Y, Sun Y, Ji W, Zhang Q, Jiang Y Mol Divers. 2024; .

PMID: 39714563 DOI: 10.1007/s11030-024-11044-y.


Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities.

Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P Comput Struct Biotechnol J. 2024; 23:2141-2151.

PMID: 38827235 PMC: 11141151. DOI: 10.1016/j.csbj.2024.05.024.


Structure-based, deep-learning models for protein-ligand binding affinity prediction.

Wang D, Wu W, Wang R J Cheminform. 2024; 16(1):2.

PMID: 38173000 PMC: 10765576. DOI: 10.1186/s13321-023-00795-9.

