CSAR Data Set Release 2012: Ligands, Affinities, Complexes, and Docking Decoys

Overview

Journal: J Chem Inf Model

Publisher: American Chemical Society

Specialties: Chemistry, Medical Informatics

Date: 2013 Apr 27

PMID: 23617227

Citations: 47

Authors

James B Dunbar Jr

Richard D Smith

Kelly L Damm-Ganamet

Aqeel Ahmed

Emilio Xavier Esposito

James Delproposto

Krishnapriya Chinnaswamy

You-Na Kang

Ginger Kubish

Jason E Gestwicki

Jeanne A Stuckey

Heather A Carlson

Abstract

A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, these contributions form a data set of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets, with a few representative crystal structures for each series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges typically span 3-4 orders of magnitude per series. For our in-house projects, we had compounds synthesized for biological testing. Affinities were measured by Thermofluor and Octet RED and, for the most soluble compounds, by isothermal titration calorimetry. This allows a direct comparison of the affinities obtained by different methods for the same compounds, providing a measure of the variance in the experimental affinity. There appears to be considerable variance in the absolute value of the affinity, which makes predicting the absolute value ill-defined. However, the relative rankings of compounds are much more consistent across methods, which fits with the observation that predicting relative rankings is a more tractable computational problem. For the in-house compounds, we also measured logD, logP, thermodynamic solubility, and pKa. The release also provides a substantial decoy set for each target, consisting of diverse conformations covering the entire active site, for all 58 CSAR-quality crystal structures. The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
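The contrast between unreliable absolute affinities and more reliable relative rankings can be illustrated with a small calculation. The sketch below is not the CSAR analysis: the Kd values are invented for one hypothetical congeneric series measured by two unnamed assays ("A" and "B", not tied to Thermofluor, Octet RED, or ITC), and all function names are illustrative. It converts Kd (mol/L) to pKd = -log10(Kd), reports the affinity range in log units (orders of magnitude), and compares RMSE between assays (absolute agreement) with Spearman's rank correlation (ranking agreement). `scipy.stats.spearmanr` would give the same rho; plain Python keeps the example dependency-free.

```python
import math


def pkd(kd_molar: float) -> float:
    """pKd = -log10(Kd), with Kd in mol/L."""
    return -math.log10(kd_molar)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root-mean-square difference between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical Kd values (mol/L) for the same five compounds from two assays;
# assay B reads roughly 10-fold weaker across the whole series.
kd_assay_a = [5e-9, 3e-8, 1e-7, 4e-7, 8e-6]
kd_assay_b = [4e-8, 2e-7, 1.5e-6, 3e-6, 9e-5]

pa = [pkd(k) for k in kd_assay_a]
pb = [pkd(k) for k in kd_assay_b]

affinity_range_a = max(pa) - min(pa)  # orders of magnitude spanned by the series
print(f"range (assay A): {affinity_range_a:.1f} log units")
print(f"RMSE between assays: {rmse(pa, pb):.2f} log units")
print(f"Spearman rho between assays: {spearman(pa, pb):.2f}")
```

With these invented values the series spans about 3.2 log units, the two assays disagree by roughly 1 log unit in absolute pKd (RMSE near 0.97), yet Spearman's rho is 1.00 because the compound ordering is identical. A systematic offset between methods penalizes absolute-value predictions but leaves rankings intact.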

Citing Articles

Machine learning approaches for predicting protein-ligand binding sites from sequence data.

Vural O, Jololian L. Front Bioinform. 2025; 5:1520382.

PMID: 39963299 PMC: 11830693. DOI: 10.3389/fbinf.2025.1520382.

Rationalizing protein-ligand interactions via the effective fragment potential method and structural data from classical molecular dynamics.

Urbina A, Slipchenko L. J Chem Phys. 2025; 162(4).

PMID: 39868918 PMC: 11774556. DOI: 10.1063/5.0247878.

A 4D tensor-enhanced multi-dimensional convolutional neural network for accurate prediction of protein-ligand binding affinity.

Huang D, Wang Y, Sun Y, Ji W, Zhang Q, Jiang Y. Mol Divers. 2024.

PMID: 39714563 DOI: 10.1007/s11030-024-11044-y.

Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities.

Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P. Comput Struct Biotechnol J. 2024; 23:2141-2151.

PMID: 38827235 PMC: 11141151. DOI: 10.1016/j.csbj.2024.05.024.

Structure-based, deep-learning models for protein-ligand binding affinity prediction.

Wang D, Wu W, Wang R. J Cheminform. 2024; 16(1):2.

PMID: 38173000 PMC: 10765576. DOI: 10.1186/s13321-023-00795-9.
