
The Chemical Space Project

Overview
Journal: Acc Chem Res
Specialty: Chemistry
Date: 2015 Feb 18
PMID: 25687211
Citations: 171
Abstract

One of the simplest questions that can be asked about molecular diversity is: how many organic molecules are possible in total? To answer this question, my research group has computationally enumerated all possible organic molecules up to a certain size to gain an unbiased insight into the entire chemical space. Our latest database, GDB-17, contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens, by far the largest small molecule database reported to date. Molecules allowed by valency rules but unstable or nonsynthesizable due to strained topologies or reactive functional groups were not considered, which reduced the enumeration by at least 10 orders of magnitude and was essential to arrive at a manageable database size. Despite these restrictions, GDB-17 is highly relevant with respect to known molecules. Beyond enumeration, understanding and exploiting GDBs (generated databases) led us to develop methods for virtual screening and visualization of very large databases in the form of a "periodic system of molecules" comprising six different fingerprint spaces, with web browsers for nearest-neighbor searches, and the MQN- and SMIfp-Mapplet application for exploring color-coded principal component maps of GDB and other large databases. Proof-of-concept applications of GDB for drug discovery were realized by combining virtual screening with chemical synthesis and activity testing for neurotransmitter receptor and transporter ligands. One surprising lesson from using GDB for drug analog searches is the incredible depth of chemical space, that is, the fact that millions of very close analogs of any molecule can be readily identified by nearest-neighbor searches in the MQN-space of the various GDBs. The chemical space project has opened an unprecedented door on chemical diversity. Ongoing and as yet unmet challenges concern enumerating molecules beyond 17 atoms and synthesizing GDB molecules with innovative scaffolds and pharmacophores.
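
The nearest-neighbor searches in a fingerprint space mentioned in the abstract can be illustrated with a minimal sketch. It assumes each molecule is represented by a vector of integer counts and that similarity is measured by city-block (Manhattan) distance; the abstract does not specify the metric, so that choice is an assumption here. The four-component vectors, the molecule names, and the functions `city_block` and `nearest_neighbors` are hypothetical and serve only as illustration; they are not the project's actual descriptors, data, or software.

```python
import heapq

def city_block(a, b):
    """City-block (Manhattan) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbors(query, database, k=3):
    """Return the k entries closest to `query` as (distance, name) pairs.

    `database` maps a molecule name to its count vector. The scan is linear;
    a database of billions of molecules would need an index, which this
    sketch does not attempt.
    """
    scored = ((city_block(query, vec), name) for name, vec in database.items())
    return heapq.nsmallest(k, scored)

# Hypothetical toy fingerprints: (carbon count, nitrogen count,
# oxygen count, ring count). Invented values, for illustration only.
database = {
    "mol_a": (6, 1, 0, 1),
    "mol_b": (6, 0, 1, 1),
    "mol_c": (10, 2, 2, 2),
    "mol_d": (5, 1, 0, 1),
}

if __name__ == "__main__":
    for distance, name in nearest_neighbors((6, 1, 0, 1), database, k=2):
        print(name, distance)
```

Because the descriptors are small integer counts, many distinct molecules can sit at a distance of zero or one from a query, which is one way to picture the "depth" of chemical space that the abstract describes.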
