LINGO, an Efficient Holographic Text Based Method to Calculate Biophysical Properties and Intermolecular Similarities
SMILES strings are the most compact text-based molecular representations. They implicitly contain the information needed to compute all kinds of molecular structures and, thus, the molecular properties derived from those structures. We show that this implicit information can be accessed directly at the SMILES string level, without time-consuming explicit conversion of the SMILES strings into molecular graphs or 3D structures followed by 2D or 3D QSPR calculations. Our method is based on fragmenting SMILES strings into overlapping substrings of a defined size, which we call LINGOs. The complete set of LINGOs derived from a given SMILES string, the LINGO profile, is a hologram of the SMILES representation of the molecule. LINGO profiles provide input for QSPR models and for the calculation of intermolecular similarities at very low computational cost. The octanol/water partition coefficient (LlogP) QSPR model achieved a correlation coefficient R² = 0.93, a root-mean-square error R_RMS = 0.49 log units, a goodness-of-prediction correlation coefficient Q² = 0.89, and Q_RMS = 0.61 log units. The intrinsic aqueous solubility (LlogS) QSPR model achieved R² = 0.91, Q² = 0.82, R_RMS = 0.60 log units, and Q_RMS = 0.89 log units. Integral Tanimoto coefficients computed from LINGO profiles sharply discriminated between random and bioisosteric pairs extracted from the Accelrys Bioster Database. Average similarities (LINGOsim) were 0.07 for the random pairs and 0.36 for the bioisosteric pairs.
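The fragmentation and similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that LINGOs are overlapping substrings "of a defined size", so the default length `q = 4` is an assumption, as are the function names `lingo_profile` and `tanimoto`. The similarity shown is a generic count-based Tanimoto coefficient (sum of minimum counts over sum of maximum counts), which may differ from the exact integral Tanimoto formula used in the paper. No SMILES normalization is applied.

```python
from collections import Counter


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in a SMILES string.

    q = 4 is an assumed default; the abstract only says "a defined size".
    Strings shorter than q yield an empty profile.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def tanimoto(a: Counter, b: Counter) -> float:
    """Count-based Tanimoto coefficient between two profiles.

    Assumed form: sum of per-substring minimum counts divided by the
    sum of per-substring maximum counts. Returns 0.0 if both are empty.
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


if __name__ == "__main__":
    butanol = lingo_profile("CCCCO")     # substrings: CCCC, CCCO
    butylamine = lingo_profile("CCCCN")  # substrings: CCCC, CCCN
    print(dict(butanol))
    print(tanimoto(butanol, butylamine))
```

Because the profile is built by a single pass over the string and the comparison only touches substring counts, no molecular graph or 3D structure is constructed at any point, which is the source of the low computational cost the abstract refers to.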