Representations of Lipid Nanoparticles Using Large Language Models for Transfection Efficiency Prediction
Motivation: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids that make up an LNP can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency.
Results: To optimize LNPs, we developed and tested models that enable virtual screening for LNPs with high transfection efficiency. Our best method takes lipid structures, written as Simplified Molecular-Input Line-Entry System (SMILES) strings, as input to a large language model. The embeddings generated by the large language model are then passed to a downstream gradient-boosting classifier. As we show, our method can predict lipid properties more accurately, which could lead to higher efficiency and reduced experimental time and costs.
Availability and Implementation: Code and data links are available at: https://github.com/Sanofi-Public/LipoBART.
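
The two-stage pipeline described in Results (SMILES string, then embedding, then gradient-boosting classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed_smiles` is a hypothetical stand-in that hashes character trigrams into a fixed-length vector so the example runs without downloading a model, whereas the method above uses embeddings from a large language model. The SMILES strings and the 0/1 labels are arbitrary placeholders rather than measured transfection data, and the scikit-learn `GradientBoostingClassifier` is one possible choice of gradient-boosting classifier.

```python
import hashlib

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

EMBED_DIM = 64  # assumed embedding size, chosen only for this sketch


def embed_smiles(smiles: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Hypothetical stand-in for an LLM embedding of a SMILES string.

    Hashes character trigrams into a fixed-length, L2-normalized count
    vector. A real pipeline would return the language model's embedding.
    """
    vec = np.zeros(dim)
    padded = f"^{smiles}$"  # mark the start and end of the string
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        digest = hashlib.md5(trigram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Toy inputs: placeholder SMILES with arbitrary labels (not real measurements).
toy_smiles = [
    "CCCCCCCCCCCC(=O)OCCN(C)C",
    "CCCCCCCCCCCCCC(=O)OCCN(C)C",
    "CCCCCCCCCCCCCCCC(=O)OCCN(CC)CC",
    "CCCCCCCCC=CCCCCCCCC(=O)OCCN(C)C",
    "CCCCCCCC(=O)O",
    "CCCCCCCCCC(=O)O",
    "CCCCCCCCCCCCO",
    "CCCCCCCCCCCCCCO",
]
toy_labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = "high", 0 = "low"

# Stage 1: turn every SMILES string into a fixed-length feature vector.
X = np.vstack([embed_smiles(s) for s in toy_smiles])

# Stage 2: fit a gradient-boosting classifier on the embeddings.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, toy_labels)

# Virtual screening step: score unseen candidates by predicted probability.
candidates = ["CCCCCCCCCCCCC(=O)OCCN(C)C", "CCCCCCCCCCCO"]
X_new = np.vstack([embed_smiles(s) for s in candidates])
proba = clf.predict_proba(X_new)  # columns follow clf.classes_, here [0, 1]
for smi, p in zip(candidates, proba[:, 1]):
    print(f"{smi}: predicted probability of class 1 = {p:.2f}")
```

Keeping the embedding step behind a single function means the placeholder can be replaced by a real model's embeddings without touching the classifier code, which is the practical appeal of using embeddings as fixed features for a downstream model.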