scDiffusion: Conditional Generation of High-quality Single-cell Data Using Diffusion Model
Motivation: Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at the single-cell level. However, obtaining enough high-quality scRNA-seq data remains challenging. To mitigate this limited availability, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated by current models are not yet sufficiently realistic, especially when data must be generated under controlled conditions. Meanwhile, diffusion models have shown their power in generating high-fidelity data, providing a new opportunity for scRNA-seq generation.
Results: In this study, we developed scDiffusion, a generative model that combines a diffusion model with a foundation model to generate high-quality scRNA-seq data under controlled conditions. We designed multiple classifiers that guide the diffusion process simultaneously, enabling scDiffusion to generate data under combinations of multiple conditions. We also proposed a new control strategy called Gradient Interpolation, which allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. scDiffusion can also conditionally generate data for specific cell types, including rare cell types. Furthermore, the multiple-condition generation of scDiffusion could be used to generate cell types that were absent from the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. These experiments demonstrate that scDiffusion is a powerful tool for augmenting real scRNA-seq data and can provide insights into cell fate research.
Availability and implementation: scDiffusion is openly available on GitHub at https://github.com/EperLuo/scDiffusion and on Zenodo at https://zenodo.org/doi/10.5281/zenodo.13268742.
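The abstract names two control mechanisms, guidance from multiple classifiers and Gradient Interpolation, without giving formulas. The sketch below illustrates one plausible reading using the standard classifier-guidance form, in which the denoising mean is shifted along the gradient of a classifier's log-probability. It is not taken from the scDiffusion code: the linear softmax classifiers, all function names, and the linear blending rule for interpolation are assumptions made for illustration only.

```python
import numpy as np

def log_prob_grad(W, b, x, y):
    """Gradient of log p(y | x) w.r.t. x for a linear softmax classifier.

    Assumption: a toy classifier with logits = W @ x + b stands in for
    whatever classifier network the real model uses.
    """
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return W[y] - p @ W

def combined_guidance(classifiers, x, targets, scales):
    """Sum scaled gradients from several classifiers, one per condition."""
    g = np.zeros_like(x)
    for (W, b), y, s in zip(classifiers, targets, scales):
        g += s * log_prob_grad(W, b, x, y)
    return g

def interpolated_guidance(W, b, x, y_start, y_end, w):
    """Blend gradients toward two labels (assumed linear blend).

    w = 0 guides purely toward y_start, w = 1 purely toward y_end.
    """
    g_start = log_prob_grad(W, b, x, y_start)
    g_end = log_prob_grad(W, b, x, y_end)
    return (1.0 - w) * g_start + w * g_end

def guided_mean(mu, sigma2, grad):
    """Shift the denoising mean along the guidance gradient."""
    return mu + sigma2 * grad

# Hypothetical usage: two condition classifiers (e.g. cell type and tissue)
rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # toy latent cell state
clf_a = (rng.normal(size=(3, 4)), np.zeros(3))
clf_b = (rng.normal(size=(2, 4)), np.zeros(2))
grad = combined_guidance([clf_a, clf_b], x, targets=[1, 0], scales=[1.0, 1.0])
x_next_mean = guided_mean(x, 0.1, grad)
```

Under these assumptions, sweeping `w` from 0 to 1 across successive generations would move samples gradually from one condition toward the other, which is one way to picture the continuous trajectories the abstract describes.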