Deep Learning Improves the Ability of SgRNA Off-target Propensity Prediction
Background: The CRISPR/Cas9 system, a third-generation genome editing technology, has been widely applied to target gene repair and the regulation of gene expression. Selecting an appropriate sgRNA can improve the on-target knockout efficacy of the CRISPR/Cas9 system with high sensitivity and specificity. However, when the CRISPR/Cas9 system is operating, unexpected cleavage may occur at some sites, an effect known as off-target cleavage. A number of methods have been developed to predict the off-target propensity of an sgRNA at specific DNA fragments. Most of them rely on manual feature extraction and machine learning techniques to obtain off-target scores. With the rapid expansion of off-target data and the rapid development of deep learning, the existing prediction methods can no longer deliver the prediction accuracy required at the clinical level.
Results: Here, we propose CnnCrispr, a method for predicting the off-target propensity of an sgRNA at specific DNA fragments. CnnCrispr uses the GloVe model to learn the sequence features of sgRNA-DNA pairs automatically, and embeds the trained word vector matrix into a deep learning model including biLSTM and CNN with five hidden layers. We verified its performance on the data set provided by DeepCrispr: in "leave-one-sgRNA-out" cross-validation, the auROC and auPRC reached 0.957 and 0.429, respectively, and the Pearson and Spearman values reached 0.495 and 0.151, respectively, under the same settings.
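To make the pipeline above more concrete, the following is a minimal Python sketch of two of its ingredients: turning an aligned sgRNA-DNA pair into a sequence of "word" tokens that an embedding such as GloVe could be trained on, and building "leave-one-sgRNA-out" folds. The 16-token vocabulary (one token per combination of sgRNA base and DNA base), the function names `encode_pair` and `leave_one_sgrna_out`, the sample dictionary layout, and the 4-nt toy sequences are all assumptions made for illustration; the abstract does not specify them, and this is not the CnnCrispr implementation.

```python
from itertools import product

BASES = "ACGT"
# Assumed vocabulary: one token per (sgRNA base, DNA base) combination, 16 in total.
PAIR_TO_INDEX = {pair: i for i, pair in enumerate(product(BASES, repeat=2))}


def encode_pair(sgrna, dna):
    """Map aligned sgRNA and DNA sequences to one token index per position."""
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA sequences must have the same length")
    return [PAIR_TO_INDEX[(g, d)] for g, d in zip(sgrna.upper(), dna.upper())]


def leave_one_sgrna_out(samples):
    """Yield (held_out_sgrna, train, test); each fold holds out every pair of one sgRNA."""
    guides = sorted({s["sgrna"] for s in samples})
    for held_out in guides:
        train = [s for s in samples if s["sgrna"] != held_out]
        test = [s for s in samples if s["sgrna"] == held_out]
        yield held_out, train, test


# Toy data only (hypothetical 4-nt sequences, not real guides).
samples = [
    {"sgrna": "ACGT", "dna": "ACGT", "label": 1},
    {"sgrna": "ACGT", "dna": "ACGA", "label": 0},
    {"sgrna": "TTGC", "dna": "TTGC", "label": 1},
]

tokens = encode_pair("ACGT", "ACGA")
folds = list(leave_one_sgrna_out(samples))

print(tokens)
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Holding out all pairs of one sgRNA at a time means the test fold never contains a guide seen during training, which is why this scheme is a stricter check of generalization to new guides than a random split over sgRNA-DNA pairs.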
Conclusion: Our results show that CnnCrispr achieves better classification and regression performance than existing state-of-the-art models. The code for CnnCrispr can be freely downloaded from https://github.com/LQYoLH/CnnCrispr.