
Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity

Overview
Journal Biology (Basel)
Publisher MDPI
Specialty Biology
Date 2022 Dec 23
PMID 36552295
Abstract

Throughout the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent inputs to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.
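The abstract's two central ideas, sequence-wide attention that yields a sample embedding, and patient metadata supplied as a separate input, can be illustrated with a toy sketch. This is not the authors' transformer architecture: the one-hot encoding, the single attention scoring vector, the random weights, the sequence fragment, and the two metadata values (scaled age and case date) are all assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Encode one amino acid as a 20-dimensional indicator vector."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sequence, score_weights):
    """Pool per-residue vectors into one sample embedding.

    Returns the embedding and the per-position attention weights,
    which are the quantity one would inspect for interpretation.
    """
    encoded = [one_hot(r) for r in sequence]
    scores = [sum(w * x for w, x in zip(score_weights, vec)) for vec in encoded]
    attention = softmax(scores)
    embedding = [
        sum(a * vec[d] for a, vec in zip(attention, encoded))
        for d in range(len(AMINO_ACIDS))
    ]
    return embedding, attention


def predict_severity(sequence, metadata, score_weights, output_weights, bias=0.0):
    """Concatenate the sequence embedding with metadata, then apply a logistic output."""
    embedding, attention = attention_pool(sequence, score_weights)
    features = embedding + list(metadata)  # metadata enters as an independent input
    logit = bias + sum(w * f for w, f in zip(output_weights, features))
    return 1.0 / (1.0 + math.exp(-logit)), attention


# Untrained, randomly initialised weights (assumption: illustration only).
rng = random.Random(0)
score_weights = [rng.uniform(-1, 1) for _ in AMINO_ACIDS]
output_weights = [rng.uniform(-1, 1) for _ in range(len(AMINO_ACIDS) + 2)]

fragment = "NLVKNKCVNF"   # hypothetical short protein fragment
metadata = [0.45, 0.80]   # hypothetical scaled patient age and case date

probability, attention = predict_severity(
    fragment, metadata, score_weights, output_weights
)
top = max(range(len(fragment)), key=lambda i: attention[i])
print(f"severity probability: {probability:.3f}")
print(f"most attended position: {top} ({fragment[top]})")
```

Because the weights are untrained, the printed probability carries no meaning. The point is the data flow: attention weights sum to one across the sequence, so positions with large weights are the natural candidates to examine when interpreting a trained model, while the metadata features let the output layer account for demographics and case date separately from the sequence.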

Citing Articles

Explainable artificial intelligence for omics data: a systematic mapping study.

Toussaint P, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A Brief Bioinform. 2023; 25(1).

PMID: 38113073 PMC: 10729786. DOI: 10.1093/bib/bbad453.


An Epidemiological Analysis for Assessing and Evaluating COVID-19 Based on Data Analytics in Latin American Countries.

Leiva V, Alcudia E, Montano J, Castro C Biology (Basel). 2023; 12(6).

PMID: 37372171 PMC: 10295742. DOI: 10.3390/biology12060887.


CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning.

Serna Garcia G, Al Khalaf R, Invernici F, Ceri S, Bernasconi A Gigascience. 2023; 12.

PMID: 37222749 PMC: 10205000. DOI: 10.1093/gigascience/giad036.


Predicting COVID-19 disease severity from SARS-CoV-2 spike protein sequence by mixed effects machine learning.

Sokhansanj B, Rosen G Comput Biol Med. 2022; 149:105969.

PMID: 36041271 PMC: 9384346. DOI: 10.1016/j.compbiomed.2022.105969.


Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences.

Sokhansanj B, Rosen G mSystems. 2022; 7(2):e0003522.

PMID: 35311562 PMC: 9040592. DOI: 10.1128/msystems.00035-22.
