From Genotype to Phenotype in Arabidopsis thaliana: In-silico Genome Interpretation Predicts 288 Phenotypes from Sequencing Data

Overview
Specialty Biochemistry
Date 2021 Nov 18
PMID 34792168
Citations 5
Abstract

In many cases, the unprecedented availability of data from high-throughput sequencing has shifted the bottleneck from data availability to data interpretation. This has delayed the promised breakthroughs in genetics and precision medicine in human genetics, and in phenotype prediction for improving plant adaptation to climate change and resistance to bioaggressors in the plant sciences. In this paper, we propose a novel Genome Interpretation paradigm that aims to model the genotype-to-phenotype relationship directly, and we focus on A. thaliana because it is the best-studied model organism in plant genetics. Our model, Galiana, is the first end-to-end Neural Network (NN) approach following the genomes-in/phenotypes-out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from whole-genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥ 0.4, most of them related to flowering traits. We also show that our end-to-end NN approach achieves better performance and broader phenotype coverage than models that predict single phenotypes from the known associated genes derived from genome-wide association studies (GWAS). Galiana is also fully interpretable through gradient-based saliency map approaches. Using this interpretation approach, we identified 36 novel genes that are likely to be associated with flowering traits and found evidence in the existing literature for 6 of them.
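The abstract states that Galiana is interpreted with gradient-based saliency maps. As a rough illustration of that idea only, the sketch below computes vanilla saliency (the absolute gradient of a predicted phenotype with respect to each input feature) for a toy one-hidden-layer tanh network, averages it over samples, and sums it per gene to rank candidate genes. The network shape, the weights, the feature-to-gene mapping (`gene_of_feature`) and the sum-per-gene aggregation are all hypothetical assumptions chosen for the example; this is not Galiana's architecture or the authors' exact scoring procedure.

```python
import math


def forward(x, W1, b1, w2, b2):
    """Toy (hypothetical) network: y = w2 . tanh(W1 x + b1) + b2."""
    h = [math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
         for row, b_j in zip(W1, b1)]
    y = sum(w * h_j for w, h_j in zip(w2, h)) + b2
    return y, h


def input_gradient(x, W1, b1, w2, b2):
    """Analytic dy/dx_i = sum_j w2_j * (1 - h_j^2) * W1[j][i] for the toy net."""
    _, h = forward(x, W1, b1, w2, b2)
    grad = [0.0] * len(x)
    for j, row in enumerate(W1):
        coeff = w2[j] * (1.0 - h[j] ** 2)  # derivative of tanh is 1 - tanh^2
        for i in range(len(x)):
            grad[i] += coeff * row[i]
    return grad


def gene_saliency(samples, gene_of_feature, W1, b1, w2, b2):
    """Mean |gradient| per feature over samples, summed per gene (assumed
    aggregation), returned as (gene, score) pairs sorted high to low."""
    mean_abs = [0.0] * len(samples[0])
    for x in samples:
        for i, g in enumerate(input_gradient(x, W1, b1, w2, b2)):
            mean_abs[i] += abs(g) / len(samples)
    scores = {}
    for i, gene in enumerate(gene_of_feature):
        scores[gene] = scores.get(gene, 0.0) + mean_abs[i]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: the third feature feeds no hidden unit.
    W1 = [[1.0, 0.5, 0.0], [0.2, -0.3, 0.0]]
    b1 = [0.1, -0.2]
    w2 = [1.0, -1.0]
    samples = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    print(gene_saliency(samples, ["geneA", "geneA", "geneB"], W1, b1, w2, 0.0))
```

In this toy setup the third feature has zero weight into every hidden unit, so its gene receives a saliency score of exactly 0.0 and ranks last. In a real model the input gradient would typically come from automatic differentiation rather than a hand-derived formula.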

Citing Articles

A Feature Engineering Method for Whole-Genome DNA Sequence with Nucleotide Resolution.

Wang T, Cui Y, Sun T, Li H, Wang C, Hou Y. Int J Mol Sci. 2025; 26(5).

PMID: 40076901 PMC: 11899767. DOI: 10.3390/ijms26052281.


Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits.

Kelly C, McLaughlin R. PLoS One. 2024; 19(8):e0308962.

PMID: 39196916 PMC: 11355539. DOI: 10.1371/journal.pone.0308962.


Biologically meaningful genome interpretation models to address data underdetermination for the leaf and seed ionome prediction in Arabidopsis thaliana.

Raimondi D, Passemiers A, Verplaetse N, Corso M, Ferrero-Serrano A, Nazzicari N. Sci Rep. 2024; 14(1):13188.

PMID: 38851759 PMC: 11162433. DOI: 10.1038/s41598-024-63855-6.


Genome interpretation in a federated learning context allows the multi-center exome-based risk prediction of Crohn's disease patients.

Raimondi D, Chizari H, Verplaetse N, Loscher B, Franke A, Moreau Y. Sci Rep. 2023; 13(1):19449.

PMID: 37945674 PMC: 10636050. DOI: 10.1038/s41598-023-46887-2.


Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease.

Verplaetse N, Passemiers A, Arany A, Moreau Y, Raimondi D. Genome Biol. 2023; 24(1):224.

PMID: 37798735 PMC: 10552306. DOI: 10.1186/s13059-023-03064-y.

