
Deep Learning Suggests That Gene Expression is Encoded in All Parts of a Co-evolving Interacting Gene Regulatory Structure

Overview
Journal: Nat Commun
Specialty: Biology
Date: 2020 Dec 2
PMID: 33262328
Citations: 62
Abstract

Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
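As a rough illustration of what "predicting mRNA abundance directly from DNA sequence" and "82% of the variation" can mean in practice, the sketch below shows two generic building blocks: one-hot encoding of a DNA string, and the coefficient of determination (R²) between observed and predicted expression values. The abstract does not state the encoding or the exact metric used, so both are assumptions here, and the function names `one_hot` and `r_squared` are hypothetical helpers, not the authors' code.

```python
from typing import List, Sequence

BASES = "ACGT"  # column order of the one-hot encoding (an assumption)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as rows of 4 indicators (A, C, G, T).

    Symbols outside ACGT (for example N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed values around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(one_hot("ACGTN"))
    # Toy numbers for illustration only, not data from the study.
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Under this reading, an R² of 0.82 on held-out genes would correspond to a model explaining 82% of the variance in transcript levels; a model that only ever predicts the mean scores 0.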

Citing Articles

Precise engineering of gene expression by editing plasticity.

Qiu Y, Liu L, Yan J, Xiang X, Wang S, Luo Y. Genome Biol. 2025; 26(1):51.

PMID: 40065399 PMC: 11892124. DOI: 10.1186/s13059-025-03516-7.


Inferring protein from transcript abundances using convolutional neural networks.

Schwehn P, Falter-Braun P. BioData Min. 2025; 18(1):18.

PMID: 40016737 PMC: 11866710. DOI: 10.1186/s13040-025-00434-z.


Chromatin enables precise and scalable gene regulation with factors of limited specificity.

Perkins M, Crocker J, Tkacik G. Proc Natl Acad Sci U S A. 2025; 122(1):e2411887121.

PMID: 39793086 PMC: 11725945. DOI: 10.1073/pnas.2411887121.


Amino acid sequence encodes protein abundance shaped by protein stability at reduced synthesis cost.

Buric F, Viknander S, Fu X, Lemke O, Carmona O, Zrimec J. Protein Sci. 2024; 34(1):e5239.

PMID: 39665261 PMC: 11635393. DOI: 10.1002/pro.5239.


Deep learning approaches for non-coding genetic variant effect prediction: current progress and future prospects.

Wang X, Li F, Zhang Y, Imoto S, Shen H, Li S. Brief Bioinform. 2024; 25(5).

PMID: 39276327 PMC: 11401448. DOI: 10.1093/bib/bbae446.

