
Effective Gene Expression Prediction from Sequence by Integrating Long-range Interactions

Overview
Journal: Nat Methods
Date: 2021 Oct 5
PMID: 34608324
Citations: 340
Abstract

How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
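The variant effect predictions mentioned in the abstract can be illustrated with a minimal sketch, assuming the common in-silico scoring scheme of predicting on the reference and alternate sequences and taking the difference. Everything named here is hypothetical and chosen for illustration: `toy_model` is a stand-in that returns GC fraction, not Enformer, and `one_hot`, `apply_variant`, and `variant_effect` are not taken from the paper.

```python
# Toy illustration only: `toy_model` is a hypothetical stand-in, not Enformer.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_model(encoded):
    """Hypothetical predictor: one scalar per sequence, here simply the GC fraction."""
    if not encoded:
        return 0.0
    gc = sum(row[1] + row[2] for row in encoded)  # columns 1 and 2 are C and G
    return gc / len(encoded)


def apply_variant(seq, pos, ref, alt):
    """Return seq with the single-base substitution ref -> alt at 0-based position pos."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq, pos, ref, alt, model=toy_model):
    """Score a variant as model(alternate sequence) - model(reference sequence)."""
    alt_seq = apply_variant(seq, pos, ref, alt)
    return model(one_hot(alt_seq)) - model(one_hot(seq))


reference = "ATATATATAT"
effect = variant_effect(reference, 0, "A", "G")
print(round(effect, 3))
```

With a real sequence-to-expression model in place of `toy_model`, the same difference would be computed on the model's predicted tracks; how those tracks are aggregated into a single score is a design choice this sketch does not settle.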

Citing Articles

Foundation models in bioinformatics.

Guo F, Guan R, Li Y, Liu Q, Wang X, Yang C. Natl Sci Rev. 2025; 12(4):nwaf028.

PMID: 40078374 PMC: 11900445. DOI: 10.1093/nsr/nwaf028.


Precise engineering of gene expression by editing plasticity.

Qiu Y, Liu L, Yan J, Xiang X, Wang S, Luo Y. Genome Biol. 2025; 26(1):51.

PMID: 40065399 PMC: 11892124. DOI: 10.1186/s13059-025-03516-7.


Integration of proteomics profiling data to facilitate discovery of cancer neoantigens: a survey.

Luo S, Peng H, Shi Y, Cai J, Zhang S, Shao N. Brief Bioinform. 2025; 26(2).

PMID: 40052441 PMC: 11886573. DOI: 10.1093/bib/bbaf087.


iModEst: disentangling -omic impacts on gene expression variation across genes and tissues.

Sokolowski D, Mai M, Verma A, Morgenshtern G, Subasri V, Naveed H. NAR Genom Bioinform. 2025; 7(1):lqaf011.

PMID: 40041206 PMC: 11879402. DOI: 10.1093/nargab/lqaf011.


Enhancer reprogramming: critical roles in cancer and promising therapeutic strategies.

Yang J, Zhou F, Luo X, Fang Y, Wang X, Liu X. Cell Death Discov. 2025; 11(1):84.

PMID: 40032852 PMC: 11876437. DOI: 10.1038/s41420-025-02366-3.

