
Predicting Gene Expression from Sequence

Overview
Journal: Cell
Publisher: Cell Press
Specialty: Cell Biology
Date: 2004 Apr 16
PMID: 15084257
Citations: 293
Abstract

We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
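The AND, OR, and NOT logic with orientation and position constraints mentioned in the abstract can be pictured with a toy rule. The sketch below is only an illustration, not the authors' probabilistic model: the motif strings, the 200 bp threshold, the exact-match scan, and the rule itself are all hypothetical assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical motifs and threshold, chosen only for illustration.
MOTIF_A = "AAACCG"
MOTIF_B = "TTGGCA"
MOTIF_C = "CCATAG"
MAX_DISTANCE_A = 200  # bp upstream of the gene start

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hits(upstream: str, motif: str) -> List[Tuple[int, str]]:
    """Return (distance, strand) for every exact motif match.

    `upstream` is read 5'->3' with the gene start right after its last base;
    `distance` is the number of bp from the match start to the gene start.
    """
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = upstream.find(pattern)
        while start != -1:
            hits.append((len(upstream) - start, strand))
            start = upstream.find(pattern, start + 1)
    return hits


def rule_predicts_expression(upstream: str) -> bool:
    """Toy rule: A (forward strand, within 200 bp) AND (B OR NOT C)."""
    a_near = any(
        dist <= MAX_DISTANCE_A and strand == "+"
        for dist, strand in find_hits(upstream, MOTIF_A)
    )
    b_present = bool(find_hits(upstream, MOTIF_B))
    c_present = bool(find_hits(upstream, MOTIF_C))
    return a_near and (b_present or not c_present)


if __name__ == "__main__":
    base = "T" * 50 + MOTIF_A + "T" * 100
    print(rule_predicts_expression(base))            # A alone
    print(rule_predicts_expression(MOTIF_C + base))  # A with C, no B
```

A real system of the kind the abstract describes would score degenerate motifs probabilistically and learn such rules from expression data; the exact string matching here stands in for that step only to keep the example short.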

Citing Articles

iModEst: disentangling -omic impacts on gene expression variation across genes and tissues.

Sokolowski D, Mai M, Verma A, Morgenshtern G, Subasri V, Naveed H. NAR Genom Bioinform. 2025; 7(1):lqaf011.

PMID: 40041206 PMC: 11879402. DOI: 10.1093/nargab/lqaf011.


Comprehensive identification of GASA genes in sunflower and expression profiling in response to drought.

Asad Ullah M, Ahmed M, AlHusnain L, Zia M, AlKahtani M, Attia K. BMC Genomics. 2024; 25(1):954.

PMID: 39402437 PMC: 11472593. DOI: 10.1186/s12864-024-10860-8.


Big data and deep learning for RNA biology.

Hwang H, Jeon H, Yeo N, Baek D. Exp Mol Med. 2024; 56(6):1293-1321.

PMID: 38871816 PMC: 11263376. DOI: 10.1038/s12276-024-01243-w.


DeepCBA: A deep learning framework for gene expression prediction in maize based on DNA sequences and chromatin interactions.

Wang Z, Peng Y, Li J, Li J, Yuan H, Yang S. Plant Commun. 2024; 5(9):100985.

PMID: 38859587 PMC: 11413363. DOI: 10.1016/j.xplc.2024.100985.


Predicting gene expression state and prioritizing putative enhancers using 5hmC signal.

Gonzalez-Avalos E, Onodera A, Samaniego-Castruita D, Rao A, Ay F. Genome Biol. 2024; 25(1):142.

PMID: 38825692 PMC: 11145787. DOI: 10.1186/s13059-024-03273-z.