PMID: 22955618

An Expansive Human Regulatory Lexicon Encoded in Transcription Factor Footprints

Abstract

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.
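As a rough illustration of the footprint idea described in the abstract (bound sequence is protected from DNase I, so cleavage drops inside the binding site relative to its flanks), the sketch below scans a per-nucleotide cleavage-count vector for the most depleted window. It is a toy example under stated assumptions, not the detection procedure used in the study: the function names `depletion_score` and `best_footprint`, the window and flank widths, the pseudocount, and the example counts are all hypothetical.

```python
from statistics import mean


def depletion_score(cuts, start, end, flank=3):
    """Ratio of mean cleavage inside [start, end) to mean cleavage in both flanks.

    Lower values mean stronger protection. This simple ratio is an
    illustrative assumption, not the scoring used in the paper.
    """
    center = cuts[start:end]
    left = cuts[max(0, start - flank):start]
    right = cuts[end:end + flank]
    if not center or not left or not right:
        raise ValueError("window and both flanks must be non-empty")
    # A pseudocount of 1 keeps the ratio finite when a flank has no cuts.
    return (mean(center) + 1) / (mean(left + right) + 1)


def best_footprint(cuts, width=6, flank=3):
    """Return (score, start, end) for the most depleted window of fixed width."""
    candidates = [
        (depletion_score(cuts, s, s + width, flank), s, s + width)
        for s in range(flank, len(cuts) - width - flank + 1)
    ]
    if not candidates:
        raise ValueError("cleavage vector is too short for this width and flank")
    return min(candidates)


# Hypothetical cleavage counts: high in the flanks, depleted over six bases.
example_cuts = [9, 8, 10, 9, 1, 0, 0, 1, 0, 1, 10, 9, 8, 9]
score, start, end = best_footprint(example_cuts)
print(f"candidate footprint at [{start}, {end}) with score {score:.2f}")
```

A fixed window width keeps the sketch short; real footprints vary in length, so a practical scan would also search over widths and assess significance against a background model.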

Citing Articles

ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants.

Pampari A, Shcherbina A, Kvon E, Kosicki M, Nair S, Kundu S bioRxiv. 2025.

PMID: 39829783 PMC: 11741299. DOI: 10.1101/2024.12.25.630221.


Single chromatin fiber profiling and nucleosome position mapping in the human brain.

Peter C, Agarwal A, Watanabe R, Kassim B, Wang X, Lambert T Cell Rep Methods. 2024; 4(12):100911.

PMID: 39631398 PMC: 11704683. DOI: 10.1016/j.crmeth.2024.100911.


Transcription Factor-Wide Association Studies to Identify Functional SNPs in Alzheimer's Disease.

Dunn J, Moore C, Kim N, Gao T, Cheng Z, Jin P J Neurosci. 2024; 45(2.

PMID: 39622643 PMC: 11714347. DOI: 10.1523/JNEUROSCI.1800-24.2024.


KAS-ATAC reveals the genome-wide single-stranded accessible chromatin landscape of the human genome.

Kim S, Marinov G, Greenleaf W Genome Res. 2024; 35(1):124-134.

PMID: 39572230 PMC: 11789636. DOI: 10.1101/gr.279621.124.


BPDCN MYB fusions regulate cell cycle genes, impair differentiation, and induce myeloid-dendritic cell leukemia.

Booth C, Bouyssou J, Togami K, Armand O, Rivas H, Yan K JCI Insight. 2024; 9(24).

PMID: 39499902 PMC: 11665559. DOI: 10.1172/jci.insight.183889.

