High-throughput SELEX SAGE Method for Quantitative Modeling of Transcription-factor Binding Sites

Overview

Journal Nat Biotechnol

Specialty Biotechnology

Date 2002 Jul 9

PMID 12101405

Citations 92

Authors

Emmanuelle Roulet

Stephane Busso

Anamaria A Camargo

Andrew J G Simpson

Nicolas Mermod

Philipp Bucher

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
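As a rough illustration of what a quantitative binding-site model looks like, the sketch below estimates a simple log-odds weight matrix from a handful of aligned sites and uses it to score candidate sequences. This is a deliberate simplification and not the authors' procedure: the abstract describes a generalized profile trained with hidden Markov model algorithms on thousands of selected ligands. The function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_log_odds_matrix(sites, pseudocount=1.0, background=None):
    """Estimate a per-position log2-odds matrix from equal-length aligned sites."""
    if background is None:
        # Assumption: uniform background base composition.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        # Pseudocounts keep unseen bases from producing log(0).
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        }
        matrix.append(column)
    return matrix

def score(matrix, seq):
    """Sum the per-position log-odds for a sequence of the matrix width."""
    return sum(column[base] for column, base in zip(matrix, seq))

def best_window(matrix, dna):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(matrix)
    return max(
        (score(matrix, dna[i:i + width]), i)
        for i in range(len(dna) - width + 1)
    )

# Hypothetical toy sites, invented for illustration only (not data from the paper).
toy_sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
pwm = build_log_odds_matrix(toy_sites)
top_score, top_start = best_window(pwm, "AAATTGGCAAA")
```

A matrix of this kind treats positions as independent, which is exactly the assumption the covariance analysis in the abstract calls into question; capturing non-independent base preferences requires a richer model.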

Citing Articles

TF2TG: an online resource mining the potential gene targets of transcription factors in .

Hu Y, Rodiger J, Liu Y, Gao C, Liu Y, Qadiri M bioRxiv. 2025.

PMID: 39990429 PMC: 11844531. DOI: 10.1101/2025.02.13.638157.

Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements.

Oriol F, Alberto M, Joachim A, Patrick G, M B, Ruben M NAR Genom Bioinform. 2024; 6(2):lqae068.

PMID: 38867914 PMC: 11167492. DOI: 10.1093/nargab/lqae068.

JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles.

Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V Nucleic Acids Res. 2023; 52(D1):D174-D182.

PMID: 37962376 PMC: 10767809. DOI: 10.1093/nar/gkad1059.

A capsule network-based method for identifying transcription factors.

Zheng P, Qi Y, Li X, Liu Y, Yao Y, Huang G Front Microbiol. 2022; 13:1048478.

PMID: 36560938 PMC: 9763301. DOI: 10.3389/fmicb.2022.1048478.

Protein-RNA interaction prediction with deep learning: structure matters.

Wei J, Chen S, Zong L, Gao X, Li Y Brief Bioinform. 2021; 23(1).

PMID: 34929730 PMC: 8790951. DOI: 10.1093/bib/bbab540.