
Identifying Promoter Sequence Architectures Via a Chunking-based Algorithm Using Non-negative Matrix Factorisation

Overview
Specialty: Biology
Date: 2023 Nov 20
PMID: 37983292
Abstract

Core promoters are stretches of DNA at the beginning of genes that contain information facilitating the binding of transcription initiation complexes. Different functional subsets of genes have core promoters with distinct architectures and characteristic motifs. Some of these motifs inform the selection of transcription start sites (TSSs). By discovering motifs at fixed distances from known TSS positions, we could in principle classify promoters into different functional groups. Because architectures vary and overlap, promoter classification is a difficult task that requires new approaches. In this study, we present a new method based on non-negative matrix factorisation (NMF), and the associated software seqArchR, that clusters promoter sequences based on their motifs at near-fixed distances from a reference point, such as the TSS. When combined with experimental data from CAGE, seqArchR can efficiently identify TSS-directing motifs, including known ones such as TATA, DPE, and the nucleosome positioning signal, as well as novel lineage-specific motifs and the functions of the genes associated with them. Applying seqArchR to developmental time courses, we reveal how the relative use of promoter architectures changes over time with stage-specific expression. seqArchR is a powerful tool for initial genome-wide classification and functional characterisation of promoters. Its use cases are more general: it can also discover any motifs at near-fixed distances from a reference point, even if they are present in only a small subset of sequences.
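
The core idea in the abstract, factorising position-aware sequence features with NMF and grouping sequences by their dominant factor, can be illustrated with a small sketch. This is not seqArchR's implementation (seqArchR is an R package, and its chunking and model-selection steps are not reproduced here). It assumes a one-hot encoding per (position, base) pair, plain multiplicative-update NMF with a fixed rank of 2, and an argmax assignment rule. The toy sequences, motif offsets, and all function names are hypothetical.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a 0/1 vector with one slot per (position, base)."""
    vec = [0.0] * (len(seq) * len(ALPHABET))
    for pos, base in enumerate(seq):
        vec[pos * len(ALPHABET) + ALPHABET.index(base)] = 1.0
    return vec

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iters=100, seed=0, eps=1e-9):
    """Factorise V (features x sequences) as W H using multiplicative updates."""
    rng = random.Random(seed)
    n_feat, n_seq = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_feat)]
    h = [[rng.random() + 0.1 for _ in range(n_seq)] for _ in range(rank)]
    start = frob_error(v, w, h)
    for _ in range(iters):
        # Update H with W held fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_seq)]
             for k in range(rank)]
        # Update W with H held fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_feat)]
    return w, h, start, frob_error(v, w, h)

def toy_promoters(n_per_group=10, length=20, seed=1):
    """Hypothetical data: random background with one motif at a fixed offset per group."""
    rng = random.Random(seed)
    seqs = []
    for motif, offset in (("TATAAA", 3), ("GGCGCC", 12)):
        for _ in range(n_per_group):
            s = [rng.choice(ALPHABET) for _ in range(length)]
            s[offset:offset + len(motif)] = motif
            seqs.append("".join(s))
    return seqs

seqs = toy_promoters()
V = transpose([one_hot(s) for s in seqs])  # rows: (position, base) features; columns: sequences
W, H, err_start, err_end = nmf(V, rank=2)

# Assign each sequence to the factor with the largest coefficient in its column of H.
labels = [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(seqs))]
print(labels)
```

Each column of `W` can be read as a position-specific pattern over the sequence window, which is what makes the factors sensitive to motifs at near-fixed distances from the reference point. The argmax rule depends on how `W` and `H` are scaled, so a practical pipeline would likely normalise the factors and choose the rank from the data rather than fixing it.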

Citing Articles

Identification of transcription factor co-binding patterns with non-negative matrix factorization.

Rauluseviciute I, Launay T, Barzaghi G, Nikumbh S, Lenhard B, Krebs A. Nucleic Acids Res. 2024; 52(18):e85.

PMID: 39217462 PMC: 11472169. DOI: 10.1093/nar/gkae743.


Core promoterome of barley embryo.

Pavlu S, Nikumbh S, Kovacik M, An T, Lenhard B, Simkova H. Comput Struct Biotechnol J. 2024; 23:264-277.

PMID: 38173877 PMC: 10762323. DOI: 10.1016/j.csbj.2023.12.003.
