
Machine Learning Enables Pan-cancer Identification of Mutational Hotspots at Persistent CTCF Binding Sites

Overview
Specialty Biochemistry
Date 2024 Jul 1
PMID 38950902
Abstract

CCCTC-binding factor (CTCF) is an insulator protein that binds to a highly conserved DNA motif and facilitates regulation of three-dimensional (3D) nuclear architecture and transcription. CTCF binding sites (CTCF-BSs) reside in non-coding DNA and are frequently mutated in cancer. Our previous study identified a small subclass of CTCF-BSs that are resistant to CTCF knockdown, termed persistent CTCF binding sites (P-CTCF-BSs). P-CTCF-BSs show high binding conservation and potentially regulate cell-type constitutive 3D chromatin architecture. Here, using ICGC sequencing data, we made the striking observation that P-CTCF-BSs display a highly elevated mutation rate in breast and prostate cancer compared with all CTCF-BSs. To address whether P-CTCF-BS mutations are also enriched in other cell types, we developed CTCF-INSITE, a tool that uses machine learning to predict persistence based on genetic and epigenetic features of experimentally determined P-CTCF-BSs. Notably, predicted P-CTCF-BSs also show a significantly elevated mutational burden in all 12 cancer types tested. Enrichment was even stronger for P-CTCF-BS mutations with a predicted functional impact on CTCF binding and chromatin looping. Using in vitro binding assays, we validated that P-CTCF-BS cancer mutations predicted to be disruptive did indeed reduce CTCF binding. Together, this study reveals a new subclass of cancer-specific CTCF-BS DNA mutations and provides insights into their importance in genome organization in a pan-cancer setting.
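The comparison of mutation burden between a subclass of binding sites and all CTCF-BSs can be illustrated with a minimal sketch. The abstract does not state which statistical test the authors used, so the hypergeometric test below is an assumption chosen for illustration, and every count is a hypothetical toy value, not data from the study. The function names `mutated_fraction`, `fold_enrichment`, and `hypergeom_upper_tail` are likewise invented for this example.

```python
from math import comb


def mutated_fraction(n_mutated: int, n_sites: int) -> float:
    """Fraction of binding sites that carry at least one mutation."""
    return n_mutated / n_sites


def fold_enrichment(sub_mut: int, sub_sites: int, all_mut: int, all_sites: int) -> float:
    """Mutated fraction in the subclass divided by the fraction in all sites."""
    return mutated_fraction(sub_mut, sub_sites) / mutated_fraction(all_mut, all_sites)


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n sites are drawn without replacement from N sites,
    K of which are mutated. A small value suggests the subclass holds more
    mutated sites than expected by chance."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical toy counts (NOT from the study):
ALL_SITES, ALL_MUTATED = 800, 100   # all CTCF-BSs, and how many are mutated
SUB_SITES, SUB_MUTATED = 40, 10     # persistent subclass, and how many are mutated

fold = fold_enrichment(SUB_MUTATED, SUB_SITES, ALL_MUTATED, ALL_SITES)
p_value = hypergeom_upper_tail(SUB_MUTATED, ALL_MUTATED, SUB_SITES, ALL_SITES)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.4f}")
```

With these toy counts the subclass is mutated at twice the background fraction. A real analysis would also need to account for site width, regional mutation rate, and multiple testing across cancer types, none of which this sketch covers.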
