Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping
Overview
Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advances in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions have proven difficult to map, because short reads of 50-100 base pairs (bp) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that accommodate them are prone to high false-positive and false-negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that allocates multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach allows multi-read allocation to be supervised with a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of protein-DNA interactions in non-repetitive regions.
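To make the idea of prior-enhanced allocation concrete, the following is a minimal sketch and not the Perm-seq model itself. It assumes each multi-mapping read comes with a non-empty list of candidate genomic bins, that counts of uniquely mapping reads per bin are available, and that an external data source has already been summarized as a positive prior weight per bin. The function name `allocate_multi_reads`, its parameters, and the iterative reweighting scheme are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def allocate_multi_reads(multi_reads, unique_counts, prior,
                         n_iter=20, pseudocount=1.0):
    """Fractionally allocate multi-mapping reads to candidate bins.

    multi_reads:   dict read_id -> non-empty list of candidate bins
    unique_counts: dict bin -> number of uniquely mapping reads
    prior:         dict bin -> positive prior weight (hypothetical summary
                   of an external data source; missing bins default to 1.0)
    Returns dict read_id -> dict bin -> allocation fraction (sums to 1).
    """
    # Start from a uniform allocation over each read's candidate bins.
    alloc = {r: {b: 1.0 / len(bins) for b in bins}
             for r, bins in multi_reads.items()}

    for _ in range(n_iter):
        # Expected coverage per bin: unique reads plus fractional multi-reads.
        coverage = defaultdict(float)
        for b, c in unique_counts.items():
            coverage[b] += c
        for fracs in alloc.values():
            for b, f in fracs.items():
                coverage[b] += f

        # Reweight each read in proportion to prior * (coverage + pseudocount).
        for r, bins in multi_reads.items():
            weights = {b: prior.get(b, 1.0) * (coverage[b] + pseudocount)
                       for b in bins}
            total = sum(weights.values())
            alloc[r] = {b: w / total for b, w in weights.items()}

    return alloc


# Toy example: one read with two candidate bins that have equal unique support.
reads = {"r1": ["A", "B"]}
counts = {"A": 5, "B": 5}
with_prior = allocate_multi_reads(reads, counts, {"A": 3.0, "B": 1.0})
no_prior = allocate_multi_reads(reads, counts, {})
```

In this toy case the two bins are indistinguishable from read counts alone, so without a prior the read is split evenly; with a prior favoring bin `A`, most of the read is assigned there. This is the intuition behind using additional data types to resolve ambiguity that the ChIP-seq reads by themselves cannot.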