
CWL-Based Analysis Pipeline for Hi-C Data: From FASTQ Files to Matrices

Overview
Specialty Molecular Biology
Date 2024 Sep 16
PMID 39283448
Abstract

Over a decade has passed since the development of the Hi-C method for genome-wide analysis of 3D genome organization. Hi-C uses next-generation sequencing (NGS) to generate large-scale chromatin interaction data, which have accumulated across a diverse range of species and cell types, particularly in eukaryotes. There is thus a growing need to streamline Hi-C data analysis so that these data sets can be used effectively. Hi-C data sets are much larger than those produced by other NGS techniques such as chromatin immunoprecipitation sequencing (ChIP-seq) or RNA-seq, which makes reanalysis computationally expensive. To bridge this resource gap, the 4D Nucleome (4DN) Data Portal has reanalyzed approximately 600 Hi-C data sets and made the analyzed data available to users. In this chapter, we provide detailed instructions for implementing the Common Workflow Language (CWL)-based Hi-C analysis pipeline adopted by the 4DN Data Portal ecosystem. This reproducible and portable pipeline generates standard Hi-C contact matrices in formats such as .hic or .mcool from FASTQ files. It enables users to output their own Hi-C data in the same formats as the data registered in the 4DN Data Portal, facilitating comparative analysis with portal data. Our custom-made scripts are available on GitHub at https://github.com/kuzobuta/4dn_cwl_pipeline.
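
To make the notion of a contact matrix concrete, the following is a minimal sketch of the final binning idea only: mapped read pairs are assigned to fixed-size genomic bins and counted per bin pair. It is not code from the 4DN pipeline or the linked repository; the function name `bin_contacts`, the tuple layout of the input, and the toy coordinates are all assumptions made for illustration. Real pipelines work from aligned, filtered, and deduplicated pairs and write the counts to formats such as .hic or .mcool.

```python
from collections import Counter


def bin_contacts(pairs, resolution):
    """Count read pairs per (bin, bin) cell at a fixed bin size in base pairs.

    `pairs` is an iterable of (chrom1, pos1, chrom2, pos2) tuples.
    This layout is a hypothetical simplification of a pairs file.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        a = (chrom1, pos1 // resolution)
        b = (chrom2, pos2 // resolution)
        # Store each contact in one canonical order so that only the
        # upper triangle of the symmetric matrix is kept.
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


# Toy input (made-up coordinates), binned at 10 kb resolution.
toy_pairs = [
    ("chr1", 1200, "chr1", 15800),
    ("chr1", 15100, "chr1", 1900),
    ("chr1", 300, "chr2", 42000),
]
matrix = bin_contacts(toy_pairs, resolution=10_000)
```

The first two toy pairs fall into the same pair of bins regardless of mate order, which is why the key is normalized before counting. Multi-resolution formats such as .mcool store this kind of sparse count table at several bin sizes.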
