
Improved Quality Metrics for Association and Reproducibility in Chromatin Accessibility Data Using Mutual Information

Overview
Publisher Biomed Central
Specialty Biology
Date 2023 Nov 22
PMID 37990143
Abstract

Background: Correlation metrics are widely used in genomics analysis and are often applied with little regard to their assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as assays for transposase-accessible chromatin via sequencing (ATAC-seq). Such data can contain several regions across the human genome with little to no sequencing depth and are thus non-normal, with a large proportion of zero values. Despite widespread use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences, or how removing specific outliers from the data affects them. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility.

Results: Using these simulations, we monitored the behavior of several correlation statistics, including Pearson's R and Spearman's ρ coefficients as well as Kendall's τ and Top-Down correlation. We also tested the behavior of association measures, including the coefficient of determination R², Kendall's W, and normalized mutual information. Our experiments reveal that most statistics, including Spearman's ρ, Kendall's τ, and Kendall's W, are insensitive to increasing differences between simulated ATAC-seq replicates. Removing co-zeros (regions lacking mapped sequenced reads in both experiments) greatly improves the estimates of correlation and association. After removing co-zeros, the R² coefficient and normalized mutual information display the best performance, having a closer one-to-one relationship with the known portion of shared, enhanced loci between simulated replicates. When comparing values between experimental ATAC-seq data using a random forest model, mutual information best predicts ATAC-seq replicate relationships.
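The effect of co-zero removal described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy counts, the function names, the equal-width binning, the choice of R² as the squared Pearson coefficient, and the geometric-mean normalization of mutual information are all assumptions made for illustration only.

```python
import math
from collections import Counter


def remove_co_zeros(x, y):
    """Drop positions where BOTH replicates have zero counts (co-zeros)."""
    kept = [(a, b) for a, b in zip(x, y) if not (a == 0 and b == 0)]
    return [a for a, _ in kept], [b for _, b in kept]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x, y, n_bins=4):
    """NMI after equal-width binning (assumed discretization).

    Normalized by the geometric mean of the marginal entropies, which is
    one common convention and an assumption here.
    """
    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = binned(x), binned(y)
    hx, hy = entropy(bx), entropy(by)
    hxy = entropy(list(zip(bx, by)))
    mi = max(hx + hy - hxy, 0.0)  # clamp tiny negative rounding errors
    denom = math.sqrt(hx * hy)
    return mi / denom if denom > 0 else 0.0


# Hypothetical per-region read counts for two replicates (toy data).
rep1 = [0, 0, 0, 0, 0, 0, 5, 9, 2, 7]
rep2 = [0, 0, 0, 0, 0, 0, 6, 1, 8, 3]

r_all = pearson_r(rep1, rep2)            # co-zeros included
kept1, kept2 = remove_co_zeros(rep1, rep2)
r_kept = pearson_r(kept1, kept2)         # co-zeros removed
r2_kept = r_kept ** 2                    # R^2 taken as squared Pearson r (assumption)
nmi_kept = normalized_mutual_information(kept1, kept2)

print(f"Pearson r with co-zeros:    {r_all:.3f}")
print(f"Pearson r without co-zeros: {r_kept:.3f}")
print(f"R^2 without co-zeros:       {r2_kept:.3f}")
print(f"NMI without co-zeros:       {nmi_kept:.3f}")
```

In this toy case the shared zeros pull the Pearson coefficient toward a positive value even though the covered regions disagree, which is the kind of distortion that filtering co-zeros is meant to remove.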

Conclusions: Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.

Citing Articles

Multi-omics analysis reveals the dynamic interplay between Vero host chromatin structure and function during vaccinia virus infection.

Venu V, Roth C, Adikari S, Small E, Starkenburg S, Sanbonmatsu K Commun Biol. 2024; 7(1):721.

PMID: 38862613 PMC: 11166932. DOI: 10.1038/s42003-024-06389-x.


Human Coronavirus Infection Reorganizes Spatial Genomic Architecture in Permissive Lung Cells.

Singhal A, Roth C, Micheva-Viteva S, Venu V, Lappala A, Lee J Res Sq. 2024; .

PMID: 38559036 PMC: 10980144. DOI: 10.21203/rs.3.rs-3979539/v1.
