
Theoretical Framework for the Difference of Two Negative Binomial Distributions and Its Application in Comparative Analysis of Sequencing Data

Overview
Journal Genome Res
Specialty Genetics
Date 2024 Oct 15
PMID 39406498
Abstract

High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DEGseq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions.
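To make the idea of a DOTNB-based test concrete, the following is a minimal numerical sketch of the distribution of Z = X - Y for two independent negative binomial variables, with a two-sided tail probability for one observed difference. It is not the DEGage implementation and does not reproduce the paper's analytical results. The function names (`nb_pmf`, `dotnb_table`, `two_sided_p`), the (r, p) parameterization (number of failures before the r-th success), the truncation bound `max_count`, and the doubling rule for the two-sided P-value are all assumptions chosen for illustration. The sketch also handles a single difference only, not the mean of sample-wise differences that the abstract describes as the test statistic.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for X ~ NB(r, p), counting failures before the r-th success.

    Assumed parameterization: r > 0 (may be non-integer), 0 < p < 1.
    """
    if k < 0:
        return 0.0
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def dotnb_table(r1, p1, r2, p2, max_count=300):
    """Approximate P(X - Y = z) for independent X ~ NB(r1, p1), Y ~ NB(r2, p2).

    The infinite convolution sum is truncated at max_count for both X and Y,
    so the table is only accurate when the mass beyond max_count is negligible.
    """
    px = [nb_pmf(k, r1, p1) for k in range(max_count + 1)]
    py = [nb_pmf(k, r2, p2) for k in range(max_count + 1)]
    table = {}
    for z in range(-max_count, max_count + 1):
        lo = max(0, -z)                        # need x = z + y >= 0
        hi = min(max_count, max_count - z)     # need x = z + y <= max_count
        table[z] = sum(px[z + y] * py[y] for y in range(lo, hi + 1))
    return table


def two_sided_p(z_obs, table):
    """Two-sided tail probability: twice the smaller tail, capped at 1."""
    lower = sum(prob for z, prob in table.items() if z <= z_obs)
    upper = sum(prob for z, prob in table.items() if z >= z_obs)
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper.
    table = dotnb_table(r1=5, p1=0.5, r2=3, p2=0.4)
    print(two_sided_p(12, table))
```

Under this parameterization the mean of Z is r1(1 - p1)/p1 - r2(1 - p2)/p2, which gives a quick sanity check on the truncated table. A mean-and-dispersion parameterization, which is common in sequencing tools, would need to be converted to (r, p) first.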

Citing Articles

Single-Cell Hi-C Technologies and Computational Data Analysis.

Dautle M, Chen Y Adv Sci (Weinh). 2025; 12(9):e2412232.

PMID: 39887949 PMC: 11884588. DOI: 10.1002/advs.202412232.
