NaRnEA: An Information Theoretic Framework for Gene Set Analysis

Overview
Journal: Entropy (Basel)
Publisher: MDPI
Date: 2023 Mar 29
PMID: 36981431
Abstract

Gene sets are increasingly being leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein's transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves upon two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.
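The abstract contrasts resampling-based (empirical) null models with an analytical one. As a hedged, generic illustration of that distinction only, the sketch below scores a gene set by the sum of its members' ranks and standardizes that score two ways: with closed-form null moments (the classical Wilcoxon rank-sum null, which assumes the set members are a uniformly random subset of the ranked genes) and with moments estimated by random resampling. This is not NaRnEA's Maximum Entropy null, nor the GSEA or aREA statistic, and it shuffles gene labels rather than samples; the function names are hypothetical.

```python
import math
import random


def rank_sum_z_analytical(ranks_in_set, n_genes):
    """Closed-form z-score for the sum of ranks of a gene set.

    Illustrative null assumption: the k set members are a uniformly random
    subset of ranks 1..n_genes (Wilcoxon rank-sum null), so the mean and
    variance of the rank sum are known exactly and no resampling is needed.
    """
    k = len(ranks_in_set)
    observed = sum(ranks_in_set)
    mean = k * (n_genes + 1) / 2
    var = k * (n_genes - k) * (n_genes + 1) / 12
    return (observed - mean) / math.sqrt(var)


def rank_sum_z_empirical(ranks_in_set, n_genes, n_perm=10000, seed=0):
    """Same statistic, with the null mean and SD estimated by resampling.

    Each draw picks k random ranks without replacement (gene-label shuffling),
    which approximates the same null the analytical version computes exactly.
    """
    rng = random.Random(seed)
    k = len(ranks_in_set)
    observed = sum(ranks_in_set)
    all_ranks = range(1, n_genes + 1)
    null = [sum(rng.sample(all_ranks, k)) for _ in range(n_perm)]
    mu = sum(null) / n_perm
    sd = math.sqrt(sum((x - mu) ** 2 for x in null) / (n_perm - 1))
    return (observed - mu) / sd


if __name__ == "__main__":
    top_50 = list(range(951, 1001))  # a set occupying the 50 highest of 1,000 ranks
    print(rank_sum_z_analytical(top_50, 1000))
    print(rank_sum_z_empirical(top_50, 1000))
```

For a set occupying the top 50 of 1,000 ranks, both functions should give a z-score near 11.9; the closed form gets there without any resampling, whereas a resampled null can only resolve p-values down to roughly 1/n_perm.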

Citing Articles

Cross-species regulatory network analysis identifies FOXO1 as a driver of ovarian follicular recruitment.

Kramer A, Berral-Gonzalez A, Ellwood K, Ding S, De Las Rivas J, Dutta A. Sci Rep. 2024; 14(1):30787.

PMID: 39730395 PMC: 11680958. DOI: 10.1038/s41598-024-80003-2.


Genome-wide studies define new genetic mechanisms of IgA vasculitis.

Liu L, Zhu L, Monteiro-Martins S, Griffin A, Vlahos L, Fujita M. medRxiv. 2024.

PMID: 39417133 PMC: 11482997. DOI: 10.1101/2024.10.10.24315041.


Tumor Explants Elucidate a Cascade of Paracrine SHH, WNT, and VEGF Signals Driving Pancreatic Cancer Angiosuppression.

Hasselluhn M, Decker-Farrell A, Vlahos L, Thomas D, Curiel-Garcia A, Maurer H. Cancer Discov. 2023; 14(2):348-361.

PMID: 37966260 PMC: 10922937. DOI: 10.1158/2159-8290.CD-23-0240.
