Digitizing Omics Profiles by Divergence from a Baseline

Overview

Journal Proc Natl Acad Sci U S A

Specialty Science

Date 2018 Apr 19

PMID 29666255

Citations 12

Authors

Wikum Dinalankara

Qian Ke

Yiran Xu

Lanlan Ji

Nicole Pagane

Anching Lien

Tejasvi Matam

Elana J Fertig

Nathan D Price

Laurent Younes

Luigi Marchionni

Donald Geman


Abstract

Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be "divergent" if it lies outside the estimated support of the baseline distribution and is consequently interpreted as "dysregulated" relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more "personalized" and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease-pathway associations.
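As a rough illustration of the single-feature case described in the abstract, the sketch below estimates a baseline range for one feature from reference samples and codes a new sample as divergent when it falls outside that range. The abstract does not say how the baseline support is estimated. The central empirical quantile interval used here is an assumption, and the function names, the `coverage` parameter, and the -1/0/+1 coding are hypothetical choices made for illustration rather than the authors' implementation.

```python
from typing import Sequence, Tuple


def empirical_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of pre-sorted values, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def baseline_interval(
    baseline: Sequence[float], coverage: float = 0.95
) -> Tuple[float, float]:
    """Assumed support estimate: the central `coverage` quantile interval
    of the baseline (e.g., normal tissue) values for a single feature."""
    tail = (1.0 - coverage) / 2.0
    ordered = sorted(baseline)
    return (
        empirical_quantile(ordered, tail),
        empirical_quantile(ordered, 1.0 - tail),
    )


def digitize(value: float, interval: Tuple[float, float]) -> int:
    """Code one sample's feature value: -1 below the baseline range,
    +1 above it, 0 inside it (not divergent)."""
    low, high = interval
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def divergence_probability(
    values: Sequence[float], interval: Tuple[float, float]
) -> float:
    """Fraction of samples in a population whose value for this feature
    is divergent from the baseline."""
    codes = [digitize(v, interval) for v in values]
    return sum(1 for c in codes if c != 0) / len(codes)
```

For example, with baseline values `[1.0, 2.0, 3.0, 4.0, 5.0]` and `coverage=0.5`, `baseline_interval` returns `(2.0, 4.0)`. Sample values `[0.5, 3.0, 6.0, 4.5]` are then coded `[-1, 0, 1, 1]`, and the estimated probability of divergence in that group is `0.75`. Because each sample is coded on its own, a single new profile can be digitized without re-estimating anything from the population it belongs to.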

Citing Articles

PhosCancer: A comprehensive database for investigating protein phosphorylation in human cancer.

Dong Q, Shen D, Ye J, Chen J, Li J. iScience. 2024; 27(11):111060.

PMID: 39493875 PMC: 11530918. DOI: 10.1016/j.isci.2024.111060.

CellBiAge: Improved single-cell age classification using data binarization.

Yu D, Li M, Linghu G, Hu Y, Hajdarovic K, Wang A. Cell Rep. 2023; 42(12):113500.

PMID: 38032797 PMC: 10791072. DOI: 10.1016/j.celrep.2023.113500.

Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect.

Borisov N, Buzdin A. Biomedicines. 2022; 10(9).

PMID: 36140419 PMC: 9496268. DOI: 10.3390/biomedicines10092318.

Comprehensive Analysis of Ubiquitously Expressed Genes in Humans from A Data-driven Perspective.

Gu J, Dai J, Lu H, Zhao H. Genomics Proteomics Bioinformatics. 2022; 21(1):164-176.

PMID: 35569803 PMC: 10373092. DOI: 10.1016/j.gpb.2021.08.017.

Efficient representations of tumor diversity with paired DNA-RNA aberrations.

Ke Q, Dinalankara W, Younes L, Geman D, Marchionni L. PLoS Comput Biol. 2021; 17(6):e1008944.

PMID: 34115745 PMC: 8221796. DOI: 10.1371/journal.pcbi.1008944.
