Data-driven Characterization of Molecular Phenotypes Across Heterogeneous Sample Collections

Overview
Specialty: Biochemistry
Date: 2019 Jul 23
PMID: 31329928
Citations: 11
Abstract

Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessment of the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.
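One way to picture the "biological versus technical variation" assessment is to compare cluster assignments against two kinds of sample annotation: a biological one (such as disease) and a technical one (such as study of origin). The sketch below is only an illustration under stated assumptions, not the authors' method: it uses normalized mutual information (NMI) as the agreement score, the helper names `entropy`, `mutual_information`, and `nmi` are hypothetical, and the cluster and annotation labels are made-up toy values standing in for clusters that would be derived from a t-SNE embedding.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two equally long label lists."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """NMI, normalized by the geometric mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a constant labelling carries no information
    return mutual_information(a, b) / sqrt(h_a * h_b)


# Toy data (hypothetical): 8 samples, 2 clusters, 2 diseases, 2 studies.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
disease = ["AML", "AML", "AML", "AML", "ALL", "ALL", "ALL", "ALL"]
study = ["A", "B", "A", "B", "A", "B", "A", "B"]

# Clusters follow disease exactly, and each cluster mixes both studies evenly.
print("NMI(cluster, disease):", nmi(clusters, disease))
print("NMI(cluster, study):  ", nmi(clusters, study))
```

In this toy setup the clusters track the disease annotation and are independent of the study annotation, so the first score is close to 1 and the second is 0. On real data, a high score against the technical annotation would suggest that batch or platform effects are driving the clustering.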

Citing Articles

Identifying prognostic biomarker related to immune infiltration in acute myeloid leukemia.

Lu W, Yu G, Li Y, Yin C, Long J, Chen X. Clin Exp Med. 2023; 23(8):4553-4562.

PMID: 37561221 DOI: 10.1007/s10238-023-01164-4.


Erythroid/megakaryocytic differentiation confers BCL-XL dependency and venetoclax resistance in acute myeloid leukemia.

Kuusanmaki H, Dufva O, Vaha-Koskela M, Leppa A, Huuhtanen J, Vanttinen I. Blood. 2022; 141(13):1610-1625.

PMID: 36508699 PMC: 10651789. DOI: 10.1182/blood.2021011094.


Metabolic Phenotyping of Marine Heterotrophs on Refactored Media Reveals Diverse Metabolic Adaptations and Lifestyle Strategies.

Forchielli E, Sher D, Segre D. mSystems. 2022; 7(4):e0007022.

PMID: 35856685 PMC: 9426600. DOI: 10.1128/msystems.00070-22.


Arginine Methyltransferase PRMT7 Deregulates Expression of RUNX1 Target Genes in T-Cell Acute Lymphoblastic Leukemia.

Oksa L, Makinen A, Nikkila A, Hyvarinen N, Laukkanen S, Rokka A. Cancers (Basel). 2022; 14(9).

PMID: 35565298 PMC: 9101393. DOI: 10.3390/cancers14092169.


A systematic comparison of data- and knowledge-driven approaches to disease subtype discovery.

Rintala T, Federico A, Latonen L, Greco D, Fortino V. Brief Bioinform. 2021; 22(6).

PMID: 34396389 PMC: 8575038. DOI: 10.1093/bib/bbab314.

