Semi-supervised Recursively Partitioned Mixture Models for Identifying Cancer Subtypes

Overview

Journal: Bioinformatics

Publisher: Oxford University Press

Specialty: Biology

Date: 2010 Sep 14

PMID: 20834038

Citations: 28

Authors

Devin C Koestler

Carmen J Marsit

Brock C Christensen

Margaret R Karagas

Raphael Bueno

David J Sugarbaker

Karl T Kelsey

E Andres Houseman


Abstract

Motivation: Patients with identical cancer diagnoses often progress differently. The disparity we see in disease progression and treatment response can be attributed to the idea that two histologically similar cancers may be completely different diseases on the molecular level. Methods for identifying cancer subtypes associated with patient survival have the capacity to be powerful instruments for understanding the biochemical processes that underlie disease progression as well as providing an initial step toward more personalized therapy for cancer patients. We propose a method called semi-supervised recursively partitioned mixture models (SS-RPMM) that utilizes array-based genetic and patient-level clinical data for finding cancer subtypes that are associated with patient survival.

Results: In the proposed SS-RPMM, cancer subtypes are identified using a selected subset of genes that are associated with survival time. Since survival information is used in the gene selection step, this method is semi-supervised. Unlike other semi-supervised clustering classification methods, SS-RPMM does not require specification of the number of cancer subtypes, which is often unknown. In a simulation study, our proposed method compared favorably with competing semi-supervised methods, including semi-supervised clustering and supervised principal components analysis. Furthermore, an analysis of mesothelioma cancer data using SS-RPMM revealed at least two distinct methylation profiles that are informative for survival.

Availability: The analyses implemented in this article were carried out using R (http://www.r-project.org/).

Contact: devin_koestler@brown.edu; e_andres_houseman@brown.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
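
The two-stage idea described under Results (select survival-associated features, then cluster on them without fixing the number of subtypes) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which was carried out in R. Several simplifications are assumptions made here for brevity: features are ranked by absolute Pearson correlation with survival time (censoring is ignored, whereas a real analysis would likely use a survival-model score), and the mixture model is replaced by recursive 2-means bisection with hard assignments, accepted only when BIC under a diagonal Gaussian model prefers two groups over one. All function names, the toy data generator, and the parameters `n_keep` and `min_size` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_features(X, times, n_keep):
    """Supervised step: keep the n_keep features most correlated with survival time."""
    scores = [abs(pearson([row[j] for row in X], times)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n_keep]


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(X, idx):
    """Per-feature mean of the rows of X listed in idx."""
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(len(X[0]))]


def two_means(X, idx, n_iter=20):
    """Split idx into two groups with 2-means, seeded by two mutually distant points."""
    b = X[max(idx, key=lambda i: dist2(X[i], X[idx[0]]))]
    a = X[max(idx, key=lambda i: dist2(X[i], b))]
    left, right = [], []
    for _ in range(n_iter):
        left = [i for i in idx if dist2(X[i], a) <= dist2(X[i], b)]
        right = [i for i in idx if dist2(X[i], a) > dist2(X[i], b)]
        if not left or not right:
            break
        a, b = centroid(X, left), centroid(X, right)
    return left, right


def bic(X, groups):
    """BIC of a hard-assignment Gaussian model with a diagonal covariance per group."""
    n = sum(len(g) for g in groups)
    d = len(X[0])
    loglik = 0.0
    for g in groups:
        mu = centroid(X, g)
        for j in range(d):
            # Variance floor avoids log(0) for degenerate groups.
            var = max(sum((X[i][j] - mu[j]) ** 2 for i in g) / len(g), 1e-6)
            loglik -= 0.5 * len(g) * (math.log(2 * math.pi * var) + 1)
        loglik += len(g) * math.log(len(g) / n)  # mixing-proportion term
    n_params = 2 * d * len(groups) + (len(groups) - 1)
    return -2 * loglik + n_params * math.log(n)


def recursive_partition(X, idx, min_size=5):
    """Unsupervised step: keep bisecting while BIC prefers two groups over one."""
    left, right = two_means(X, idx)
    if min(len(left), len(right)) < min_size or bic(X, [left, right]) >= bic(X, [idx]):
        return [idx]
    return recursive_partition(X, left, min_size) + recursive_partition(X, right, min_size)


def ss_partition(X, times, n_keep, min_size=5):
    """Feature selection followed by recursive partitioning on the kept features."""
    keep = select_features(X, times, n_keep)
    X_sel = [[row[j] for j in keep] for row in X]
    return keep, recursive_partition(X_sel, list(range(len(X))), min_size)


def make_toy_data(seed=0, n_per_group=20, n_noise=17):
    """Hypothetical toy cohort: 3 informative features, the rest pure noise."""
    rng = random.Random(seed)
    X, times, labels = [], [], []
    for label, (shift, lo, hi) in enumerate([(0.0, 20.0, 30.0), (5.0, 5.0, 10.0)]):
        for _ in range(n_per_group):
            informative = [rng.gauss(shift, 0.5) for _ in range(3)]
            X.append(informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
            times.append(rng.uniform(lo, hi))
            labels.append(label)
    return X, times, labels
```

Calling `ss_partition(X, times, n_keep=3)` on the output of `make_toy_data()` returns the indices of the retained features and a list of patient-index groups. The number of groups is not passed in; it emerges from the BIC stopping rule, which mirrors the property highlighted in the abstract that the number of subtypes need not be specified in advance.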

Citing Articles

Evaluation of agreement between common clustering strategies for DNA methylation-based subtyping of breast tumours.

Zarean E, Li S, Wong E, Makalic E, Milne R, Giles G. Epigenomics. 2024; 17(2):105-114.

PMID: 39711216 PMC: 11792870. DOI: 10.1080/17501911.2024.2441653.

Genome-Scale Methylation Analysis Identifies Immune Profiles and Age Acceleration Associations with Bladder Cancer Outcomes.

Chen J, Salas L, Wiencke J, Koestler D, Molinaro A, Andrew A. Cancer Epidemiol Biomarkers Prev. 2023; 32(10):1328-1337.

PMID: 37527159 PMC: 10543967. DOI: 10.1158/1055-9965.EPI-23-0331.

Comparative Transcriptome Profiling Reveals the Genes Involved in Storage Root Expansion in Sweetpotato (Ipomoea batatas (L.) Lam.).

Song W, Yan H, Ma M, Kou M, Li C, Tang W. Genes (Basel). 2022; 13(7).

PMID: 35885939 PMC: 9321896. DOI: 10.3390/genes13071156.

Subject level clustering using a negative binomial model for small transcriptomic studies.

Li Q, Noel-MacDonnell J, Koestler D, Goode E, Fridley B. BMC Bioinformatics. 2018; 19(1):474.

PMID: 30541426 PMC: 6292049. DOI: 10.1186/s12859-018-2556-9.

Identification of relevant subtypes via preweighted sparse clustering.

Gaynor S, Bair E. Comput Stat Data Anal. 2018; 116:139-154.

PMID: 29785064 PMC: 5959300. DOI: 10.1016/j.csda.2017.06.003.
