
Analysis of Multiple Diverse Phenotypes Via Semiparametric Canonical Correlation Analysis

Overview
Journal Biometrics
Specialty Public Health
Date 2017 Apr 14
PMID 28407213
Citations 1
Abstract

Studying multiple outcomes simultaneously allows researchers to begin to identify underlying factors that affect all diseases in a set (i.e., shared etiology) and what may give rise to differences in disorders between patients (i.e., disease subtypes). In this work, our goal is to build risk scores that are predictive of multiple phenotypes simultaneously and to identify subpopulations at high risk of multiple phenotypes. Such analyses could yield insight into etiology or point to treatment and prevention strategies. Standard canonical correlation analysis (CCA) can be used to relate multiple continuous outcomes to multiple predictors. However, to capture the full complexity of a disorder, phenotypes may include a diverse range of data types, including binary, continuous, ordinal, and censored variables. When phenotypes are diverse in this way, standard CCA is not applicable, and no methods currently exist to model them jointly. To address this, we propose a semiparametric CCA method to develop risk scores that are predictive of multiple phenotypes. To guard against potential model misspecification, we also propose a nonparametric calibration method to identify subgroups at high risk of multiple disorders. A resampling procedure is also developed to account for the variability in these estimates. Our method opens the door to synthesizing a wide array of data sources for the purposes of joint prediction.
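For context, the standard CCA baseline mentioned in the abstract finds weight vectors a and b that maximize the correlation between a linear score of the predictors, Xa, and a linear combination of the outcomes, Yb. The sketch below is illustrative only: it implements ordinary CCA for continuous data, not the authors' semiparametric method. The function names (`cca`, `_inv_sqrt`), the small ridge term, and the simulated data (including the latent `risk` variable) are hypothetical choices made for this example. It computes the canonical correlations as the singular values of Cxx^(-1/2) Cxy Cyy^(-1/2).

```python
import numpy as np

# Illustrative only: standard CCA for continuous outcomes, NOT the paper's
# semiparametric CCA. Names, ridge term, and simulated data are hypothetical.

def _inv_sqrt(c, ridge=1e-8):
    # Symmetric inverse square root via eigendecomposition; a tiny ridge
    # keeps it numerically stable if c is near-singular.
    vals, vecs = np.linalg.eigh(c + ridge * np.eye(c.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(x, y, ridge=1e-8):
    """Return canonical correlations s and weight matrices a (for x), b (for y).

    Column k of a and b gives the k-th pair of canonical variates x @ a[:, k]
    and y @ b[:, k], whose sample correlation is s[k].
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    kx, ky = _inv_sqrt(cxx, ridge), _inv_sqrt(cyy, ridge)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    a = kx @ u      # predictor weights (columns = canonical directions)
    b = ky @ vt.T   # outcome weights
    return s, a, b

# Hypothetical simulated example: three predictors drive a shared latent risk
# that influences two continuous outcomes.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
risk = x @ np.array([1.0, -0.5, 0.0])
y = np.column_stack([risk + rng.normal(size=n),
                     2 * risk + rng.normal(size=n)])

s, a, b = cca(x, y)
score = x @ a[:, 0]   # first canonical variate: a linear "risk score"
```

This baseline assumes every column of `y` is continuous. With binary, ordinal, or censored phenotypes, the covariance-based construction above no longer applies directly, which is the gap the abstract's semiparametric approach targets.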

Citing Articles

Sparse semiparametric canonical correlation analysis for data of mixed types.

Yoon G, Carroll R, Gaynanova I. Biometrika. 2021; 107(3):609-625.

PMID: 34621080 PMC: 8494134. DOI: 10.1093/biomet/asaa007.
