
A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis

Overview
Journal: Biostatistics
Specialty: Public Health
Date: 2009 Apr 21
PMID: 19377034
Citations: 432
Abstract

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
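The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a decomposition of this kind could be computed. It assumes the penalties are written as bounds $\|u\|_1 \le c_u$ and $\|v\|_1 \le c_v$ together with unit $L_2$ norms, and it assumes alternating soft-thresholding updates with a bisection search for the threshold. The function names (`soft_threshold`, `l1_bounded_unit_vector`, `pmd_rank1`, `pmd`) and the parameters `c_u`, `c_v` are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_bounded_unit_vector(a, c, n_bisect=50):
    """Soft-threshold `a` and rescale to unit L2 norm so that the L1 norm is
    at most `c` (assumed c >= 1). The threshold is found by bisection."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return a
    u = a / norm
    if np.abs(u).sum() <= c:
        return u  # bound already satisfied, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        if np.abs(s).sum() <= c * np.linalg.norm(s):
            hi = mid  # bound holds, try a smaller threshold
        else:
            lo = mid
    s = soft_threshold(a, hi)
    if np.linalg.norm(s) == 0:  # degenerate case (e.g. tied maxima)
        s = soft_threshold(a, lo)
    return s / np.linalg.norm(s)


def pmd_rank1(X, c_u, c_v, n_iter=100, tol=1e-8):
    """One sparse factor (d, u, v) of X via alternating updates (a sketch)."""
    X = np.asarray(X, dtype=float)
    # Start from the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_bounded_unit_vector(X @ v, c_u)
        v_new = l1_bounded_unit_vector(X.T @ u, c_v)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    d = float(u @ X @ v)
    return d, u, v


def pmd(X, K, c_u, c_v):
    """Rank-K sketch: extract one factor at a time and deflate the residual."""
    R = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = pmd_rank1(R, c_u, c_v)
        factors.append((d, u, v))
        R = R - d * np.outer(u, v)
    return factors
```

Under these assumptions, setting `c_u` to the square root of the number of rows makes the bound on `u` inactive, which corresponds to the sparse principal components setting described in the abstract (an $L_1$-penalty on $v_k$ only). Passing a cross-products matrix such as `X.T @ Y` in place of `X` corresponds to the penalized CCA setting; any standardization of the two data sets beforehand is left to the reader.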

Citing Articles

A unified hypothesis-free feature extraction framework for diverse epigenomic data.

Balci A, Chikina M. Bioinform Adv. 2025; 5(1):vbaf013.

PMID: 40078573 PMC: 11897706. DOI: 10.1093/bioadv/vbaf013.


Partial face visibility and facial cognition: event-related potential and eye tracking investigation.

Chanpornpakdi I, Wongsawat Y, Tanaka T. Cogn Neurodyn. 2025; 19(1):47.

PMID: 40070675 PMC: 11893966. DOI: 10.1007/s11571-025-10231-3.


Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens.

Jiang L, Dalgarno C, Papalexi E, Mascio I, Wessels H, Yun H. Nat Cell Biol. 2025; 27(3):505-517.

PMID: 40011560 DOI: 10.1038/s41556-025-01622-z.


Robust convex biclustering with a tuning-free method.

Chen Y, Lei C, Li C, Ma H, Hu N. J Appl Stat. 2025; 52(2):271-286.

PMID: 39926177 PMC: 11800347. DOI: 10.1080/02664763.2024.2367143.


Discovery of robust and highly specific microbiome signatures of non-alcoholic fatty liver disease.

Nychas E, Marfil-Sanchez A, Chen X, Mirhakkak M, Li H, Jia W. Microbiome. 2025; 13(1):10.

PMID: 39810263 PMC: 11730835. DOI: 10.1186/s40168-024-01990-y.

