
INSIDER: Interpretable Sparse Matrix Decomposition for RNA Expression Data Analysis

Overview
Journal PLoS Genet
Specialty Genetics
Date 2024 Mar 14
PMID 38484017
Abstract

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few if any methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while accounting for the grouping effects of genes. It can achieve DR of high-dimensional data (of ≥ 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF), which generally handle only 2D data (e.g., sample × expression). In addition, it enables computing 'adjusted' expression profiles for specific biological variables while controlling for variation from other variables. INSIDER is computationally efficient and accommodates missing data, including complex missingness patterns in RNA-Seq data. In simulations, INSIDER performed similarly to or better than a closely related competing method, SDA. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neurodevelopmental trajectories using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.
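
To make the idea of decomposing variation into a shared low-rank space more concrete, the following is a minimal NumPy sketch, not the INSIDER implementation or its API. It assumes a model of the form Y[i, :] ≈ (D[donor_i] + T[tissue_i] + I[donor_i, tissue_i]) · V, with ridge-penalized covariate embeddings D, T, I and elastic-net-penalized gene loadings V, fitted by alternating updates over observed entries only. The model form, the function names (`fit_sketch`, `update_embedding`, `soft_threshold`), the hyperparameters, and the simulated data are all illustrative assumptions; the repository linked above is the authoritative source for the actual method.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm (drives small loadings to exactly zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def update_embedding(E, idx, R, M, V, ridge):
    """Ridge update of one covariate's embeddings E (levels x rank), in place.

    R is the residual after removing the other covariates' fitted part,
    M is the 0/1 mask of observed entries.
    """
    rank = V.shape[0]
    for level in range(E.shape[0]):
        rows = idx == level
        w = M[rows].sum(axis=0)                      # observed count per gene
        A = (V * w) @ V.T + ridge * np.eye(rank)
        b = V @ (M[rows] * R[rows]).sum(axis=0)
        E[level] = np.linalg.solve(A, b)

def fit_sketch(Y, donor, tissue, rank=2, lam=0.05, alpha=0.5,
               ridge=0.1, n_iter=50, seed=0):
    """Hypothetical alternating fit; Y is samples x genes, NaN marks missing values."""
    rng = np.random.default_rng(seed)
    M = (~np.isnan(Y)).astype(float)
    Y0 = np.nan_to_num(Y)
    n_d, n_t = donor.max() + 1, tissue.max() + 1
    inter = donor * n_t + tissue                     # one level per donor-tissue pair
    D = 0.1 * rng.standard_normal((n_d, rank))
    T = 0.1 * rng.standard_normal((n_t, rank))
    I = 0.1 * rng.standard_normal((n_d * n_t, rank))
    V = 0.1 * rng.standard_normal((rank, Y.shape[1]))
    for _ in range(n_iter):
        update_embedding(D, donor, Y0 - (T[tissue] + I[inter]) @ V, M, V, ridge)
        update_embedding(T, tissue, Y0 - (D[donor] + I[inter]) @ V, M, V, ridge)
        update_embedding(I, inter, Y0 - (D[donor] + T[tissue]) @ V, M, V, ridge)
        Z = D[donor] + T[tissue] + I[inter]          # combined sample embedding
        for k in range(rank):                        # elastic-net coordinate descent on V
            R_k = Y0 - Z @ V + np.outer(Z[:, k], V[k])
            num = Z[:, k] @ (M * R_k)
            den = (Z[:, k] ** 2) @ M + lam * (1 - alpha)
            V[k] = soft_threshold(num, lam * alpha) / den
    return D, T, I, V

# Simulated toy data (assumption): 5 donors x 4 tissues, 30 genes, 10% missing.
rng = np.random.default_rng(1)
donor, tissue = np.repeat(np.arange(5), 4), np.tile(np.arange(4), 5)
D_true, T_true = rng.standard_normal((5, 2)), rng.standard_normal((4, 2))
V_true = rng.standard_normal((2, 30))
Y = (D_true[donor] + T_true[tissue]) @ V_true + 0.1 * rng.standard_normal((20, 30))
Y[rng.random(Y.shape) < 0.1] = np.nan

D, T, I, V = fit_sketch(Y, donor, tissue)
fitted = (D[donor] + T[tissue] + I[donor * 4 + tissue]) @ V
mse = np.nanmean((Y - fitted) ** 2)                  # error on observed entries only
donor_profiles = D @ V                               # donor-specific part of the fit
```

Under these assumptions, `donor_profiles` loosely illustrates what an 'adjusted' expression profile could look like: the portion of the fit attributed to donors, with the tissue and interaction terms set aside. Because each sample is indexed by its covariate levels rather than by a position in a tensor, the same sketch runs when some donor-tissue combinations are absent.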

Citing Articles

scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis.

Zhao K, So H, Lin Z. Genome Biol. 2024; 25(1):223.

PMID: 39152499 PMC: 11328435. DOI: 10.1186/s13059-024-03345-0.
