
Geometric Structure Guided Model and Algorithms for Complete Deconvolution of Gene Expression Data

Overview
Journal: Found Data Sci
Date: 2024 Jan 22
PMID: 38250319
Abstract

Complete deconvolution analysis of bulk RNA-seq data helps determine whether differences in disease-associated gene expression profiles (GEPs) between tissues of patients and normal controls arise from changes in the cellular composition of the tissue samples or from GEP changes in specific cells. One of the major techniques for complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is well known to be a strongly ill-posed problem, so applying it directly to RNA-seq data yields solutions that are difficult to interpret. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve solution identifiability when deconvoluting bulk RNA-seq data. Our approach combines the biological concept of marker genes with the solvability conditions of NMF theory to build a geometric-structure-guided optimization model. In this strategy, the geometric structure of the bulk tissue data is first explored with spectral clustering. The identified marker gene information is then integrated as solvability constraints, while the overall correlation graph is used as manifold regularization. Validation on both synthetic and biological data shows that the proposed model and algorithms significantly improve solution interpretability and accuracy.
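To make the manifold-regularization idea concrete, the following is a minimal sketch of a generic graph-regularized NMF using the standard multiplicative updates. It is not the authors' algorithm: the marker-gene solvability constraints and the spectral clustering step are omitted, and the names `correlation_affinity`, `graph_regularized_nmf`, `C`, `P`, and `lam`, as well as the choice to build the graph over genes, are assumptions made only for illustration.

```python
import numpy as np


def correlation_affinity(G):
    """Gene-gene affinity from clipped Pearson correlation (illustrative choice)."""
    W = np.clip(np.corrcoef(G), 0.0, None)  # keep only positive correlations
    np.fill_diagonal(W, 0.0)                # no self-loops
    return (W + W.T) / 2.0                  # enforce exact symmetry


def graph_regularized_nmf(G, k, W, lam=0.1, n_iter=500, eps=1e-10, seed=0):
    """Approximate G (genes x samples) by C @ P with C >= 0 and P >= 0.

    Minimizes ||G - C P||_F^2 + lam * trace(C.T @ L @ C), where L = D - W is
    the graph Laplacian of the gene-gene affinity W. The penalty pulls the
    rows of C for strongly connected genes toward each other.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = G.shape
    C = rng.random((n_genes, k)) + eps    # hypothetical cell-type profiles
    P = rng.random((k, n_samples)) + eps  # hypothetical mixing weights
    D = np.diag(W.sum(axis=1))            # degree matrix of the graph
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        P *= (C.T @ G) / (C.T @ C @ P + eps)
        C *= (G @ P.T + lam * (W @ C)) / (C @ (P @ P.T) + lam * (D @ C) + eps)
    return C, P
```

In this sketch the columns of `P` are not constrained to sum to one, so they should be read as unnormalized mixing weights rather than cell-type proportions. Setting `lam=0` recovers plain NMF, which is the ill-posed baseline the abstract refers to.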

Citing Articles

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization.
Hozumi Y, Wei G. J Comput Appl Math. 2024; 445. PMID: 38464901. PMC: 10919214. DOI: 10.1016/j.cam.2024.115842.

A hybrid stochastic interpolation and compression method for kernel matrices.
Chen D. J Comput Phys. 2023; 494. PMID: 38098855. PMC: 10720703. DOI: 10.1016/j.jcp.2023.112491.
