
A Bayesian Missing Value Estimation Method for Gene Expression Profile Data

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2003 Nov 5
PMID: 14594714
Citations: 145
Abstract

Motivation: Gene expression profile analyses have been used in numerous studies across a broad range of areas in biology. Excluding unreliable measurements introduces missing values into gene expression profiles. Although existing multivariate analysis methods have difficulty handling missing values, this problem has received little attention. There are many options for dealing with missing values, and they can lead to drastically different results. Ignoring missing values is the simplest approach and is frequently applied, but it has its flaws. In this article, we propose an estimation method for missing values based on Bayesian principal component analysis (BPCA). Simultaneously estimating a probabilistic model and latent variables within the framework of Bayesian inference is not new in principle, but an actual BPCA implementation that can estimate arbitrary missing variables is new in terms of statistical methodology.

Results: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. While the estimation performance of existing methods depends on model parameters whose determination is difficult, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation for missing values.
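To make the comparison above concrete, here is a minimal sketch of one of the baselines the abstract names, singular value decomposition, used as an iterative low-rank imputer. It is not the authors' BPCA method or their software. The function names (`svd_impute`, `nrmse`), the synthetic data, and the normalized RMSE score are assumptions made for illustration; the abstract does not say which error measure was used. The `rank` argument is the kind of model parameter that, according to the abstract, is difficult to determine for existing methods.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Fill NaN entries of X (genes x samples) by iterative low-rank SVD.

    Illustrative baseline only; `rank` must be chosen by the user.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: each gene's mean over its observed samples.
    row_means = np.nanmean(X, axis=1, keepdims=True)
    filled = np.where(missing, row_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries are kept; only missing entries are updated.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


def nrmse(truth, estimate, mask):
    """RMSE over the masked entries, divided by their standard deviation."""
    err = truth[mask] - estimate[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix (assumption, not real data).
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
    mask = rng.random(truth.shape) < 0.05   # hide about 5% of the entries
    mask[:, 0] = False                      # keep one observed value per gene
    observed = truth.copy()
    observed[mask] = np.nan
    estimate = svd_impute(observed, rank=2)
    print("NRMSE on hidden entries:", nrmse(truth, estimate, mask))
```

Hiding known entries and scoring the estimates against them is one common way to compare imputation methods; rerunning the sketch with different `rank` values shows how sensitive this baseline is to that choice.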

Availability: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.
