
Nonlinear Dimensionality Reduction With Missing Data Using Parametric Multiple Imputations

Overview
Date: 2019 Mar 21
PMID: 30892199
Citations: 6
Abstract

Dimensionality reduction (DR) aims at faithfully and meaningfully representing high-dimensional (HD) data in a low-dimensional (LD) space. Recently developed neighbor embedding DR methods achieve outstanding performance thanks to their ability to counter the curse of dimensionality. Unfortunately, they cannot be applied directly to incomplete data sets, which are becoming ubiquitous in machine learning. Discarding samples with missing features prevents the computation of their LD coordinates and degrades the treatment of the complete samples. Common missing-data imputation schemes are not appropriate in the nonlinear DR context either: even when they model the data distribution in the feature space, they can at best enable the application of a DR scheme to the expected data set. In practice, one would instead like to obtain the LD embedding whose cost function value is, on average, closest to the complete-data case. Because state-of-the-art DR techniques are nonlinear, that embedding results from minimizing the expected cost function on the incomplete data set, not from embedding the expected data set. This paper addresses these limitations by developing a general methodology for nonlinear DR with missing data that is directly applicable to any DR scheme optimizing some criterion. To model the feature dependences, an HD extension of Gaussian mixture models is first fitted on the incomplete data set. It is then used under the multiple imputation paradigm to obtain a single relevant LD embedding that minimizes the expected cost function. Extensive experiments demonstrate the superiority of the suggested framework over alternative approaches.
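
To make the idea concrete, the following Python sketch illustrates the multiple-imputation flavor of the approach under simplifying assumptions; it is not the authors' implementation. It draws several stochastic imputations of an incomplete data set with scikit-learn's IterativeImputer (a Bayesian-ridge imputer standing in for the paper's HD Gaussian mixture model), averages the pairwise distances over the imputations, and computes a single embedding of the averaged distances with metric MDS as a rough proxy for minimizing the expected DR cost. The function name embed_with_multiple_imputations and all parameter values are hypothetical.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.manifold import MDS


def embed_with_multiple_imputations(X_incomplete, n_imputations=5, n_components=2, seed=0):
    """Embed an incomplete data set (NaNs mark missing entries) in n_components dimensions."""
    mean_dist = None
    for m in range(n_imputations):
        # One stochastic imputation; sample_posterior=True draws from the fitted
        # conditional distributions instead of returning the conditional mean.
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + m)
        X_m = imputer.fit_transform(X_incomplete)
        D_m = squareform(pdist(X_m))  # pairwise distances of this imputation
        mean_dist = D_m if mean_dist is None else mean_dist + D_m
    mean_dist /= n_imputations  # average the distances over the imputations
    # Single embedding of the averaged distances -- a stand-in for minimizing the
    # expected DR cost, which the paper's method optimizes directly.
    mds = MDS(n_components=n_components, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(mean_dist)


# Toy usage: 200 samples, 10 features, roughly 20% of the entries missing at random.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
X[rng.rand(*X.shape) < 0.2] = np.nan
Z = embed_with_multiple_imputations(X)
print(Z.shape)  # (200, 2)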

Citing Articles

Revealing unexpected complex encoding but simple decoding mechanisms in motor cortex via separating behaviorally relevant neural signals.

Li Y, Zhu X, Qi Y, Wang Y. Elife. 2024; 12.

PMID: 39120996. PMC: 11315449. DOI: 10.7554/eLife.87881.


ParaDime: A Framework for Parametric Dimensionality Reduction.

Hinterreiter A, Humer C, Kainz B, Streit M. Comput Graph Forum. 2024; 42(3):337-348.

PMID: 38505300. PMC: 10947012. DOI: 10.1111/cgf.14834.


Eleven quick tips for data cleaning and feature engineering.

Chicco D, Oneto L, Tavazzi E. PLoS Comput Biol. 2022; 18(12):e1010718.

PMID: 36520712. PMC: 9754225. DOI: 10.1371/journal.pcbi.1010718.


State-dependent sequential allostery exhibited by chaperonin TRiC/CCT revealed by network analysis of Cryo-EM maps.

Zhang Y, Krieger J, Mikulska-Ruminska K, Kaynak B, Sorzano C, Carazo J. Prog Biophys Mol Biol. 2020; 160:104-120.

PMID: 32866476. PMC: 7914283. DOI: 10.1016/j.pbiomolbio.2020.08.006.


Intrinsic dimensionality of human behavioral activity data.

Fragoso L, Paul T, Vadan F, Stanley K, Bell S, Osgood N. PLoS One. 2019; 14(6):e0218966.

PMID: 31247031. PMC: 6597084. DOI: 10.1371/journal.pone.0218966.