
Handling Missing Rows in Multi-omics Data Integration: Multiple Imputation in Multiple Factor Analysis Framework

Overview
Publisher Biomed Central
Specialty Biology
Date 2016 Oct 8
PMID 27716030
Citations 28
Abstract

Background: In omics data integration studies, it is common, for a variety of reasons, for some individuals to be absent from one or more data tables. Such missing rows are challenging to deal with because most statistical methods cannot be applied directly to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values M times, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution.
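The three steps described above (impute M times, run MFA on each completed dataset, combine the M configurations) can be illustrated with a minimal Python sketch. Several parts are assumptions made only for illustration, since the abstract does not specify them: the imputation model (independent normal draws per variable), the consensus rule (Procrustes alignment to the first configuration followed by averaging), and all function names (`impute_rows`, `mfa_coords`, `align`, `mi_mfa`) are hypothetical and are not taken from the authors' R code. MFA is computed here in its standard form as a PCA of standardized tables, each divided by its first singular value.

```python
import numpy as np

def mfa_coords(tables, n_comp=2):
    """MFA as a weighted PCA: standardize each table, divide it by its
    first singular value, concatenate, and take the SVD."""
    blocks = []
    for X in tables:
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        s1 = np.linalg.svd(Z, compute_uv=False)[0]
        blocks.append(Z / s1)
    U, s, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]  # coordinates of individuals

def impute_rows(X, missing, rng):
    """Fill the missing rows with draws from N(mean, sd) of the observed rows.
    Placeholder imputation model (assumption, not the authors' method)."""
    missing = np.asarray(missing, dtype=int)
    observed = np.setdiff1d(np.arange(X.shape[0]), missing)
    mu, sd = X[observed].mean(axis=0), X[observed].std(axis=0)
    completed = X.copy()
    completed[missing] = rng.normal(mu, sd, size=(len(missing), X.shape[1]))
    return completed

def align(F, ref):
    """Orthogonal Procrustes: rotate/reflect F to best match ref."""
    U, _, Vt = np.linalg.svd(F.T @ ref)
    return F @ (U @ Vt)

def mi_mfa(tables, missing_rows, M=20, seed=0):
    """Return a consensus configuration and the M aligned configurations."""
    rng = np.random.default_rng(seed)
    configs = []
    for _ in range(M):
        completed = [impute_rows(X, miss, rng)
                     for X, miss in zip(tables, missing_rows)]
        configs.append(mfa_coords(completed))
    # Stand-in consensus (assumption): align to the first configuration, then average.
    aligned = [align(F, configs[0]) for F in configs]
    return np.mean(aligned, axis=0), aligned
```

Here `tables` is a list of individuals-by-variables arrays sharing the same row order, and `missing_rows` lists, for each table, the indices of the individuals absent from it. The alignment step is needed because MFA axes are only defined up to sign and rotation, so raw coordinates from different imputations cannot be averaged directly.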

Results: We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with those of two other approaches: regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was assessed against the true MFA configuration obtained from the original data, and a comprehensive graphical comparison was produced showing how the MI-, RI- and MVI-MFA configurations diverge from the true configuration. Two approaches for visualizing and assessing the uncertainty due to missing values, confidence ellipses and convex hulls, were also described. We showed how the areas of the ellipses and convex hulls increased with the number of missing individuals. Free, easy-to-use code is provided to implement the MI-MFA method in the R statistical environment.
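As a rough illustration of how such areas might be computed for one individual, the sketch below takes the M two-dimensional positions of that individual (one per imputed configuration) and returns the area of their convex hull and of an ellipse. The ellipse construction is an assumption: it is the region of the stated probability level under a bivariate normal fitted to the M points, which may differ from the construction used in the paper. The function names are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Area of the convex hull via the shoelace formula."""
    hull = convex_hull(points)
    n = len(hull)
    twice_area = sum(hull[i][0] * hull[(i + 1) % n][1]
                     - hull[(i + 1) % n][0] * hull[i][1] for i in range(n))
    return 0.5 * abs(twice_area)

def ellipse_area(points, level=0.95):
    """Area of the `level` ellipse of a bivariate normal fitted to the points
    (assumed construction; needs at least two points)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = max(sxx * syy - sxy ** 2, 0.0)
    q = -2.0 * math.log(1.0 - level)  # chi-square quantile with 2 degrees of freedom
    return math.pi * q * math.sqrt(det)
```

Points are passed as a list of `(x, y)` tuples. A larger area for an individual indicates that its position varies more across imputations, that is, greater uncertainty attributable to the missing rows.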

Conclusions: We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true configuration even when many individuals were missing in several data tables. This method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.

Citing Articles

A Neural Database for Answering Aggregate Queries on Incomplete Relational Data.

Zeighami S, Seshadri R, Shahabi C. IEEE Trans Knowl Data Eng. 2024; 36(7):2790-2802.

PMID: 39555147 PMC: 11566937. DOI: 10.1109/tkde.2023.3310914.


Optimizing multi-omics data imputation with NMF and GAN synergy.

Ansari M, Ahmed K, Zhang W. Bioinformatics. 2024; 40(11).

PMID: 39546381 PMC: 11639186. DOI: 10.1093/bioinformatics/btae674.


iSubGen generates integrative disease subtypes by pairwise similarity assessment.

Fox N, Tian M, Markowitz A, Haider S, Li C, Boutros P. Cell Rep Methods. 2024; 4(11):100884.

PMID: 39447572 PMC: 11705582. DOI: 10.1016/j.crmeth.2024.100884.


NetMIM: network-based multi-omics integration with block missingness for biomarker selection and disease outcome prediction.

Zhu B, Zhang Z, Leung S, Fan X. Brief Bioinform. 2024; 25(5).

PMID: 39288230 PMC: 11407451. DOI: 10.1093/bib/bbae454.


Functional impact of multi-omic interactions in lung cancer.

Diaz-Campos M, Vasquez-Arriaga J, Ochoa S, Hernandez-Lemus E. Front Genet. 2024; 15:1282241.

PMID: 38389572 PMC: 10881857. DOI: 10.3389/fgene.2024.1282241.

