Robust Rank Aggregation for Gene List Integration and Meta-analysis

Overview

Journal: Bioinformatics

Publisher: Oxford University Press

Specialty: Biology

Date: 2012 Jan 17

PMID: 22247279

Citations: 526

Authors

Raivo Kolde

Sven Laur

Priit Adler

Jaak Vilo


Abstract

Motivation: The continued progress in developing technological platforms, availability of many published experimental datasets, as well as different statistical methods to analyze those data have allowed approaching the same research question using various methods simultaneously. To get the best out of all these alternatives, we need to integrate their results in an unbiased manner. Prioritized gene lists are a common result presentation method in genomic data analysis applications. Thus, the rank aggregation methods can become a useful and general solution for the integration task.

Results: Standard rank aggregation methods are often ill-suited for biological settings where the gene lists are inherently noisy. As a remedy, we propose a novel robust rank aggregation (RRA) method. Our method detects genes that are ranked consistently better than expected under the null hypothesis of uncorrelated inputs and assigns a significance score to each gene. The underlying probabilistic model makes the algorithm parameter-free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make our approach robust and compelling for many settings.
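The abstract does not spell out how the significance score is computed, so the following is only a minimal sketch of one way to score "ranked consistently better than expected" under a null of uncorrelated inputs. It assumes that each gene's normalized ranks are uniform on (0, 1] under the null, takes the smallest order-statistic tail probability as the score, and applies a Bonferroni-style correction by the number of lists. The function names (`beta_scores`, `rho_score`, `aggregate`), the treatment of missing genes as rank 1.0, and the toy gene lists are illustrative assumptions, not the RobustRankAggreg package API.

```python
from math import comb


def beta_scores(sorted_ranks):
    """Tail probability of each order statistic under a uniform null.

    For the k-th smallest normalized rank r out of n, this is the
    probability that at least k of n independent uniform draws are <= r.
    """
    n = len(sorted_ranks)
    scores = []
    for k, r in enumerate(sorted_ranks, start=1):
        tail = sum(
            comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1)
        )
        scores.append(tail)
    return scores


def rho_score(normalized_ranks):
    """Smallest order-statistic tail probability, Bonferroni-corrected.

    Multiplying by the number of lists and capping at 1 is an assumed
    correction for having taken a minimum over n candidate values.
    """
    n = len(normalized_ranks)
    betas = beta_scores(sorted(normalized_ranks))
    return min(1.0, min(betas) * n)


def aggregate(ranked_lists):
    """Score every gene seen in any list and return (gene, score) pairs,
    most significant first.

    Each input list is ordered best-first. A gene absent from a list is
    given normalized rank 1.0 (an assumption made for this sketch).
    """
    genes = {gene for lst in ranked_lists for gene in lst}
    n_genes = len(genes)
    scores = {}
    for gene in genes:
        ranks = [
            (lst.index(gene) + 1) / n_genes if gene in lst else 1.0
            for lst in ranked_lists
        ]
        scores[gene] = rho_score(ranks)
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy input: three noisy rankings of the same five genes.
example_lists = [
    ["A", "B", "C", "D", "E"],
    ["A", "C", "B", "E", "D"],
    ["B", "A", "D", "C", "E"],
]
ranking = aggregate(example_lists)
```

In this toy input, gene A sits near the top of all three lists and therefore receives the smallest score, while a gene whose ranks look like uniform draws ends up with a score close to or equal to 1. Thresholding the scores is what would keep only the statistically relevant genes in the final list.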

Availability: All the methods are implemented as a GNU R package RobustRankAggreg, freely available at the Comprehensive R Archive Network http://cran.r-project.org/.

Citing Articles

Expression of ENL YEATS domain tumor mutations in nephrogenic or stromal lineage impairs kidney development.

Xue Z, Xuan H, Lau K, Su Y, Wegener M, Li K. Nat Commun. 2025; 16(1):2531.

PMID: 40087269. DOI: 10.1038/s41467-025-57926-z.

Comprehensive bioinformatics analysis reveals key hub genes linked to prognosis in multiple myeloma with drug resistance.

Chen X, Wu Y, Li Y, Chen Q, Yao L, Lin L. Medicine (Baltimore). 2025; 104(10):e41707.

PMID: 40068082. PMC: 11902958. DOI: 10.1097/MD.0000000000041707.

Identification and multi-omics analysis of essential coding and long non-coding genes in colorectal cancer.

Li Y, Meng Z, Fan C, Rong H, Xi Y, Liao Q. Biochem Biophys Rep. 2025; 41:101938.

PMID: 40034256. PMC: 11874739. DOI: 10.1016/j.bbrep.2025.101938.

PRODE recovers essential and context-essential genes through neighborhood-informed scores.

Cantore T, Gasperini P, Bevilacqua R, Ciani Y, Sinha S, Ruppin E. Genome Biol. 2025; 26(1):42.

PMID: 40022167. PMC: 11869679. DOI: 10.1186/s13059-025-03501-0.

Investigating the epigenetic landscape of symptomatic disk degeneration: a case study.

Yeater T, Kawarai Y, Lee S, Belani K, Beebe D, Sheyn D. Pain Rep. 2025; 10(2):e1237.

PMID: 39995491. PMC: 11850048. DOI: 10.1097/PR9.0000000000001237.
