An Interconnected Data Infrastructure to Support Large-scale Rare Disease Research

The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes in 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to substantially increase the diagnostic success rate by co-analyzing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing, and multiomics data. Here we describe the data infrastructure designed and built to support this co-analysis. The infrastructure enables users to store, find, connect, and analyze data and metadata collaboratively. Pseudonymized phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardized pipelines. The resulting files, together with newly produced omics data, are sent to the European Genome-Phenome Archive, which assigns unique file identifiers and provides long-term storage and controlled-access services. MOLGENIS "RD3" and Café Variome "Discovery Nexus" connect data and metadata and offer discovery services, while secure cloud-based "Sandboxes" support multiparty data analysis. This infrastructure has been successfully deployed and provides a blueprint for other projects that need to analyze large volumes of heterogeneous data.
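The data flow described above can be made concrete with a minimal, in-memory Python sketch. It rests on several loud assumptions. The classes `Submission`, `Archive`, and `Catalogue` and the functions `process` and `ingest` are hypothetical stand-ins for the roles of the RD-Connect Genome-Phenome Analysis Platform, the European Genome-Phenome Archive, and the RD3/Discovery Nexus metadata layer. They are not the real APIs of those systems. The `FILE00000001`-style identifier format is also an assumption, as are the use of HPO term IDs for phenotypes and the one-VCF-per-case pipeline output.

```python
# Hypothetical sketch of the Solve-RD data flow; all names are illustrative
# and are NOT the APIs of RD-Connect GPAP, EGA, MOLGENIS RD3 or Cafe Variome.
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Submission:
    pseudonym: str          # pseudonymized case ID, no direct identifiers
    phenotypes: list[str]   # phenotype terms (HPO term IDs assumed)
    raw_files: list[str]    # raw experimental data files, e.g. BAM


@dataclass
class Archive:
    """Stand-in for long-term, controlled-access storage (the EGA's role)."""
    files: dict[str, str] = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def deposit(self, filename: str) -> str:
        # Assign a unique file identifier; the FILE######## format is assumed.
        file_id = f"FILE{next(self._counter):08d}"
        self.files[file_id] = filename
        return file_id


@dataclass
class Catalogue:
    """Stand-in for the metadata layer linking cases to files (RD3's role)."""
    phenotypes: dict[str, list[str]] = field(default_factory=dict)
    files: dict[str, list[str]] = field(default_factory=dict)

    def link(self, pseudonym: str, phenotypes: list[str],
             file_ids: list[str]) -> None:
        # Connect a pseudonymized case to its phenotypes and archived files.
        self.phenotypes[pseudonym] = list(phenotypes)
        self.files.setdefault(pseudonym, []).extend(file_ids)

    def discover(self, term: str) -> int:
        # Discovery reveals only how many cases match, not the data itself;
        # retrieving files would require a separate controlled-access step.
        return sum(term in terms for terms in self.phenotypes.values())


def process(sub: Submission) -> list[str]:
    # Placeholder for the standardized pipelines: assume one VCF per case.
    return [f"{sub.pseudonym}.vcf.gz"]


def ingest(sub: Submission, archive: Archive,
           catalogue: Catalogue) -> list[str]:
    # Submit -> process -> archive (receive IDs) -> link metadata.
    outputs = process(sub)
    file_ids = [archive.deposit(name) for name in sub.raw_files + outputs]
    catalogue.link(sub.pseudonym, sub.phenotypes, file_ids)
    return file_ids


if __name__ == "__main__":
    archive, catalogue = Archive(), Catalogue()
    ingest(Submission("P0001", ["HP:0001250"], ["P0001.bam"]),
           archive, catalogue)
    ingest(Submission("P0002", ["HP:0001250", "HP:0001263"], ["P0002.bam"]),
           archive, catalogue)
    print(catalogue.discover("HP:0001250"))  # prints 2
```

The discovery step deliberately returns only a count. This mirrors the separation between discovery services and controlled access to the underlying files. The cloud-based Sandboxes used for multiparty analysis are not modeled here.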
