PMID: 39841812

Standardized Pipelines Support and Facilitate Integration of Diverse Datasets at the Rat Genome Database

Abstract

The Rat Genome Database (RGD) is a multispecies knowledgebase which integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse datatypes is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines the development over time of a standardized infrastructure for automated RGD pipelines with a summary of key design decisions and a focus on lessons learned.
