Spfy: an Integrated Graph Database for Real-time Prediction of Bacterial Phenotypes and Downstream Comparative Analyses

Overview

Journal: Database (Oxford)

Specialty: Biology

Date: 2018 Sep 14

PMID: 30212910

Citations: 3

Authors

Kevin K Le

Matthew D Whiteside

James E Hopkins

Victor P J Gannon

Chad R Laing

Abstract

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of the results of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows, and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10,243 Escherichia coli genomes, for which in silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants, have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analysis approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.
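
The genome-centric storage idea in the abstract can be illustrated with a minimal sketch. It is not Spfy's implementation: the in-memory triple store, the SHA-1 based genome identifier, the predicate names (`hasAnalysis`, `hasMarker`) and the toy `marker_scan` function are all assumptions made for illustration. The sketch only shows how linking results to a genome node lets a repeated request be answered from the graph instead of being re-computed.

```python
import hashlib
from collections import defaultdict


class GenomeGraph:
    """Toy in-memory graph: (subject, predicate) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Store one edge subject --predicate--> obj."""
        self._edges[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        """Return a copy of all objects linked from subject by predicate."""
        return set(self._edges.get((subject, predicate), ()))

    def subjects(self, predicate, obj):
        """Return all subjects that link to obj by predicate."""
        return {s for (s, p), objs in self._edges.items()
                if p == predicate and obj in objs}


def genome_id(sequence):
    """Hypothetical identifier: a hash of the sequence, so that identical
    genomes always map to the same node."""
    return "genome:" + hashlib.sha1(sequence.encode("ascii")).hexdigest()


def analyze_once(graph, sequence, predicate, analysis):
    """Run `analysis` only if no result for `predicate` is stored yet.

    A separate 'hasAnalysis' edge records that the analysis was run, so an
    empty result is also remembered and not re-computed.
    """
    gid = genome_id(sequence)
    if predicate in graph.objects(gid, "hasAnalysis"):
        return gid, graph.objects(gid, predicate)
    results = set(analysis(sequence))
    for result in results:
        graph.add(gid, predicate, result)
    graph.add(gid, "hasAnalysis", predicate)
    return gid, results


def marker_scan(sequence):
    """Toy stand-in for a real analysis module (assumption, not a Spfy tool)."""
    return {"markerA"} if "ACGT" in sequence else set()


if __name__ == "__main__":
    graph = GenomeGraph()
    gid, markers = analyze_once(graph, "TTACGTAA", "hasMarker", marker_scan)
    # Second call with the same genome is served from the graph.
    _, cached = analyze_once(graph, "TTACGTAA", "hasMarker", marker_scan)
    print(gid, markers, cached)
    # Downstream comparative query: which genomes carry markerA?
    print(graph.subjects("hasMarker", "markerA"))
```

Because results hang off the genome node rather than a workflow run, a new analysis module would only need a new predicate, and comparative queries such as `subjects("hasMarker", "markerA")` can span every stored genome.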

Citing Articles

An overview of graph databases and their applications in the biomedical domain.

Timon-Reina S, Rincon M, Martinez-Tomas R. Database (Oxford). 2021; 2021.

PMID: 34003247 PMC: 8130509. DOI: 10.1093/database/baab026.

Assessing the genomic relatedness and evolutionary rates of persistent verotoxigenic serotypes within a closed beef herd in Canada.

Wang L, Jokinen C, Laing C, Johnson R, Ziebell K, Gannon V. Microb Genom. 2020; 6(6).

PMID: 32496181 PMC: 7371104. DOI: 10.1099/mgen.0.000376.

Formal Medical Knowledge Representation Supports Deep Learning Algorithms, Bioinformatics Pipelines, Genomics Data Analysis, and Big Data Processes.

Dhombres F, Charlet J. Yearb Med Inform. 2019; 28(1):152-155.

PMID: 31419827 PMC: 6697514. DOI: 10.1055/s-0039-1677933.
