CryoVirusDB: A Labeled Cryo-EM Image Dataset for AI-Driven Virus Particle Picking

Overview
Journal bioRxiv
Date 2024 Jan 18
PMID 38234823
Abstract

Advances in instrumentation, image processing algorithms, and computational capabilities have enabled single-particle electron cryo-microscopy (cryo-EM) to determine the 3D structures of viruses at near-atomic resolution. These structures are crucial for studying the biological function of viruses and for developing antiviral vaccines and treatments. Although artificial intelligence (AI) is effective in general image processing, its development for identifying and extracting virus particles from cryo-EM micrographs (images) has been hindered by the lack of high-quality, manually labeled datasets. To fill this gap, we introduce CryoVirusDB, a labeled dataset containing the coordinates of expert-picked virus particles in cryo-EM micrographs. CryoVirusDB comprises 9,941 micrographs of 9 different viruses along with the coordinates of 339,398 labeled virus particles. It can be used to train and test AI and machine learning (e.g., deep learning) methods that accurately identify virus particles in cryo-EM micrographs, for building atomic 3D structural models of viruses.
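As a rough illustration of how expert-picked coordinates could be used to test a particle picker, the sketch below scores predicted picks against labeled particle centers on one micrograph. It is only a minimal sketch under stated assumptions: the abstract does not describe a file format or an evaluation protocol, so the function names, the in-memory (x, y) pixel tuples, the greedy nearest-neighbor matching, and the `max_dist` threshold are all hypothetical choices made for this example.

```python
import math


def match_picks(predicted, labeled, max_dist):
    """Greedily match predicted (x, y) picks to labeled (x, y) centers.

    Hypothetical helper: each labeled particle can be matched at most once,
    and a match requires a center-to-center distance of at most max_dist.
    Returns the number of matched picks (true positives).
    """
    unmatched = list(labeled)
    true_pos = 0
    for px, py in predicted:
        best_i, best_d = None, max_dist
        for i, (lx, ly) in enumerate(unmatched):
            d = math.hypot(px - lx, py - ly)
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is not None:
            # Consume the label so it cannot be matched a second time.
            unmatched.pop(best_i)
            true_pos += 1
    return true_pos


def precision_recall(predicted, labeled, max_dist):
    """Return (precision, recall) of predicted picks for one micrograph."""
    tp = match_picks(predicted, labeled, max_dist)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(labeled) if labeled else 0.0
    return precision, recall


# Made-up coordinates for illustration only (not taken from CryoVirusDB).
labeled = [(100.0, 100.0), (300.0, 250.0), (520.0, 80.0)]
predicted = [(104.0, 97.0), (298.0, 255.0), (700.0, 700.0)]

precision, recall = precision_recall(predicted, labeled, max_dist=20.0)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice the distance threshold would likely be tied to the particle diameter of each virus, and the greedy matching could be replaced by an optimal assignment; both are design choices outside what the abstract specifies.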

Citing Articles

The Polycomb system sustains promoters in a deep OFF state by limiting pre-initiation complex formation to counteract transcription.

Szczurek A, Dimitrova E, Kelley J, Blackledge N, Klose R Nat Cell Biol. 2024; 26(10):1700-1711.

PMID: 39261718 PMC: 11469961. DOI: 10.1038/s41556-024-01493-w.
