BinaryCIF and CIFTools: Lightweight, Efficient and Extensible Macromolecular Data Management

Overview
Specialty Biology
Date 2020 Oct 19
PMID 33075050
Citations 21
Abstract

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
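
The abstract does not spell out which compression techniques are used. As a purely illustrative sketch, the Python below chains two generic column-oriented encodings, delta encoding followed by run-length encoding, on integer columns of the kind found in structure files (for example, sequential atom identifiers). The function names and sample data are hypothetical and are not taken from CIFTools or from the BinaryCIF format itself.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by taking a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Collapse runs into a flat list of (value, count) pairs."""
    out: List[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out


def run_length_decode(pairs: List[int]) -> List[int]:
    """Invert run_length_encode by expanding each (value, count) pair."""
    out: List[int] = []
    for i in range(0, len(pairs), 2):
        out.extend([pairs[i]] * pairs[i + 1])
    return out


# Hypothetical column: 1000 sequential atom identifiers.
atom_ids = list(range(1, 1001))
encoded = run_length_encode(delta_encode(atom_ids))
restored = delta_decode(run_length_decode(encoded))
```

In this sketch, the 1000-element column shrinks to two integers, because every delta equals 1 and the run-length step stores that single run as one pair. Columns without such regularity gain little or nothing from this particular chain, which is why encodings are usually chosen per column.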

Citing Articles

Free tools for crystallographic symmetry handling and visualization.

de la Flor G, Aroyo M, Gimondi I, Ward S, Momma K, Hanson R J Appl Crystallogr. 2024; 57(Pt 5):1618-1639.

PMID: 39387077 PMC: 11460394. DOI: 10.1107/S1600576724007659.


Mesoscale explorer: Visual exploration of large-scale molecular models.

Rose A, Sehnal D, Goodsell D, Autin L Protein Sci. 2024; 33(10):e5177.

PMID: 39291955 PMC: 11409463. DOI: 10.1002/pro.5177.


Mesoscale Explorer - Visual Exploration of Large-Scale Molecular Models.

Rose A, Sehnal D, Goodsell D, Autin L bioRxiv. 2024.

PMID: 39282403 PMC: 11398308. DOI: 10.1101/2024.09.02.610826.


Describing and Sharing Molecular Visualizations Using the MolViewSpec Toolkit.

Bittrich S, Midlik A, Varadi M, Velankar S, Burley S, Young J Curr Protoc. 2024; 4(7):e1099.

PMID: 39024028 PMC: 11338654. DOI: 10.1002/cpz1.1099.


Efficient protein structure archiving using ProteStAr.

Deorowicz S, Gudys A Bioinformatics. 2024; 40(7).

PMID: 38984796 PMC: 11239224. DOI: 10.1093/bioinformatics/btae428.

