
Fast_protein_cluster: Parallel and Optimized Clustering of Large-scale Protein Modeling Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2014 Feb 18
PMID 24532722
Citations 4
Abstract

Motivation: fast_protein_cluster is a fast, parallel and memory-efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project.

Results: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco.
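The abstract combines two steps: an RMSD-after-superposition metric, and clustering over the resulting pairwise distances. Both can be illustrated with a minimal pure-Python sketch. It is not fast_protein_cluster's code and has no SIMD, OpenMP or GPU acceleration. RMSD is taken from the largest eigenvalue of the 4×4 quaternion key matrix used in Theobald's QCP method (reference 2). The eigenvalue, however, is found here with plain Jacobi rotations rather than QCP's Newton iteration. The sketch clusters a matrix of pairwise RMSDs, which has no coordinates to average, so its clustering step is k-medoids, a k-means variant with farthest-point seeding. That choice is an assumption, not the package's documented k-means. The names `rmsd` and `k_medoids` are hypothetical.

```python
import math

# Hypothetical sketch, NOT fast_protein_cluster's implementation: pure Python,
# no SIMD/OpenMP/GPU. Models are equal-length lists of (x, y, z) atom tuples.

def _center(model):
    """Translate a model so its centroid sits at the origin."""
    n = len(model)
    c = [sum(p[i] for p in model) / n for i in range(3)]
    return [(x - c[0], y - c[1], z - c[2]) for x, y, z in model]

def _max_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    tol = 1e-14 * max(1.0, max(abs(v) for row in a for v in row))
    for _ in range(sweeps):
        if all(abs(a[p][q]) < tol for p in range(n) for q in range(p + 1, n)):
            break  # matrix is (numerically) diagonal
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < tol:
                    continue
                # Rotation angle chosen so the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P: mix columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A: mix rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def rmsd(model_a, model_b):
    """RMSD after optimal superposition, from the quaternion key matrix K."""
    xa, xb = _center(model_a), _center(model_b)
    g = sum(v * v for p in xa + xb for v in p)  # G_A + G_B
    # 3x3 cross-covariance: s[i][j] = sum over atoms of a_i * b_j
    s = [[sum(pa[i] * pb[j] for pa, pb in zip(xa, xb)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, syy - sxx - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, szz - sxx - syy],
    ]
    # RMSD^2 = (G_A + G_B - 2 * lambda_max) / N; clamp tiny negative round-off
    return math.sqrt(max(0.0, (g - 2.0 * _max_eigenvalue(k)) / len(xa)))

def k_medoids(dist, k, max_iter=100):
    """k-medoids (a k-means variant) on a full pairwise distance matrix."""
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:  # farthest-point seeding
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign():
        # Label each item with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign()
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # New medoid: the member with the smallest summed distance to the others.
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign()
```

Given a list of equal-length models, `dist = [[rmsd(a, b) for b in models] for a in models]` builds the all-against-all matrix, and `k_medoids(dist, k)` returns the medoid indices together with a cluster label per model. Building the full matrix grows quadratically with the number of models, so this naive version is only practical for small sets.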

Availability and Implementation: fast_protein_cluster is written in C++ and uses OpenMP for multi-threading. Custom Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE) and Advanced Vector Extensions (AVX) intrinsics code accelerates CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the MIT license (http://software.compbio.washington.edu/fast_protein_cluster).

Citing Articles

Fast and accurate protein structure search with Foldseek.

van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist C. Nat Biotechnol. 2023; 42(2):243-246.

PMID: 37156916 PMC: 10869269. DOI: 10.1038/s41587-023-01773-0.


Polycystin-1 Assembles With Kv Channels to Govern Cardiomyocyte Repolarization and Contractility.

Altamirano F, Schiattarella G, French K, Kim S, Engelberger F, Kyrychenko S. Circulation. 2019; 140(11):921-936.

PMID: 31220931 PMC: 6733647. DOI: 10.1161/CIRCULATIONAHA.118.034731.


De novo protein structure prediction using ultra-fast molecular dynamics simulation.

Cheung N, Yu W. PLoS One. 2018; 13(11):e0205819.

PMID: 30458007 PMC: 6245515. DOI: 10.1371/journal.pone.0205819.


Histone demethylase KDM5A is regulated by its reader domain through a positive-feedback mechanism.

Ortiz Torres I, Kuchenbecker K, Nnadi C, Fletterick R, Kelly M, Fujimori D. Nat Commun. 2015; 6:6204.

PMID: 25686748 PMC: 5080983. DOI: 10.1038/ncomms7204.

References
1. Zhang Y, Skolnick J. SPICKER: a clustering approach to identify near-native protein folds. J Comput Chem. 2004; 25(6):865-71. DOI: 10.1002/jcc.20011.

2. Theobald D. Rapid calculation of RMSDs using a quaternion-based characteristic polynomial. Acta Crystallogr A. 2005; 61(Pt 4):478-80. DOI: 10.1107/S0108767305015266.

3. Hung L, Samudrala R. Accelerated protein structure comparison using TM-score-GPU. Bioinformatics. 2012; 28(16):2191-2. PMC: 3413391. DOI: 10.1093/bioinformatics/bts345.

4. Jamroz M, Kolinski A. ClusCo: clustering and comparison of protein models. BMC Bioinformatics. 2013; 14:62. PMC: 3645956. DOI: 10.1186/1471-2105-14-62.