
Cd-hit: a Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences

Overview
Journal Bioinformatics
Specialty Biology
Date 2006 May 30
PMID 16731699
Citations 4866
Abstract

In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering. Here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
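The abstract names two modes of operation, clustering one dataset and comparing two datasets, without spelling out the algorithm. The sketch below is a toy illustration of both modes, assuming the greedy incremental scheme generally associated with cd-hit (longest sequence first, each later sequence joins the first sufficiently similar representative). It is not the authors' implementation: the function names `identity`, `greedy_cluster` and `compare_two_sets` are hypothetical, the sample sequences are made up, and `difflib.SequenceMatcher` stands in for a real sequence identity measure, so it has none of the speed the abstract describes.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Toy similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Cluster one dataset (the cd-hit / cd-hit-est mode), longest sequences first."""
    clusters = {}  # representative id -> list of member ids
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[sid]) >= threshold:
                clusters[rep].append(sid)  # join the first matching representative
                break
        else:
            clusters[sid] = [sid]  # nothing similar enough: new representative
    return clusters


def compare_two_sets(db1, db2, threshold=0.9):
    """Compare two datasets (the cd-hit-2d / cd-hit-est-2d mode).

    For each db2 sequence, list the db1 sequences at or above the threshold.
    """
    return {
        q: [r for r in db1 if identity(db1[r], db2[q]) >= threshold]
        for q in db2
    }


# Made-up example sequences, for illustration only.
proteins = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSR",
    "p3": "GGGGGWWWWWPPPPP",
}
clusters = greedy_cluster(proteins)
matches = compare_two_sets(
    {"p1": proteins["p1"]},
    {"q1": proteins["p2"], "q2": proteins["p3"]},
)
```

Because the same `identity` function works on any string, the nucleotide variants in this toy differ only in their input; a real tool would also use alphabet-specific filtering and alignment.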

Citing Articles

EBAX-1/ZSWIM8 destabilizes miRNAs, resulting in transgenerational inheritance of a predatory trait.

Quiobe S, Kalirad A, Roseler W, Witte H, Wang Y, Rodelsperger C Sci Adv. 2025; 11(11):eadu0875.

PMID: 40073139 PMC: 11900880. DOI: 10.1126/sciadv.adu0875.


Development and patterning of a highly versatile visual system in spiders.

Baudouin Gonzalez L, Schonauer A, Harper A, Arif S, Leite D, Steinhoff P Proc Biol Sci. 2025; 292(2042):20242069.

PMID: 40068820 PMC: 11896711. DOI: 10.1098/rspb.2024.2069.


Resolving phylogenetic conflicts in Pandanales: the dual roles of gene flow and whole-genome duplication.

Shi T, He J Front Plant Sci. 2025; 16:1511582.

PMID: 40065784 PMC: 11891173. DOI: 10.3389/fpls.2025.1511582.


Discovery of the widespread site-specific single-stranded nuclease family Ssn.

Chenal M, Rivera-Millot A, Harrison L, Khairalla A, Nieves C, Bernet E Nat Commun. 2025; 16(1):2388.

PMID: 40064889 PMC: 11893778. DOI: 10.1038/s41467-025-57514-1.


What defines a photosynthetic microbial mat in western Antarctica?

Mercado-Juarez R, Valdespino-Castillo P, Merino Ibarra M, Batista S, Mac Cormack W, Ruberto L PLoS One. 2025; 20(3):e0315919.

PMID: 40043057 PMC: 11882083. DOI: 10.1371/journal.pone.0315919.