
CD-HIT Suite: a Web Server for Clustering and Comparing Biological Sequences

Overview
Journal Bioinformatics
Specialty Biology
Date 2010 Jan 8
PMID 20053844
Citations 1110
Abstract

CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. To further assist CD-HIT users, we significantly improved the program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swiss-Prot and PDB) at different identity levels.

Availability: Free access at http://cd-hit.org
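To make "clustering at different identity levels" concrete, here is a minimal Python sketch of greedy incremental clustering, in which sequences are processed longest first and each one either joins the first cluster representative it matches at or above the threshold or founds a new cluster. It is an illustration only, not CD-HIT's implementation: the function names `identity` and `greedy_cluster` are hypothetical, and the toy identity measure (matched characters divided by the shorter length) stands in for the short-word filtering and alignment that CD-HIT itself relies on.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Toy identity: matched characters divided by the shorter sequence length.

    Assumption: a stand-in for a real alignment-based identity score.
    Both sequences are assumed to be non-empty.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Cluster sequences greedily, longest first.

    Returns a mapping from each representative's name to the names of
    its cluster members (the representative itself comes first).
    """
    clusters: dict[str, list[str]] = {}
    # Longest sequences are visited first, so they become representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # join the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: found a new cluster
    return clusters
```

Running the same input at two thresholds shows how the identity level changes the grouping: with a hypothetical sequence that shares 16 of its 20 residues with a longer one (toy identity 0.8), a threshold of 0.7 places both in one cluster, while 0.9 splits them.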

Citing Articles

The proteotranscriptomic characterization of venom in the white seafan elucidates the evolution of Octocorallia arsenal.

Modica M, Leone S, Gerdol M, Greco S, Aurelle D, Oliverio M Open Biol. 2025; 15(3):250015.

PMID: 40068811 PMC: 11896702. DOI: 10.1098/rsob.250015.


MurG as a potential target of quercetin in Staphylococcus aureus supported by evidence from subtractive proteomics and molecular dynamics.

Goswami D, Prajapati J, Dabhi M, Sharkey L, Pidot S Sci Rep. 2025; 15(1):7309.

PMID: 40025069 PMC: 11873250. DOI: 10.1038/s41598-025-90395-4.


zol and fai: large-scale targeted detection and evolutionary investigation of gene clusters.

Salamzade R, Tran P, Martin C, Manson A, Gilmore M, Earl A Nucleic Acids Res. 2025; 53(3).

PMID: 39907107 PMC: 11795205. DOI: 10.1093/nar/gkaf045.


Extensive paralogism in the environmental pangenome: a key factor in the ecological success of natural SAR11 populations.

Molina-Pardines C, Haro-Moreno J, Rodriguez-Valera F, Lopez-Perez M Microbiome. 2025; 13(1):41.

PMID: 39905490 PMC: 11796062. DOI: 10.1186/s40168-025-02037-6.


Prediction of potential drug targets and key inhibitors (ZINC67974679, ZINC67982856, and ZINC05668040) against using integrated computational approaches.

Rahman S, Liu H, Shah M, Almutairi M, Liaqat I, Tanaka T Front Vet Sci. 2025; 11:1507496.

PMID: 39885844 PMC: 11780677. DOI: 10.3389/fvets.2024.1507496.

