PMID: 22127870

The Pfam Protein Families Database

Abstract

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the past two years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have opened up the annotation of our families to the Wikipedia community by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of the taxonomic distribution of families. In this work, we also address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
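The gathering thresholds mentioned above can be illustrated with a short sketch. It assumes the convention usually described for Pfam, in which each family carries two bit-score cut-offs, one for the whole sequence and one for an individual domain, and a match is included in the family only when it reaches both. The class names, function names and score values below are hypothetical and chosen purely for illustration; this is not Pfam's own code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GatheringThreshold:
    """Hypothetical container for a family's two GA cut-offs, in bits."""
    sequence: float  # minimum whole-sequence bit score
    domain: float    # minimum per-domain bit score


@dataclass(frozen=True)
class DomainHit:
    """Hypothetical record for one HMM match on one protein sequence."""
    seq_id: str
    seq_score: float     # bit score of the whole sequence against the model
    domain_score: float  # bit score of this individual domain match


def passes_ga(hit: DomainHit, ga: GatheringThreshold) -> bool:
    # Assumed rule: a match counts only if BOTH scores reach the
    # family-specific thresholds (scores equal to a cut-off pass).
    return hit.seq_score >= ga.sequence and hit.domain_score >= ga.domain


def gather(hits: Iterable[DomainHit], ga: GatheringThreshold) -> List[DomainHit]:
    """Return the hits that would be included in the family under this GA."""
    return [h for h in hits if passes_ga(h, ga)]


if __name__ == "__main__":
    ga = GatheringThreshold(sequence=25.0, domain=25.0)  # illustrative values only
    hits = [
        DomainHit("P1", seq_score=40.2, domain_score=31.5),
        DomainHit("P2", seq_score=40.2, domain_score=12.0),
        DomainHit("P3", seq_score=22.1, domain_score=22.1),
    ]
    for h in gather(hits, ga):
        print(h.seq_id)  # prints P1
```

Here only P1 is gathered: P2 fails the domain cut-off and P3 fails the sequence cut-off. In practice, HMMER can apply the GA cut-offs stored in a model file directly through its `--cut_ga` option, so a filter like this is mainly useful for reasoning about which matches a given threshold admits.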

Citing Articles

Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery.

Holt C, Janke A, Amlashi P, Jamieson P, Marinov T, Georgiev I. bioRxiv. 2025.

PMID: 40060439 PMC: 11888244. DOI: 10.1101/2025.02.25.640114.


Ultrasound-activated nano-oxygen sensitizer for sonodynamic-radiotherapy of esophageal cancer.

Liu J, Shi M, Zhao H, Bai X, Lin Q, Guan X. Nanoscale Adv. 2025.

PMID: 40007570 PMC: 11848934. DOI: 10.1039/d5na00042d.


The chromosome-level genome provides insights into the adaptive evolution of the visual system in Oratosquilla oratoria.

Zhang D, Sun X, Chen L, Lin L, Yin C, Yang W. BMC Biol. 2025; 23(1):38.

PMID: 39915724 PMC: 11804072. DOI: 10.1186/s12915-025-02146-6.


Cyclin-dependent kinases (CDKs) are key genes regulating early development of Neptunea arthritica cumingii: evidence from comparative transcriptome and proteome analyses.

Lv F, Ge X, Chang Y, Hao Z. BMC Genomics. 2024; 25(1):1221.

PMID: 39701993 PMC: 11660575. DOI: 10.1186/s12864-024-10970-3.


MsDUF3700 overexpression enhances aluminum tolerance in alfalfa shoots.

Cao J, Wang T, Yu D, He J, Qian W, Tang B. Plant Cell Rep. 2024; 43(12):301.

PMID: 39630276 DOI: 10.1007/s00299-024-03385-7.


References
1. Finn R, Mistry J, Tate J, Coggill P, Heger A, Pollington J. The Pfam protein families database. Nucleic Acids Res. 2009; 38(Database issue):D211-22. PMC: 2808889. DOI: 10.1093/nar/gkp985.

2. Gomez T, Billadeau D. A FAM21-containing WASH complex regulates retromer-dependent sorting. Dev Cell. 2009; 17(5):699-711. PMC: 2803077. DOI: 10.1016/j.devcel.2009.09.009.

3. Hunter S, Apweiler R, Attwood T, Bairoch A, Bateman A, Binns D. InterPro: the integrative protein signature database. Nucleic Acids Res. 2008; 37(Database issue):D211-5. PMC: 2686546. DOI: 10.1093/nar/gkn785.

4. Cantarel B, Coutinho P, Rancurel C, Bernard T, Lombard V, Henrissat B. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 2008; 37(Database issue):D233-8. PMC: 2686590. DOI: 10.1093/nar/gkn663.

5. Sathiyamoorthy K, Mills E, Franzmann T, Rosenshine I, Saper M. The crystal structure of Escherichia coli group 4 capsule protein GfcC reveals a domain organization resembling that of Wza. Biochemistry. 2011; 50(24):5465-76. DOI: 10.1021/bi101869h.