On Expert Curation and Scalability: UniProtKB/Swiss-Prot As a Case Study

Overview
Journal Bioinformatics
Specialty Biology
Date 2017 Oct 17
PMID 29036270
Citations 63
Abstract

Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches.

Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate: 8000-10 000 papers are curated in UniProt each year, while curators evaluate 50 000-70 000 papers per year. The number of curated publications alone therefore gives an incomplete picture. We also show that 90% of the papers in PubMed are out of the scope of UniProt, that at most 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable.
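
The gap between evaluated and curated papers can be made concrete with a little arithmetic. The sketch below is not from the paper: it simply takes the two yearly ranges quoted in the abstract and computes the widest possible bounds on the share of evaluated papers that end up curated. The function name `ratio_bounds` and the constant names are illustrative assumptions, and the abstract does not say how the range endpoints pair up, so the true share may lie anywhere inside these bounds.

```python
def ratio_bounds(curated, evaluated):
    """Return (lowest, highest) possible curated/evaluated ratio.

    Both arguments are (low, high) yearly ranges. The lowest ratio pairs
    the fewest curated papers with the most evaluated papers; the highest
    ratio pairs the most curated with the fewest evaluated.
    """
    low_curated, high_curated = curated
    low_evaluated, high_evaluated = evaluated
    return low_curated / high_evaluated, high_curated / low_evaluated


# Yearly ranges as quoted in the abstract (hypothetical constant names).
CURATED_PER_YEAR = (8_000, 10_000)
EVALUATED_PER_YEAR = (50_000, 70_000)

low, high = ratio_bounds(CURATED_PER_YEAR, EVALUATED_PER_YEAR)
print(f"Curated share of evaluated papers: {low:.1%} to {high:.1%}")
# prints "Curated share of evaluated papers: 11.4% to 20.0%"
```

Under these assumptions, curators curate roughly one in nine to one in five of the papers they evaluate, which is why counting curated publications alone understates the triage work.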

Availability And Implementation: UniProt is freely available at http://www.uniprot.org/.

Contact: sylvain.poux@sib.swiss.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Unveiling the Microbial Signatures of Arabica Coffee Cherries: Insights into Ripeness Specific Diversity, Functional Traits, and Implications for Quality and Safety.

Tenea G, Cifuentes V, Reyes P, Cevallos-Vallejos M Foods. 2025; 14(4).

PMID: 40002058 PMC: 11854473. DOI: 10.3390/foods14040614.


Assembly and annotation of a chromosome-level reference genome for the endangered Colorado pikeminnow (Ptychocheilus lucius).

Mussmann S G3 (Bethesda). 2024; 14(11).

PMID: 39268723 PMC: 11540322. DOI: 10.1093/g3journal/jkae217.


Morphology, behavior, and phylogenomics of Oxytoxum lohmannii, Dinoflagellata.

Cooney E, Jacobson D, Wolfe G, Bright K, Saldarriaga J, Keeling P J Eukaryot Microbiol. 2024; 71(6):e13050.

PMID: 39019843 PMC: 11603288. DOI: 10.1111/jeu.13050.


The origin, evolution, and molecular diversity of the chemokine system.

Aleotti A, Goulty M, Lewis C, Giorgini F, Feuda R Life Sci Alliance. 2024; 7(3).

PMID: 38228369 PMC: 10792014. DOI: 10.26508/lsa.202302471.


Beneficial probiotic bacteria prevalence in different lactating dromedary camel milk of Saudi Arabia.

Sheikh A, Mohamed Ibrahim H, Almathen F, Alfattah M, Khalifa A Saudi J Biol Sci. 2023; 31(1):103879.

PMID: 38090133 PMC: 10711163. DOI: 10.1016/j.sjbs.2023.103879.

