PMID: 35961013

A Roadmap for the Functional Annotation of Protein Families: a Community Perspective

Abstract

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by the more than 500 000 genomes sequenced to date. By different estimates, only half of the predicted proteins carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large gap in knowledge hampers all aspects of the biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting funded by the National Science Foundation was held on 3-4 February 2022 to address this issue. Bringing together data scientists, biocurators, computational biologists and experimentalists in the same venue allowed for a comprehensive assessment of the current state of functional annotation of protein families. Further, the major issues obstructing the field were identified and discussed, which ultimately led to proposed solutions for moving forward.

Citing Articles

An NLP-based method to mine gene and function relationships from published articles.

Kumar N, Mukhtar M Sci Rep. 2025; 15(1):7503.

PMID: 40033048 PMC: 11876572. DOI: 10.1038/s41598-025-91809-z.


Metatranscriptomes-based sequence similarity networks uncover genetic signatures within parasitic freshwater microbial eukaryotes.

Monjot A, Rousseau J, Bittner L, Lepere C Microbiome. 2025; 13(1):43.

PMID: 39915863 PMC: 11800578. DOI: 10.1186/s40168-024-02027-0.


A metric and its derived protein network for evaluation of ortholog database inconsistency.

Yang W, Ji J, Fang G BMC Bioinformatics. 2025; 26(1):6.

PMID: 39773281 PMC: 11707888. DOI: 10.1186/s12859-024-06023-x.


FuncFetch: an LLM-assisted workflow enables mining thousands of enzyme-substrate interactions from published manuscripts.

Smith N, Yuan X, Melissinos C, Moghe G Bioinformatics. 2024; 41(1).

PMID: 39718779 PMC: 11734755. DOI: 10.1093/bioinformatics/btae756.


Domainator, a flexible software suite for domain-based annotation and neighborhood analysis, identifies proteins involved in antiviral systems.

Johnson S, Weigele P, Fomenkov A, Ge A, Vincze A, Eaglesham J Nucleic Acids Res. 2024; 53(2).

PMID: 39657740 PMC: 11754643. DOI: 10.1093/nar/gkae1175.

