Protein Functional Annotation of Simultaneously Improved Stability, Accuracy and False Discovery Rate Achieved by a Sequence-based Deep Learning
Overview
Accurate functional annotation of protein sequences has become one of the most important issues in modern biomedical research, and computational approaches that substantially accelerate the analysis and improve its accuracy are greatly desired. Although a variety of methods have been developed to raise protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of traditional similarity-based and de novo approaches. In a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm performed better in both prediction stability and annotation accuracy than other de novo methods. Moreover, an in-depth assessment revealed that the proposed approach controlled the false discovery rate better than traditional methods. Overall, this study provides both a comprehensive analysis of the performance of the newly proposed strategy and a tool for researchers in the field of protein function annotation.