
MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction

Overview
Journal Int J Mol Sci
Publisher MDPI
Date 2025 Jan 8
PMID 39769208
Abstract

Accurate protein secondary structure prediction (PSSP) plays a crucial role in biopharmaceutics and disease diagnosis. Current prediction methods are mainly based on multiple sequence alignment (MSA) encoding and collaborative operations of diverse networks. However, existing encoding approaches lead to poor feature space utilization, and encoding quality decreases with fewer homologous proteins. Moreover, the performance of simple stacked networks is greatly limited by feature extraction capabilities and learning strategies. To this end, we propose MHTAPred-SS, a novel PSSP framework based on the fusion of six features, including the embedding feature derived from a pre-trained protein language model. First, we propose a highly targeted autoencoder (HTA) as the driver to encode sequences in a homologous protein-independent manner. Second, under the guidance of biological knowledge, we design a protein secondary structure prediction model based on the multi-task learning strategy (PSSP-MTL). Experimental results on six independent test sets show that MHTAPred-SS achieves state-of-the-art performance, with values of 88.14%, 84.89%, 78.74% and 77.15% for Q3, SOV3, Q8 and SOV8 metrics on the TEST2016 dataset, respectively. Additionally, we demonstrate that MHTAPred-SS has significant advantages in single-category and boundary secondary structure prediction, and can finely capture the distribution of secondary structure segments, thereby contributing to subsequent tasks.
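The Q3 and Q8 values reported above are per-residue accuracies over three-state and eight-state secondary structure labels. The following is a minimal sketch of how such accuracies are typically computed, not the authors' evaluation code. The 8-to-3 state reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is a common convention and is an assumption here, as the abstract does not state which mapping was used. The function names `reduce_to_q3` and `q_accuracy` are hypothetical. SOV scores measure segment overlap rather than per-residue agreement and are not covered by this sketch.

```python
# Assumed 8-to-3 state reduction for DSSP-style labels (a common convention;
# the paper may use a different mapping).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",            # helix classes
    "E": "E", "B": "E",                      # strand classes
    "T": "C", "S": "C", "L": "C", "-": "C", "C": "C",  # everything else -> coil
}


def reduce_to_q3(ss8: str) -> str:
    """Map an eight-state label string to three states (H, E, C)."""
    return "".join(DSSP8_TO_3[s] for s in ss8)


def q_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of residues whose predicted label matches the true label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    true8 = "HHHGEEBTSL"
    pred8 = "HHHHEEETSS"
    print("Q8:", q_accuracy(true8, pred8))
    print("Q3:", q_accuracy(reduce_to_q3(true8), reduce_to_q3(pred8)))
```

In the toy example, the Q3 score is higher than the Q8 score because several eight-state mismatches (for instance G predicted as H) collapse into the same three-state class, which is consistent with the reported Q3 values exceeding the Q8 values.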
