
Supervised Machine Learning Algorithms for Protein Structure Classification

Overview
Publisher Elsevier
Date 2009 May 29
PMID 19473879
Citations 15
Abstract

We explore the automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity) consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information, in terms of sequence identity, and structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps: first selecting the best-performing base learners, then evaluating boosted and bagged meta learners. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Superfamily and Family levels of the SCOP hierarchy, respectively. The meta learning regime, especially boosting, improved performance by more accurately classifying instances from the less populated classes.
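The target label described in the abstract, the deepest SCOP level shared by a pair of domains, can be illustrated with a minimal sketch. It assumes each domain is identified by a SCOP concise classification string (sccs) such as "a.1.1.2", whose four fields encode class, fold, superfamily and family. The function name and the handling of pairs that share no class are illustrative assumptions, not taken from the paper.

```python
from typing import Optional

# SCOP levels encoded, in order, by the four dot-separated fields of an sccs string.
LEVELS = ("Class", "Fold", "Superfamily", "Family")


def deepest_common_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the deepest SCOP level shared by two domains.

    Hypothetical helper: returns None when the domains differ already at
    the class level (an assumption; the abstract lists only four labels).
    """
    deepest = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break  # the hierarchy diverges here; deeper fields are not comparable
        deepest = level
    return deepest


if __name__ == "__main__":
    # Same class, fold and superfamily, but different families.
    print(deepest_common_level("a.1.1.1", "a.1.1.2"))
    # Same class only.
    print(deepest_common_level("a.1.1.1", "a.2.1.1"))
```

The comparison stops at the first differing field because SCOP is hierarchical: a matching family number under different superfamilies does not indicate a shared family.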
