iFeature: a Python Package and Web Server for Features Extraction and Selection from Protein and Peptide Sequences

Overview

Journal Bioinformatics

Publisher Oxford University Press

Specialty Biology

Date 2018 Mar 13

PMID 29528364

Citations 199

Authors

Zhen Chen

Pei Zhao

Fuyi Li

Andre Leier

Tatiana T Marquez-Lago

Yanan Wang

Geoffrey I Webb

A Ian Smith

Roger J Daly

Kuo-Chen Chou

Jiangning Song


Abstract

Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit.
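To illustrate what a "numerical feature representation" of a sequence means in practice, the sketch below computes two simple descriptors in plain Python. The first is amino acid composition (AAC), the fraction of each of the 20 standard residues. The second is the sequence average of a per-residue property scale, using the Kyte-Doolittle hydropathy values as an example of the kind of index catalogued in AAindex. This is a minimal sketch under stated assumptions, not iFeature's own code or command-line interface. The function names `aac`, `mean_property` and `encode` are hypothetical, and non-standard residues (e.g. `X`) are simply skipped.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as an example AAindex-style property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aac(sequence: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    # Hypothetical helper; residues outside the 20 standard ones are ignored.
    seq = [res for res in sequence.upper() if res in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def mean_property(sequence: str, scale: dict[str, float]) -> float:
    """Average value of a per-residue property scale over the sequence."""
    values = [scale[res] for res in sequence.upper() if res in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


def encode(sequence: str) -> list[float]:
    """Concatenate AAC (20 features) with mean hydropathy (1 feature)."""
    return aac(sequence) + [mean_property(sequence, KYTE_DOOLITTLE)]


if __name__ == "__main__":
    peptide = "ACDKLLA"
    features = encode(peptide)
    print(len(features))  # 21
```

Each sequence becomes a fixed-length vector (here 21 numbers) regardless of its length. This property is what allows downstream feature selection, dimensionality reduction and machine-learning models to operate on sequences of different lengths.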

Availability And Implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Predicting amyloid proteins using attention-based long short-term memory.

Li Z. PeerJ Comput Sci. 2025; 11:e2660.

PMID: 40062260 PMC: 11888867. DOI: 10.7717/peerj-cs.2660.

iAMP-CRA: Identifying Antimicrobial Peptides Using Convolutional Recurrent Neural Network with Self-Attention.

Lu J, He Y, Han G, Zeng L. Health Inf Sci Syst. 2025; 13(1):25.

PMID: 40062190 PMC: 11883064. DOI: 10.1007/s13755-025-00342-w.

PyPropel: a Python-based tool for efficiently processing and characterising protein data.

Sun J, Ru J, Cribbs A, Xiong D. BMC Bioinformatics. 2025; 26(1):70.

PMID: 40025421 PMC: 11871610. DOI: 10.1186/s12859-025-06079-3.

An optimized deep-forest algorithm using a modified differential evolution optimization algorithm: A case of host-pathogen protein-protein interaction prediction.

Emmanuel J, Isewon I, Oyelade J. Comput Struct Biotechnol J. 2025; 27:595-611.

PMID: 39995682 PMC: 11849198. DOI: 10.1016/j.csbj.2025.01.020.

APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions.

Viesi E, Perricone U, Aloy P, Giugno R. J Cheminform. 2025; 17(1):13.

PMID: 39891207 PMC: 11786462. DOI: 10.1186/s13321-025-00961-1.
