PMID: 24451626

InterProScan 5: Genome-scale Protein Function Classification

Abstract

Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open source code is hosted at Google Code.

Citing Articles

Chromosome-level haplotype-resolved genome assembly of bread wheat's wild relative Aegilops mutica.

Grewal S, Yang C, Krasheninnikova K, Collins J, Wood J, Ashling S Sci Data. 2025; 12(1):438.

PMID: 40082453 PMC: 11906796. DOI: 10.1038/s41597-025-04737-y.


Enhancing sweet sorghum emergence and stress resilience in saline-alkaline soils through ABA seed priming: insights into hormonal and metabolic reprogramming.

Yang J, Zhang W, Wang T, Xu J, Wang J, Huang J BMC Genomics. 2025; 26(1):241.

PMID: 40075293 PMC: 11905452. DOI: 10.1186/s12864-025-11420-4.


Evolutionary genomics reveals variation in structure and genetic content implicated in virulence and lifestyle in the genus Gaeumannomyces.

Hill R, Grey M, Fedi M, Smith D, Canning G, Ward S BMC Genomics. 2025; 26(1):239.

PMID: 40075289 PMC: 11905480. DOI: 10.1186/s12864-025-11432-0.


A chromosomal-level genome assembly of Begonia fimbristipula (Begoniaceae).

Xiao T, Wang Z, Yan H Sci Data. 2025; 12(1):429.

PMID: 40074751 PMC: 11904028. DOI: 10.1038/s41597-025-04768-5.


The assembly and annotation of two teinturier grapevine varieties, Dakapo and Rubired.

Ritter E, Cochetel N, Minio A, Cousins P, Cantu D, Niederhuth C GigaByte. 2025; 2025:gigabyte149.

PMID: 40065997 PMC: 11891882. DOI: 10.46471/gigabyte.149.

