SIMAP--the Database of All-against-all Protein Sequence Similarities and Annotations with New Interfaces and Increased Coverage

Overview
Specialty Biochemistry
Date 2013 Oct 30
PMID 24165881
Citations 14
Abstract

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains more than 163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow SIMAP to be connected with any other bioinformatics tool or resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to keep pace with the rapidly growing universe of sequenced proteins. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
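
To make the idea of an all-against-all similarity matrix concrete, the following is a minimal sketch, not SIMAP's implementation. It scores every unordered pair in a small protein collection with a plain Smith-Waterman local alignment and then keeps the pairs above a threshold as edges of a similarity network. The function names, the match/mismatch/gap values and the example sequences are assumptions chosen for illustration; a real pipeline would use a substitution matrix such as BLOSUM, affine gap penalties and a fast pre-filter such as the FASTA heuristics mentioned in the abstract.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Illustrative scoring only: identity match/mismatch and a linear
    gap penalty (assumed values, not those used by SIMAP).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs):
    """Score every unordered pair of a {protein_id: sequence} mapping."""
    return {
        (p, q): smith_waterman_score(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


def similarity_network(scores, min_score):
    """Keep the pairs scoring at least min_score as network edges."""
    return sorted(pair for pair, score in scores.items() if score >= min_score)


if __name__ == "__main__":
    # Hypothetical toy collection: p1 and p2 are identical, p3 is unrelated.
    proteins = {"p1": "MKTAYIAK", "p2": "MKTAYIAK", "p3": "WWWW"}
    scores = all_against_all(proteins)
    print(scores)
    print(similarity_network(scores, min_score=10))
```

The number of pairs grows quadratically with the size of the collection, which is why a resource that stores pre-calculated similarities can save substantial computation compared with recomputing them for each analysis.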

Citing Articles

Protein-Coding Gene Families in Prokaryote Genome Comparisons.

Carhuaricra-Huaman D, Setubal J Methods Mol Biol. 2024; 2802:33-55.

PMID: 38819555 DOI: 10.1007/978-1-0716-3838-5_2.


Cracking the black box of deep sequence-based protein-protein interaction prediction.

Bernett J, Blumenthal D, List M Brief Bioinform. 2024; 25(2).

PMID: 38446741 PMC: 10939362. DOI: 10.1093/bib/bbae076.


Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks.

Doncheva N, Morris J, Holze H, Kirsch R, Nastou K, Cuesta-Astroz Y J Proteome Res. 2022; 22(2):637-646.

PMID: 36512705 PMC: 9904289. DOI: 10.1021/acs.jproteome.2c00651.


eggNOG 6.0: enabling comparative genomics across 12 535 organisms.

Hernandez-Plaza A, Szklarczyk D, Botas J, Cantalapiedra C, Giner-Lamia J, Mende D Nucleic Acids Res. 2022; 51(D1):D389-D394.

PMID: 36399505 PMC: 9825578. DOI: 10.1093/nar/gkac1022.


Ten Years of Collaborative Progress in the Quest for Orthologs.

Linard B, Ebersberger I, McGlynn S, Glover N, Mochizuki T, Patricio M Mol Biol Evol. 2021; 38(8):3033-3045.

PMID: 33822172 PMC: 8321534. DOI: 10.1093/molbev/msab098.

