SIMAP--the Database of All-against-all Protein Sequence Similarities and Annotations with New Interfaces and Increased Coverage
Overview
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow SIMAP to be connected with any other bioinformatics tool or resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to keep pace with the rapidly growing universe of sequenced proteins. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal, which also contains instructions and links for software access and flat file downloads.
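To make the idea of an all-against-all similarity matrix concrete, the following is a minimal illustrative sketch, not SIMAP's implementation. It scores every pair in a small collection with a plain Smith-Waterman local alignment. The scoring values (match, mismatch and a linear gap penalty), the function names and the example sequences are all assumptions made for illustration; a production pipeline would use a substitution matrix such as BLOSUM with affine gap penalties, and SIMAP additionally relies on FASTA heuristics to avoid aligning every pair exhaustively.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Toy scoring scheme (assumed for illustration): fixed match/mismatch
    values and a linear gap penalty instead of a substitution matrix.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs):
    """Score every unordered pair once and store it symmetrically."""
    matrix = {}
    for x, y in combinations(sorted(seqs), 2):
        score = smith_waterman_score(seqs[x], seqs[y])
        matrix[(x, y)] = matrix[(y, x)] = score
    return matrix


# Hypothetical example collection; identifiers and sequences are made up.
proteins = {"p1": "MKTAY", "p2": "MKTAY", "p3": "WWWWW"}
similarities = all_against_all(proteins)
```

The number of pairs grows quadratically with the size of the collection, which is why pre-calculating and storing the similarities, as SIMAP does, saves work for every downstream analysis.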