Sequence Conserved for Subcellular Localization

Overview
Journal Protein Sci
Specialty Biochemistry
Date 2002 Nov 21
PMID 12441382
Citations 61
Abstract

The more proteins diverge in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) the subcellular compartment is generally more conserved than might have been expected, given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine-tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes.
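
The abstract describes the HSSP distance only as a measure combining pairwise sequence identity and alignment length; it does not give the formula. As a minimal sketch, the code below assumes the length-dependent identity curve commonly attributed to Rost (1999) and defines the distance as the percentage identity minus that curve. The function names `hssp_threshold` and `hssp_distance` are hypothetical, and the constants are recalled from the literature rather than taken from this record, so they should be checked against the original publication before use.

```python
import math


def hssp_threshold(length: int) -> float:
    """Length-dependent percentage-identity threshold.

    Assumed form (commonly attributed to Rost, 1999); not stated in the abstract:
      100                                   for L <= 11
      480 * L ** (-0.32 * (1 + e^(-L/1000))) for 11 < L <= 450
      19.5                                  for L > 450
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_distance(percent_identity: float, length: int) -> float:
    """Distance of an alignment from the threshold curve.

    Positive values lie above the curve (more similar than the threshold),
    negative values lie below it.
    """
    return percent_identity - hssp_threshold(length)


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the paper.
    for pide, length in [(30.0, 100), (25.0, 100), (40.0, 1000)]:
        print(pide, length, round(hssp_distance(pide, length), 2))
```

Under these assumptions, a short alignment needs a much higher percentage identity than a long one to reach the same distance, which is how a single number can account for both identity and alignment length.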

Citing Articles

Wheat E3 ligase is involved in drought stress tolerance in transgenic .

Hong M, Ko C, Kim D. Physiol Mol Biol Plants. 2025; 31(2):233-246.

PMID: 40070538 PMC: 11890807. DOI: 10.1007/s12298-025-01557-7.


A Review for Artificial Intelligence Based Protein Subcellular Localization.

Xiao H, Zou Y, Wang J, Wan S. Biomolecules. 2024; 14(4).

PMID: 38672426 PMC: 11048326. DOI: 10.3390/biom14040409.


Protein Sorting Prediction.

Nielsen H. Methods Mol Biol. 2023; 2715:27-63.

PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2.


HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units.

Zou K, Wang S, Wang Z, Zhang Z, Yang F. Front Mol Biosci. 2023; 10:1171429.

PMID: 37664182 PMC: 10470064. DOI: 10.3389/fmolb.2023.1171429.


Computational methods for protein localization prediction.

Jiang Y, Wang D, Wang W, Xu D. Comput Struct Biotechnol J. 2021; 19:5834-5844.

PMID: 34765098 PMC: 8564054. DOI: 10.1016/j.csbj.2021.10.023.

