Species-specific Protein Sequence and Fold Optimizations

Overview
Publisher BioMed Central
Specialty Biology
Date 2002 Dec 19
PMID 12487631
Citations 3
Abstract

Background: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes.

Results: Correspondence analysis of the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaeal, 76 bacterial and 7 eukaryotic complete genomes identified environmental niche as a significant source of variability. Additionally, clustering by amino acid composition revealed groups of phylogenetically unrelated archaea and bacteria that share similar environments. Composition analyses of conservative, domain-based homology models suggested an enrichment of the small hydrophobic residues Ala, Gly and Val and of the charged residues Asp, Glu, His and Arg across all genomes. In contrast, the larger aromatic residues Phe, Trp and Tyr are depleted in folds, and these results were not affected by low-complexity biases. For each complete genome we derived two simple log-odds scoring functions, one from its ORFs (CG) and one from its folds (CF). CF achieved an average cross-validation success rate of 85 ± 8%, whereas CG detected 73 ± 9% of species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca.

Conclusion: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
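The abstract does not give the exact form of the CG and CF log-odds scoring functions, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that each genome's score for amino acid a is log(f_genome(a) / f_background(a)), where f_genome is pooled over that genome's sequences (ORFs for a CG-like function, fold sequences for a CF-like one) and f_background is pooled over all genomes; that a sequence is assigned to whichever genome's function scores it highest; and that the success rate is the fraction of held-out sequences assigned back to their own genome. The pseudocount, the per-residue averaging, the toy sequences and every function name (composition, log_odds_tables, score, classify, success_rate) are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols (X, *, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs, pseudocount=1.0):
    """Residue frequencies over the 20 standard amino acids.

    The pseudocount is an assumption (not from the paper); it keeps every
    frequency non-zero so the log-odds below are always defined.
    """
    counts = Counter()
    for s in seqs:
        counts.update(ch for ch in s.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log_odds_tables(training):
    """Build one hypothetical log-odds table per genome.

    training maps genome name -> list of sequences (ORFs for a CG-like
    function, fold/domain sequences for a CF-like one). The background
    is the composition pooled over every genome.
    """
    background = composition([s for seqs in training.values() for s in seqs])
    tables = {}
    for genome, seqs in training.items():
        freq = composition(seqs)
        tables[genome] = {a: math.log(freq[a] / background[a]) for a in AMINO_ACIDS}
    return tables


def score(seq, table):
    """Average log-odds per residue, so scores are comparable across lengths."""
    residues = [ch for ch in seq.upper() if ch in table]
    if not residues:
        return 0.0
    return sum(table[ch] for ch in residues) / len(residues)


def classify(seq, tables):
    """Assign seq to the genome whose table gives it the highest score."""
    return max(tables, key=lambda genome: score(seq, tables[genome]))


def success_rate(held_out, tables):
    """Fraction of held-out sequences assigned back to their own genome."""
    hits = total = 0
    for genome, seqs in held_out.items():
        for s in seqs:
            hits += classify(s, tables) == genome
            total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy, made-up sequences for two hypothetical genomes: not data from the study.
    training = {
        "genome_A": ["EEKKRRVVIE", "KEVRIEKVER"],
        "genome_B": ["SSTTNNQQGA", "QNSTGASTNQ"],
    }
    held_out = {"genome_A": ["EKRVEKIV"], "genome_B": ["STNQGSAT"]}
    tables = log_odds_tables(training)
    print(success_rate(held_out, tables))  # prints 1.0
```

Because every candidate genome scores the same sequence, the per-residue average does not change which genome wins; it only makes scores comparable across sequences of different length. A sequence that scores highest under a genome other than its own is the kind of outlier the Conclusion suggests could hint at a horizontal transfer event, although this sketch does not test that.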

Citing Articles

Curating COBRA Models of Microbial Metabolism.

Navid A Methods Mol Biol. 2021; 2349:321-338.

PMID: 34719001 DOI: 10.1007/978-1-0716-1585-0_14.


Evolution of complete proteomes: guanine-cytosine pressure, phylogeny and environmental influences blend the proteomic architecture.

Chen W, Shao Y, Chen F BMC Evol Biol. 2013; 13:219.

PMID: 24088322 PMC: 3850711. DOI: 10.1186/1471-2148-13-219.


Global analysis of predicted proteomes: functional adaptation of physical properties.

Knight C, Kassen R, Hebestreit H, Rainey P Proc Natl Acad Sci U S A. 2004; 101(22):8390-5.

PMID: 15150418 PMC: 420404. DOI: 10.1073/pnas.0307270101.
