
Tandem and Cryptic Amino Acid Repeats Accumulate in Disordered Regions of Proteins

Overview
Journal Genome Biol
Specialties Biology, Genetics
Date 2009 Jun 3
PMID 19486509
Citations 53
Abstract

Background: Amino acid repeats (AARs) are common features of protein sequences. They often evolve rapidly and are involved in a number of human diseases. They also show significant associations with particular Gene Ontology (GO) functional categories, particularly transcription, suggesting they play some role in protein function. It has recently been suggested that AARs play a significant role in the evolution of intrinsically unstructured regions (IURs) of proteins. We investigate the relationship between AAR frequency and evolution and their localization within proteins, based on a set of 5,815 orthologous proteins from four mammalian genomes (human, chimpanzee, mouse and rat) and one bird genome (chicken). We consider two classes of AAR: tandem repeats, and cryptic repeats (regions of proteins containing overrepresentations of short amino acid repeats).
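To make the notion of a tandem repeat concrete, the following is a minimal sketch of one simple case: scanning a protein sequence for homopeptide runs (one residue repeated consecutively). It assumes a repeat is a run of at least `min_len` identical residues, and the default of 5 is a hypothetical threshold, not the criterion used in the study. The function name `find_homopeptide_runs` is also hypothetical. Tandem repeats of longer units (e.g. "PQPQPQ") and cryptic repeats are not detected by this sketch.

```python
import re
from typing import List, Tuple


def find_homopeptide_runs(seq: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) for every run of a single amino acid
    of length >= min_len. Coordinates are 0-based and end-exclusive.
    The default threshold is an illustrative assumption."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # "(.)\1{k,}" matches one residue followed by at least k copies of itself,
    # so k = min_len - 1 gives runs of total length >= min_len.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "MA" + "Q" * 6 + "P" + "S" * 5 + "AG"
    print(find_homopeptide_runs(seq))  # [('Q', 2, 8), ('S', 9, 14)]
```

Raising `min_len` trades sensitivity for specificity: with `min_len=6` the five-residue serine run above is no longer reported.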

Results: Mammals show very similar repeat frequencies but chicken shows lower frequencies of many of the cryptic repeats common in mammals. Regions flanking tandem AARs evolve more rapidly than the rest of the protein containing the repeat and this phenomenon is more pronounced for non-conserved repeats than for conserved ones. GO associations are similar to those previously described for the mammals, but chicken cryptic repeats show fewer significant associations. Comparing the overlaps of AARs with IURs and protein domains showed that up to 96% of some AAR types are associated preferentially with IURs. However, no more than 15% of IURs contained an AAR.
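The two percentages in the Results measure overlap in opposite directions: the share of repeats that fall in IURs, and the share of IURs that carry a repeat. The sketch below computes both for a single protein. It assumes repeats, IURs (from some disorder prediction) and domains (from some domain annotation) are given as 0-based, end-exclusive residue intervals, and that sharing any residue counts as overlap; the study's actual overlap criterion and annotation sources are not stated here, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A sequence region as (start, end), 0-based and end-exclusive (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one residue."""
    return a[0] < b[1] and b[0] < a[1]


def classify_repeat(rep: Interval, iurs: List[Interval], domains: List[Interval]) -> str:
    """Label a repeat by whether it overlaps an IUR, a domain, both, or neither."""
    in_iur = any(overlaps(rep, u) for u in iurs)
    in_domain = any(overlaps(rep, d) for d in domains)
    if in_iur and in_domain:
        return "both"
    if in_iur:
        return "IUR"
    if in_domain:
        return "domain"
    return "neither"


def overlap_summary(
    repeats: List[Interval], iurs: List[Interval], domains: List[Interval]
) -> Tuple[Dict[str, float], float]:
    """Return (fraction of repeats per label, fraction of IURs overlapping any repeat)."""
    labels = ("IUR", "domain", "both", "neither")
    counts = Counter(classify_repeat(r, iurs, domains) for r in repeats)
    n = len(repeats)
    repeat_fractions = {k: (counts[k] / n if n else 0.0) for k in labels}
    iurs_with_repeat = sum(1 for u in iurs if any(overlaps(u, r) for r in repeats))
    iur_fraction = iurs_with_repeat / len(iurs) if iurs else 0.0
    return repeat_fractions, iur_fraction
```

In a genome-wide analysis the per-label counts and IUR counts would more likely be pooled across all proteins before dividing, rather than averaging per-protein fractions.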

Conclusions: Their location within IURs explains many of the evolutionary properties of AARs. Further study is needed on the types of IURs containing AARs.

Citing Articles

Teleost genomic repeat landscapes in light of diversification rates and ecology.

Reinar W, Torresen O, Nederbragt A, Matschiner M, Jentoft S, Jakobsen K Mob DNA. 2023; 14(1):14.

PMID: 37789366 PMC: 10546739. DOI: 10.1186/s13100-023-00302-9.


Adaptive protein evolution through length variation of short tandem repeats in .

Reinar W, Greulich A, Sto I, Knutsen J, Reitan T, Torresen O Sci Adv. 2023; 9(12):eadd6960.

PMID: 36947624 PMC: 10032594. DOI: 10.1126/sciadv.add6960.


Expansion and functional analysis of the SR-related protein family across the domains of life.

Cascarina S, Ross E RNA. 2022; 28(10):1298-1314.

PMID: 35863866 PMC: 9479744. DOI: 10.1261/rna.079170.122.


LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

Cascarina S, King D, Nishimura E, Ross E NAR Genom Bioinform. 2021; 3(2):lqab048.

PMID: 34056598 PMC: 8153834. DOI: 10.1093/nargab/lqab048.


Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids.

Wang Y, Harrison P Sci Rep. 2021; 11(1):10025.

PMID: 33976321 PMC: 8113271. DOI: 10.1038/s41598-021-89650-1.

