
Comprehensive Identification and Characterization of Simple Sequence Repeats Based on the Whole-genome Sequences of 14 Forest and Fruit Trees

Overview
Date 2024 Nov 11
PMID 39524510
Abstract

Simple sequence repeats (SSRs) are widely used molecular markers that are abundant in plant genomes. Here, we conducted a comprehensive identification and comparative analysis of SSRs in 14 tree species. A total of 16,298 SSRs were identified from 429,449 genes, and primers were successfully designed for 99.44% of the identified SSRs. Tri-nucleotide SSRs were the most abundant type, with an average of ~834 per species. Functional enrichment analysis of the combined SSR-containing genes from all species revealed 50 significantly enriched terms, most of which belong to transcription factor families associated with plant development and abiotic stress responses, such as Myeloblastosis_DNA-bind_4 (Myb_DNA-bind_4), APETALA2 (AP2), and Fantastic Four meristem regulator (FAF). Further enrichment analysis within individual species showed that 48 terms related to abiotic stress regulation and floral development were significantly enriched in ten species, whereas no significantly enriched terms were found in the remaining four species. Notably, the largest number of enriched terms was detected in (L.) Osbeck, accounting for 54.17% of all significantly enriched functional terms. Finally, we analyzed the AP2 and trihelix (Myb_DNA-bind_4) gene families because of their significant enrichment among SSR-containing genes. The results indicated that whole-genome duplication (WGD) and whole-genome triplication (WGT) might have played major roles in the expansion of the AP2 gene family but only slightly affected the expansion of the trihelix gene family during evolution. In conclusion, the identification and comprehensive characterization of these SSR markers will greatly facilitate future comparative and functional genomics studies.
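
The abstract does not name the SSR-detection tool or the minimum repeat thresholds used. As a minimal sketch of how perfect SSRs with 1 to 6 bp motifs can be located and tallied by motif class (the classification behind the observation that tri-nucleotide SSRs dominate), the Python below assumes MISA-style minimum copy numbers (10, 6, 5, 5, 5, 5 for mono- to hexa-nucleotide motifs). These thresholds, the greedy scan, and the names `find_perfect_ssrs` and `class_counts` are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed minimum number of tandem copies per motif length (MISA-style values);
# the thresholds used in the study are not given in the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


@dataclass
class SSR:
    start: int    # 0-based position of the first motif copy
    motif: str    # repeat unit, e.g. "ATG"
    repeats: int  # number of complete, uninterrupted copies

    @property
    def end(self) -> int:
        """0-based position just past the last complete copy."""
        return self.start + len(self.motif) * self.repeats


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def count_repeats(seq: str, start: int, k: int) -> int:
    """Count consecutive copies of seq[start:start + k] beginning at start."""
    motif = seq[start:start + k]
    if len(motif) < k:
        return 0
    n, pos = 0, start
    while seq[pos:pos + k] == motif:
        n += 1
        pos += k
    return n


def find_perfect_ssrs(seq: str) -> list[SSR]:
    """Greedy left-to-right scan for perfect SSRs with 1-6 bp motifs.

    At each position the shortest qualifying motif wins; the scan then jumps
    past the repeat so the same run is not reported again.
    """
    seq = seq.upper()
    ssrs: list[SSR] = []
    i = 0
    while i < len(seq):
        hit = None
        for k in sorted(MIN_REPEATS):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases (e.g. N) and non-primitive units.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            n = count_repeats(seq, i, k)
            if n >= MIN_REPEATS[k]:
                hit = SSR(i, motif, n)
                break
        if hit is not None:
            ssrs.append(hit)
            i = hit.end
        else:
            i += 1
    return ssrs


def class_counts(ssrs: list[SSR]) -> Counter:
    """Tally SSRs by motif class (mono-, di-, tri-nucleotide, ...)."""
    return Counter(CLASS_NAMES[len(s.motif)] for s in ssrs)
```

For example, on `"GG" + "A" * 12 + "TC" + "AG" * 7 + "C" + "ATG" * 5 + "CC"` the scan reports one mono-, one di- and one tri-nucleotide SSR; summing `class_counts` over every gene of a species would give per-class totals. The sketch reports only perfect repeats and does not merge neighbouring SSRs into compound ones, as dedicated tools such as MISA do.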

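The abstract reports significantly enriched terms (for example Myb_DNA-bind_4, AP2 and FAF) without stating which statistical test was applied. One common way to ask whether a term is over-represented among SSR-containing genes is a one-sided hypergeometric test followed by Benjamini-Hochberg correction; the sketch below assumes that approach. The function names, the 0.05 cutoff and the `(k, n)` input layout are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: genes in the background set; n: background genes carrying the term;
    N: SSR-containing genes; k: SSR-containing genes carrying the term.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(term_counts: dict[str, tuple[int, int]], M: int, N: int,
                   alpha: float = 0.05) -> dict[str, float]:
    """Return {term: adjusted p} for terms over-represented among SSR-containing genes.

    term_counts maps each term to (k, n) as defined in hypergeom_sf.
    """
    terms = list(term_counts)
    pvals = [hypergeom_sf(term_counts[t][0], M, term_counts[t][1], N) for t in terms]
    qvals = benjamini_hochberg(pvals)
    return {t: q for t, q in zip(terms, qvals) if q < alpha}
```

The exact sums via `math.comb` keep the sketch dependency-free but become slow at genome scale (hundreds of thousands of background genes); `scipy.stats.hypergeom.sf(k - 1, M, n, N)` computes the same upper-tail probability more efficiently.
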