Finding and Characterizing Repeats in Plant Genomes

Overview

Journal Methods Mol Biol

Specialty Molecular Biology

Date 2022 Jan 17

PMID 35037215

Authors

Jacques Nicolas

Sebastien Tempel

Anna-Sophie Fiston-Lavier

Emira Cherif


Abstract

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists automatically scan sequence data for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. These methods are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (one for the analysis of genomes, one for the direct analysis of sequenced reads), as well as a selection of the main existing software. Because it can be hard to keep up with the profusion of proposals in this dynamic field, the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of indexing, mapping, and querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to the family. A second approach, the linguistic (or syntactic) approach, allows biologists themselves to describe models of their favorite repeat family and check the validity of those models.

Citing Articles

Repetitive DNA sequence detection and its role in the human genome.

Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B. Commun Biol. 2023; 6(1):954.

PMID: 37726397 PMC: 10509279. DOI: 10.1038/s42003-023-05322-y.

Methodologies for the Discovery of Transposable Element Families.

Storer J, Hubley R, Rosen J, Smit A. Genes (Basel). 2022; 13(4).

PMID: 35456515 PMC: 9025800. DOI: 10.3390/genes13040709.
