
REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

Overview
Journal PLoS One
Date 2016 Mar 16
PMID 26977803
Citations 28
Abstract

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often have missing data in highly repetitive regions, which are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence among copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA used in sequencing the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
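
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of why highly repetitive, low-divergence sequences can be detected from raw reads without a reference: their k-mers occur far more often than average. The function names (`count_kmers`, `frequent_kmers`), the `min_fold` threshold, and the toy reads are all hypothetical and are not taken from the REPdenovo code base. Reverse complements and sequencing errors are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_fold):
    """Return k-mers seen at least min_fold times the mean k-mer count.

    min_fold is an assumed, illustrative cutoff, not a REPdenovo parameter.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {kmer: n for kmer, n in counts.items() if n >= min_fold * mean}


# Toy reads: the motif "ACGTAC" is present in four of the five reads.
reads = [
    "TTACGTACGG",
    "CCACGTACTT",
    "GAACGTACCA",
    "ACGTACGTTG",
    "TGCATGCCAA",
]

hits = frequent_kmers(reads, k=6, min_fold=3.0)
print(hits)
```

In a real pipeline, the over-represented k-mers would then be passed to an assembler to build longer repeat contigs; that step is omitted here.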

Citing Articles

Study of Dispersed Repeats in the Genome.

Rudenko V, Korotkov E. Int J Mol Sci. 2024; 25(8).

PMID: 38674025 PMC: 11050394. DOI: 10.3390/ijms25084441.


Centuries of genome instability and evolution in soft-shell clam, Mya arenaria, bivalve transmissible neoplasia.

Hart S, Yonemitsu M, Giersch R, Garrett F, Beal B, Arriagada G. Nat Cancer. 2023; 4(11):1561-1574.

PMID: 37783804 PMC: 10663159. DOI: 10.1038/s43018-023-00643-7.


Repetitive DNA sequence detection and its role in the human genome.

Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B. Commun Biol. 2023; 6(1):954.

PMID: 37726397 PMC: 10509279. DOI: 10.1038/s42003-023-05322-y.


Genome assembly composition of the String "ACGT" array: a review of data structure accuracy and performance challenges.

Magdy Mohamed Abdelaziz Barakat S, Sallehuddin R, Yuhaniz S, Khairuddin R, Mahmood Y. PeerJ Comput Sci. 2023; 9:e1180.

PMID: 37547391 PMC: 10403225. DOI: 10.7717/peerj-cs.1180.


Twinkle twinkle brittle star: the draft genome of Ophioderma brevispinum (Echinodermata: Ophiuroidea) as a resource for regeneration research.

Mashanov V, Jacob Machado D, Reid R, Brouwer C, Kofsky J, Janies D. BMC Genomics. 2022; 23(1):574.

PMID: 35953768 PMC: 9367165. DOI: 10.1186/s12864-022-08750-y.

