De Novo Reconstruction of Satellite Repeat Units from Sequence Data

Overview
Journal ArXiv
Date 2023 May 3
PMID 37131874
Abstract

Satellite DNAs are long, tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or work only for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
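
To make the terms "repeat unit" and "HOR" concrete, the toy sketch below finds the shortest unit of an exactly repeating string. It is not SRF's algorithm, which the abstract does not detail; the function name `smallest_repeat_unit` and the example sequences are hypothetical, and real satellite arrays contain mutations that this exact-match sketch does not handle.

```python
def smallest_repeat_unit(seq: str) -> str:
    """Return the shortest unit u such that seq == u * k for some integer k >= 1.

    Illustration only: assumes a perfect (mutation-free) tandem array.
    """
    if not seq:
        return ""
    # A string made of repeated copies of a unit of length p reappears
    # inside seq + seq at offset p; the first such offset is the unit length.
    period = (seq + seq).find(seq, 1)
    return seq[:period]


# Hypothetical monomers that differ at one base.
monomer_a = "ACGT"
monomer_b = "ACGA"

# A simple satellite: one monomer repeated in tandem.
simple_array = monomer_a * 5

# A toy HOR: the pair (a, b) repeats as a block, so the exact
# repeat unit spans two similar but non-identical monomers.
hor_array = (monomer_a + monomer_b) * 3

simple_unit = smallest_repeat_unit(simple_array)
hor_unit = smallest_repeat_unit(hor_array)
```

In this toy case the simple array collapses to a single 4-base monomer, while the HOR array only repeats exactly at the 8-base block level, which is why a method limited to simple repeat structures can miss HORs.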
