
Repetitive DNA and Next-generation Sequencing: Computational Challenges and Solutions

Overview
Journal: Nat Rev Genet
Specialty: Genetics
Date: 2011 Nov 30
PMID: 22124482
Citations: 818
Abstract

Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
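The alignment ambiguity described above can be illustrated with a minimal sketch. It assumes exact string matching against a toy reference; the sequences, the function name `find_exact_matches`, and the read names are all hypothetical and are not taken from the article. Real aligners use indexed, error-tolerant search, so this only shows why a short read drawn from a repeat has more than one valid placement.

```python
def find_exact_matches(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also reported.
        start = reference.find(read, start + 1)
    return positions


# Toy reference (hypothetical): two copies of one repeat separated by unique sequence.
REPEAT = "ACGTACGA"
reference = "TTG" + REPEAT + "CCATGGC" + REPEAT + "GAT"

repeat_read = "CGTACG"    # lies entirely inside the repeat
unique_read = "CCATGG"    # lies entirely inside the unique spacer
spanning_read = "GACCAT"  # starts in the first repeat copy, ends in the spacer

for name, read in [("repeat", repeat_read),
                   ("unique", unique_read),
                   ("spanning", spanning_read)]:
    hits = find_exact_matches(reference, read)
    status = "ambiguous" if len(hits) > 1 else "unique"
    print(f"{name:9s} {read:7s} positions={hits} ({status})")
```

In this toy case the read contained in the repeat matches two positions, whereas the read that extends into unique flanking sequence matches only one, which is consistent with the abstract's point that short read lengths make repeats harder to handle.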

Citing Articles

Variant ribosomal DNA is essential for female differentiation in zebrafish.

Moser T, Bond D, Hore T. Philos Trans R Soc Lond B Biol Sci. 2025; 380(1921):20240107.

PMID: 40045777 PMC: 11883429. DOI: 10.1098/rstb.2024.0107.


ScatTR: Estimating the Size of Long Tandem Repeat Expansions from Short-Reads.

Al-Abri R, Gursoy G. bioRxiv. 2025.

PMID: 40027646 PMC: 11870476. DOI: 10.1101/2025.02.15.638440.


Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions.

Antoneli F, Peter C, Briones M. Viruses. 2025; 17(2).

PMID: 40006948 PMC: 11861991. DOI: 10.3390/v17020195.


High-Fidelity Long-Read Sequencing of an Avian Herpesvirus Reveals Extensive Intrapopulation Diversity in Tandem Repeat Regions.

Ortigas-Vasquez A, Bowen C, Renner D, Baigent S, Zhang Y, Yao Y. bioRxiv. 2025.

PMID: 39990410 PMC: 11844383. DOI: 10.1101/2025.02.10.637388.


Identification of cryptic breakpoints through single-tube long fragment read whole genome sequencing based on preimplantation genetic testing.

Jiang L, Mai Z, Peng J, Du T, Wang W, Chen X. NPJ Genom Med. 2025; 10(1):15.

PMID: 39984519 PMC: 11845665. DOI: 10.1038/s41525-025-00471-x.

