
Repetitive Elements May Comprise over Two-thirds of the Human Genome

Overview
Journal: PLoS Genet
Specialty: Genetics
Date: 2011 Dec 7
PMID: 22144907
Citations: 575
Abstract

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo "clouds"). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%-69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different-sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed "element-specific" P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ∼100 Mbp of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
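
The idea of grouping high-abundance oligonucleotides that are close in sequence space can be illustrated with a small Python sketch. This is not the published P-clouds implementation: the function names, the two thresholds (`core_min`, `outer_min`), the greedy seeding order, and the use of Hamming distance 1 as the notion of "related" are all assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers at Hamming distance exactly 1 from kmer."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_clouds(counts, core_min, outer_min):
    """Greedily group abundant k-mers into clouds (illustrative only).

    A cloud is seeded by the most abundant unassigned k-mer whose count
    is at least core_min, then grown through one-mismatch neighbors
    whose count is at least outer_min. Both thresholds are hypothetical
    parameters, not values from the paper.
    """
    assigned = {}  # k-mer -> index of the cloud it belongs to
    clouds = []
    for kmer, n in counts.most_common():
        if n < core_min:
            break  # remaining k-mers are too rare to seed a cloud
        if kmer in assigned:
            continue
        cloud_id = len(clouds)
        cloud = {kmer}
        assigned[kmer] = cloud_id
        frontier = [kmer]
        while frontier:
            current = frontier.pop()
            for nb in neighbors(current):
                if nb not in assigned and counts.get(nb, 0) >= outer_min:
                    assigned[nb] = cloud_id
                    cloud.add(nb)
                    frontier.append(nb)
        clouds.append(cloud)
    return clouds, assigned


def mask(seq, k, assigned):
    """Mark every position covered by at least one cloud k-mer."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in assigned:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

In this sketch, a genome would be annotated by calling `count_kmers` on the sequence, passing the counts to `build_clouds`, and then calling `mask` to flag positions that fall inside any cloud oligo. A real analysis would also need the probabilistic treatment of short hits that the abstract mentions, which this sketch omits.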

Citing Articles

intronic expansion identified by poly-glycine-arginine pathology increases Alzheimer's disease risk.

Nguyen L, Ajredini R, Guo S, Romano L, Tomas R, Bell L Proc Natl Acad Sci U S A. 2025; 122(7):e2416885122.

PMID: 39937857 PMC: 11848317. DOI: 10.1073/pnas.2416885122.


Somatic piRNA and PIWI-mediated post-transcriptional gene regulation in stem cells and disease.

Patel M, Jiang Y, Kakumani P Front Cell Dev Biol. 2024; 12:1495035.

PMID: 39717847 PMC: 11663942. DOI: 10.3389/fcell.2024.1495035.


Evolution of Repetitive Elements, Their Roles in Homeostasis and Human Disease, and Potential Therapeutic Applications.

Snowbarger J, Koganti P, Spruck C Biomolecules. 2024; 14(10).

PMID: 39456183 PMC: 11506328. DOI: 10.3390/biom14101250.


Ni(II) Cylinders Damage DNA in Cancer Cells and Preferentially Bind Y-Shaped DNA Three-Way Junctions Blocking DNA Synthesis.

Malina J, Kostrhunova H, Brabec V Small. 2024; 20(52):e2406814.

PMID: 39428899 PMC: 11673443. DOI: 10.1002/smll.202406814.


MHConstructor: a high-throughput, haplotype-informed solution to the MHC assembly challenge.

Wade K, Suseno R, Kizer K, Williams J, Boquett J, Caillier S Genome Biol. 2024; 25(1):274.

PMID: 39420419 PMC: 11484429. DOI: 10.1186/s13059-024-03412-6.

