Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly

Overview
Specialty Biology
Date 2014 May 17
PMID 24831296
Citations 65
Abstract

The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
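As a rough illustration of what an alignment-free estimate from shotgun reads can look like, the sketch below flags reads that contain k-mers from a satellite k-mer set and scales the flagged fraction of sequenced bases by an assumed genome size. It is not the authors' pipeline: the function names, the k-mer length, the `min_hits` threshold, the toy `CATTC` motif, and the 3100 Mb genome size are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmers(seq, k):
    """Yield every k-mer in seq, skipping any that contain an ambiguous base."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def satellite_fraction(reads, satellite_kmers, k, min_hits=1):
    """Fraction of sequenced bases in reads with at least min_hits satellite k-mers."""
    total_bases = 0
    satellite_bases = 0
    for read in reads:
        total_bases += len(read)
        hits = sum(1 for km in kmers(read, k) if canonical(km) in satellite_kmers)
        if hits >= min_hits:
            satellite_bases += len(read)
    return satellite_bases / total_bases if total_bases else 0.0


def estimate_array_size_mb(fraction, genome_size_mb):
    """Scale a read-base fraction to megabases, assuming uniform coverage."""
    return fraction * genome_size_mb


# Toy data: a pentamer repeat stands in for a real satellite reference.
K = 10
toy_reference = "CATTC" * 4
toy_satellite_kmers = {canonical(km) for km in kmers(toy_reference, K)}
toy_reads = ["CATTC" * 4, "ACGT" * 5]

toy_fraction = satellite_fraction(toy_reads, toy_satellite_kmers, K)
toy_size_mb = estimate_array_size_mb(toy_fraction, genome_size_mb=3100)
```

A real analysis would also need to handle sequencing errors, coverage bias in repetitive DNA, and k-mers shared between subfamilies, none of which this sketch attempts.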

Citing Articles

Locus-specific differential expression of human satellite sequences in the nuclei of cancer cells and heat-shocked cells.

Rabeler C, Paterna N, Potluri R, D'Alessandro L, Bhatia A, Chen S. Nucleus. 2024; 15(1):2431239.

PMID: 39620275 PMC: 11622622. DOI: 10.1080/19491034.2024.2431239.


The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry.

Said I, Barbash D, Clark A. Genome Biol Evol. 2024; 16(8).

PMID: 39018452 PMC: 11305138. DOI: 10.1093/gbe/evae153.


Genetic variation in recalcitrant repetitive regions of the genome.

Shukla H, Chakraborty M, Emerson J. bioRxiv. 2024.

PMID: 38915508 PMC: 11195212. DOI: 10.1101/2024.06.11.598575.


More than the SRY: The Non-Coding Landscape of the Y Chromosome and Its Importance in Human Disease.

Westemeier-Rice E, Winters M, Rawson T, Martinez I. Noncoding RNA. 2024; 10(2).

PMID: 38668379 PMC: 11054740. DOI: 10.3390/ncrna10020021.


Oncogenic ETS fusions promote DNA damage and proinflammatory responses via pericentromeric RNAs in extracellular vesicles.

Ruzanov P, Evdokimova V, Pachva M, Minkovich A, Zhang Z, Langman S. J Clin Invest. 2024; 134(9).

PMID: 38530366 PMC: 11060741. DOI: 10.1172/JCI169470.

