
Utilizing Mapping Targets of Sequences Underrepresented in the Reference Assembly to Reduce False Positive Alignments

Overview
Specialty Biochemistry
Date 2015 Jul 12
PMID 26163063
Citations 24
Abstract

The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often produce artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines, we demonstrate a 10-fold reduction in signals in regions that overlap blacklisted regions and identify a comprehensive set of regions whose mapping is sensitive to the presence of the repeat-rich targets.
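
The effect described in the abstract can be illustrated with a toy model of competitive mapping. This is a minimal sketch, not the authors' pipeline: the target names, alignment scores, read counts, and the helpers `assign_reads` and `fold_reduction` are all hypothetical, and the numbers are invented for illustration rather than taken from the paper. Each read is assigned to its best-scoring target, so adding a better-matching repeat-rich target draws repeat-derived reads away from the homologous assembled region.

```python
from collections import Counter

# Hypothetical target sets. "chr1:blacklisted" stands in for an assembled region
# homologous to a repeat; "decoy:satellite" stands in for an added mapping target
# representing sequence that is underrepresented in the reference assembly.
REFERENCE_ONLY = {"chr1:blacklisted", "chr2:unique"}
DECOYS = {"decoy:satellite"}

# Hypothetical toy reads: each maps candidate target -> alignment score.
READS = (
    [{"chr1:blacklisted": 90, "decoy:satellite": 100}] * 4  # repeat-derived reads
    + [{"chr1:blacklisted": 100}]                           # read truly from chr1
    + [{"chr2:unique": 100}] * 2                            # unaffected reads
)


def assign_reads(reads, targets):
    """Count reads per target, giving each read to its best-scoring allowed target."""
    depth = Counter()
    for candidates in reads:
        allowed = {t: s for t, s in candidates.items() if t in targets}
        if not allowed:
            continue  # read stays unmapped under this target set
        depth[max(allowed, key=allowed.get)] += 1
    return depth


def fold_reduction(before, after, region):
    """Ratio of read depth in `region` before vs. after adding the extra targets."""
    if after[region] == 0:
        return float("inf")
    return before[region] / after[region]


before = assign_reads(READS, REFERENCE_ONLY)
after = assign_reads(READS, REFERENCE_ONLY | DECOYS)
print(fold_reduction(before, after, "chr1:blacklisted"))
```

In this toy setup only the read that genuinely originates from the assembled region remains there once the extra target is available, while reads in the unrelated region are unchanged. A real aligner scores and reports multi-mapping reads in more complex ways than a simple best-score choice.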

Citing Articles

Beyond Blacklists: A Critical Assessment of Exclusion Set Generation Strategies and Alternative Approaches.

Wall B, Ogata J, Nguyen M, McClay J, Harrell J, Dozmorov M. bioRxiv. 2025.

PMID: 39975128 PMC: 11839099. DOI: 10.1101/2025.02.06.636968.


PancrESS - a meta-analysis resource for understanding cell-type specific expression in the human pancreas.

Sturgill D, Wang L, Arda H. BMC Genomics. 2024; 25(1):76.

PMID: 38238687 PMC: 10797729. DOI: 10.1186/s12864-024-09964-y.


excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies.

Ogata J, Mu W, Davis E, Xue B, Harrell J, Sheffield N. Bioinformatics. 2023; 39(4).

PMID: 37067481 PMC: 10126321. DOI: 10.1093/bioinformatics/btad198.


GFI1-Dependent Repression of Increases Multiple Myeloma Cell Survival.

Petrusca D, Mulcrone P, Macar D, Bishop R, Berdyshev E, Suvannasankha A. Cancers (Basel). 2022; 14(3).

PMID: 35159039 PMC: 8833953. DOI: 10.3390/cancers14030772.


Precise Identification of Recurrent Somatic Mutations in Oral Cancer Through Whole-Exome Sequencing Using Multiple Mutation Calling Pipelines.

Lin L, Chou C, Cheng H, Chang K, Liu C. Front Oncol. 2021; 11:741626.

PMID: 34912705 PMC: 8666431. DOI: 10.3389/fonc.2021.741626.

