Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2

Overview
Journal: Genes (Basel)
Publisher: MDPI
Date: 2020 Feb 5
PMID: 32013076
Citations: 12
Abstract

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two and 92 genes where the majority of samples have a copy number other than two, and we describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
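
The idea of tabulating unique k-mers to estimate copy number can be illustrated with a minimal Python sketch. This is not the QuicK-mer2 implementation: the function names, the toy k-mer length, and the median-based normalization against a control depth are all assumptions made for illustration, and the sketch ignores reverse complements and any depth-bias correction that a real tool would need.

```python
from collections import Counter
from statistics import median

K = 5  # toy k-mer length (assumption); real analyses use much longer k-mers


def kmers(seq, k=K):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_reference_kmers(reference, k=K):
    """Return the set of k-mers that occur exactly once in the reference."""
    counts = Counter(kmers(reference, k))
    return {km for km, n in counts.items() if n == 1}


def count_kmers_in_reads(reads, targets, k=K):
    """Count how often each target k-mer appears across all reads (no mapping)."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in targets:
                counts[km] += 1
    return counts


def estimate_copy_number(region_kmers, read_counts, control_depth, control_cn=2):
    """Scale the median k-mer depth of a region by the depth seen in
    control regions assumed to have copy number control_cn."""
    depths = [read_counts.get(km, 0) for km in region_kmers]
    return control_cn * median(depths) / control_depth
```

Because only k-mers that are unique in the reference are counted, each depth value can be attributed to one specific paralog, which is what makes the resulting estimate paralog-specific in this simplified picture.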

Citing Articles

Gene expansions contributing to human brain evolution.

Soto D, Uribe-Salazar J, Kaya G, Valdarrago R, Sekar A, Haghani N bioRxiv. 2024.

PMID: 39386494 PMC: 11463660. DOI: 10.1101/2024.09.26.615256.


Duplications and Retrogenes Are Numerous and Widespread in Modern Canine Genomic Assemblies.

Nguyen A, Blacksmith M, Kidd J Genome Biol Evol. 2024; 16(7).

PMID: 38946312 PMC: 11259980. DOI: 10.1093/gbe/evae142.


LRRC37B is a human modifier of voltage-gated sodium channels and axon excitability in cortical neurons.

Libe-Philippot B, Lejeune A, Wierda K, Louros N, Erkol E, Vlaeminck I Cell. 2023; 186(26):5766-5783.e25.

PMID: 38134874 PMC: 10754148. DOI: 10.1016/j.cell.2023.11.028.


GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads.

Pajuste F, Remm M Sci Rep. 2023; 13(1):17765.

PMID: 37853040 PMC: 10584998. DOI: 10.1038/s41598-023-44636-z.


Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture.

Meadows J, Kidd J, Wang G, Parker H, Schall P, Bianchi M Genome Biol. 2023; 24(1):187.

PMID: 37582787 PMC: 10426128. DOI: 10.1186/s13059-023-03023-7.

