Structural Variation, Selection, and Diversification of the NPIP Gene Family from the Human Pangenome

Journal: bioRxiv
Date: 2025 Feb 20
PMID: 39975192
Abstract

The NPIP (nuclear pore interacting protein) gene family has expanded to high copy number in humans and African apes, where it has been subject to an excess of amino acid replacement consistent with positive selection (1). Due to the limitations of short-read sequencing, human genetic diversity in this gene family has been poorly understood. Using highly accurate assemblies generated from long-read sequencing as part of the human pangenome, we completely characterize 169 human haplotypes (4,665 paralogs and alleles). Of the 28 paralogs, just three (, , and ) are fixed at a single copy, and only a single locus, , shows no structural variation. Four paralogs map to large segmental duplication blocks that mediate polymorphic inversions (355 kbp to 1.6 Mbp) corresponding to microdeletions associated with developmental delay and autism. Haplotype-based tests of positive selection and selective sweeps identify two paralogs, and , within the top percentile for both tests. Using full-length cDNA data from 101 tissue/cell types, we construct paralog-specific gene models and show that 56% (31 of the 55 most abundant isoforms) have not been previously described in RefSeq. We define six distinct translation start sites and other protein structural features that distinguish paralogs, including a variable-number tandem repeat that encodes a beta helix of variable size and that emerged ~3.1 million years ago in human evolution. Among the 28 paralogs, we identify distinct tissue and developmental patterns of expression, with only a few maintaining the ancestral testis-enriched expression. A subset of paralogs (, , , , and ) show increased brain expression. Our results suggest ongoing positive selection in the human population and rapid diversification of gene models.

References
1. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014; 31(5):1275-91. PMC: 3995338. DOI: 10.1093/molbev/msu077.

2. Gonzalez J, Caceres A, Esko T, Cusco I, Puig M, Esnaola M. A common 16p11.2 inversion underlies the joint susceptibility to asthma and obesity. Am J Hum Genet. 2014; 94(3):361-72. PMC: 3951940. DOI: 10.1016/j.ajhg.2014.01.015.

3. Hallast P, Ebert P, Loftus M, Yilmaz F, Audano P, Logsdon G. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature. 2023; 621(7978):355-364. PMC: 10726138. DOI: 10.1038/s41586-023-06425-6.

4. Giannuzzi G, Siswara P, Malig M, Marques-Bonet T, Mullikin J, Ventura M. Evolutionary dynamism of the primate LRRC37 gene family. Genome Res. 2012; 23(1):46-59. PMC: 3530683. DOI: 10.1101/gr.138842.112.

5. Stallings R, Whitmore S, Doggett N, Callen D. Refined physical mapping of chromosome 16-specific low-abundance repetitive DNA sequences. Cytogenet Cell Genet. 1993; 63(2):97-101. DOI: 10.1159/000133509.