PMID: 35468861

Large-scale Discovery of Novel Neurodevelopmental Disorder-related Genes Through a Unified Analysis of Single-nucleotide and Copy Number Variants

Abstract

Background: Previous large-scale studies of de novo variants identified a number of genes associated with neurodevelopmental disorders (NDDs); however, it was also predicted that many NDD-associated genes await discovery. Such genes can be discovered by integrating copy number variants (CNVs), which have not been fully considered in previous studies, and increasing the sample size.

Methods: We first constructed a model estimating the rates of de novo CNVs per gene from several factors such as gene length and number of exons. Second, we compiled a comprehensive list of de novo single-nucleotide variants (SNVs) in 41,165 individuals and de novo CNVs in 3,675 individuals with NDDs by aggregating our own and publicly available datasets, including denovo-db and the Deciphering Developmental Disorders study data. Third, we summed the de novo CNV rates that we estimated and the previously established SNV rates, and assessed gene-based enrichment of de novo deleterious SNVs and CNVs in the 41,165 cases. Significantly enriched genes were further prioritized according to their similarity to known NDD genes using a deep learning model that considers functional characteristics (e.g., gene ontology and expression patterns).
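The gene-based enrichment step can be illustrated with a minimal sketch. The abstract does not name the statistical test, so the one-sided Poisson test below is an assumption (it is a common choice for de novo enrichment analyses), and the function names, the per-individual rate units, and the example numbers are all hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def gene_enrichment_p(observed: int, snv_rate: float, cnv_rate: float,
                      n_cases: int) -> float:
    """One-sided p-value for an excess of de novo variants in one gene.

    Assumption: both rates are expected de novo events per individual,
    so the expected count is n_cases * (snv_rate + cnv_rate).
    """
    expected = n_cases * (snv_rate + cnv_rate)
    return poisson_upper_tail(observed, expected)


# Hypothetical gene: 5 deleterious de novo events observed in 41,165 cases.
p = gene_enrichment_p(observed=5, snv_rate=1e-5, cnv_rate=1e-6, n_cases=41165)
print(p)
```

Taking the complement of the cumulative sum loses precision for extremely small p-values; a production analysis would more likely rely on a survival function from a statistics library.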

Results: We identified a total of 380 genes achieving statistical significance (5% false discovery rate), including 31 genes affected by de novo CNVs. Of the 380 genes, 52 have not previously been reported as NDD genes, and de novo CNV data contributed to the significance of three genes (GLTSCR1, MARK2, and UBR3). Among the 52 genes, we excluded 18 genes [a number almost identical to the theoretically expected number of false positives (i.e., 380 × 0.05 = 19)] on the basis of their constraints against deleterious variants, leaving 34 "plausible" candidate genes. Their validity as NDD genes was consistently supported by their similarity in function and gene expression patterns to known NDD genes. Quantifying the overall similarity using deep learning, we identified 11 high-confidence (> 90% true-positive probabilities) candidate genes: HDAC2, SUPT16H, HECTD4, CHD5, XPO1, GSK3B, NLGN2, ADGRB1, CTR9, BRD3, and MARK2.
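The false discovery rate arithmetic above can be checked with a short sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure shown here is an assumption, and the function names and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= fdr * k / m.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def expected_false_discoveries(n_significant, fdr=0.05):
    """Upper bound on the expected number of false positives among the hits."""
    return n_significant * fdr


# Toy p-values for ten hypothetical genes (not data from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy_p))          # indices of the significant genes
print(expected_false_discoveries(380))    # the 380 x 0.05 figure from the text
print(52 - 18)                            # novel genes left after exclusion
```

With 380 significant genes at a 5% FDR, about 19 false positives are expected, which is why excluding 18 of the 52 novel genes is close to the theoretical expectation.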

Conclusions: We identified dozens of new candidates for NDD genes. Both the methods and the resources developed here will contribute to the further identification of novel NDD-associated genes.

Citing Articles

Genomics of rare diseases in the Greater Middle East.

Chekroun I, Shenbagam S, Almarri M, Mokrab Y, Uddin M, Alkhnbashi O Nat Genet. 2025; 57(3):505-514.

PMID: 39901015 DOI: 10.1038/s41588-025-02075-8.


Molecular Mediators of Neutrophil Primary Granule Release Following Acute Ischemic Stroke and their Associated Epigenetic Modulation by HDAC2.

Li X, Geng X, Fan J, Yan F, Wang R, Yang Z Mol Neurobiol. 2025; .

PMID: 39832064 DOI: 10.1007/s12035-025-04699-7.


Genome-Wide Structural Variation Analysis and Breed Comparison of Local Domestic Ducks in Shandong Province, China.

Ren P, Zhang M, Khan M, Yang L, Jing Y, Liu X Animals (Basel). 2025; 14(24).

PMID: 39765561 PMC: 11672513. DOI: 10.3390/ani14243657.


Monoallelic loss-of-function variants in GSK3B lead to autism and developmental delay.

Tan S, Zhang Q, Zhan R, Luo S, Han Y, Yu B Mol Psychiatry. 2024; .

PMID: 39472663 DOI: 10.1038/s41380-024-02806-z.


MARK2 variants cause autism spectrum disorder via the downregulation of WNT/β-catenin signaling pathway.

Gong M, Li J, Qin Z, Machado Bressan Wilke M, Liu Y, Li Q Am J Hum Genet. 2024; 111(11):2392-2410.

PMID: 39419027 PMC: 11568763. DOI: 10.1016/j.ajhg.2024.09.006.

