PMID: 32183904

Gene Family Information Facilitates Variant Interpretation and Identification of Disease-associated Genes in Neurodevelopmental Disorders

Abstract

Background: Classifying the pathogenicity of missense variants is a major challenge in clinical practice when diagnosing rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs.

Methods: Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments. De novo variant burden in gene families was statistically evaluated in 10,068 NDD patients and 2,078 controls.
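As an illustration of how paralog-conserved sites might be derived from a paralog alignment, the following minimal Python sketch flags alignment columns in which the aligned paralogs share the same residue. The function name `paralog_conserved_columns`, the `min_identity` threshold, the treatment of gaps as mismatches, and the toy sequences are all assumptions made for this example; the abstract does not specify the exact conservation criterion used in the study.

```python
from typing import List


def paralog_conserved_columns(alignment: List[str], min_identity: float = 1.0) -> List[bool]:
    """Flag each column of a paralog alignment as conserved or not.

    Assumption for this sketch: a column is conserved when the most common
    residue is present in at least `min_identity` of all aligned sequences.
    Gaps ("-") count as mismatches.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    flags = []
    for col in range(length):
        residues = [seq[col] for seq in alignment]
        non_gap = [r for r in residues if r != "-"]
        if not non_gap:
            # A column made only of gaps cannot be conserved.
            flags.append(False)
            continue
        top = max(set(non_gap), key=non_gap.count)
        flags.append(non_gap.count(top) / len(residues) >= min_identity)
    return flags


# Toy alignment of three hypothetical paralogs (not real sequences).
toy_alignment = ["MKTAL", "MKSAL", "MK-AL"]
flags = paralog_conserved_columns(toy_alignment)
print(flags)
```

Lowering `min_identity` would relax the definition so that a column counts as conserved when most, rather than all, paralogs agree.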

Results: We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families comprising 98 de novo variant-carrying genes in NDD patients. Of these genes, 28 are novel NDD candidate genes that are brain-expressed and under evolutionary constraint.
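To make the idea of a gene family de novo enrichment test concrete, here is a hedged Python sketch of one common approach: comparing the observed de novo count in a family with the count expected from per-gene mutation rates, using a one-sided Poisson test. This is an assumption for illustration only; the abstract does not state which statistical model the framework uses. The gene names, mutation rates, and observed count below are hypothetical, and only the cohort size of 10,068 patients comes from the text.

```python
import math
from typing import Dict


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X <= observed - 1), then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def family_enrichment(observed: int, gene_rates: Dict[str, float], n_patients: int) -> float:
    """One-sided p-value for excess de novo variants in a gene family.

    Assumption: each patient contributes two chromosomes, and the family's
    expected count is the sum of its genes' per-chromosome mutation rates.
    """
    expected = 2 * n_patients * sum(gene_rates.values())
    return poisson_upper_tail(observed, expected)


# Hypothetical per-gene, per-chromosome missense mutation rates (not study data).
toy_rates = {"GENE_A": 2.0e-5, "GENE_B": 1.5e-5, "GENE_C": 2.5e-5}
p_value = family_enrichment(observed=12, gene_rates=toy_rates, n_patients=10_068)
print(f"p = {p_value:.3g}")
```

In a genome-wide setting, such p-values would still need correction for the number of gene families tested, for example with a Bonferroni threshold.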

Conclusion: This study presents the first method to incorporate gene family information into a statistical framework for interpreting variant data in NDDs and for discovering new NDD-associated genes.

Citing Articles

Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants.

Haque B, Cheerie D, Pan A, Curtis M, Nalpathamkalam T, Nguyen J. PLoS Genet. 2025; 21(1):e1011540.

PMID: 39761285 PMC: 11737861. DOI: 10.1371/journal.pgen.1011540.


Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes.

Nat Neurosci. 2024; 27(10):1864-1879.

PMID: 39363051 PMC: 11646479. DOI: 10.1038/s41593-024-01747-8.


Boolean Modeling of Biological Network Applied to Protein-Protein Interaction Network of Autism Patients.

Nezamuldeen L, Jafri M. Biology (Basel). 2024; 13(8).

PMID: 39194544 PMC: 11352122. DOI: 10.3390/biology13080606.


Genetic constraint at single amino acid resolution in protein domains improves missense variant prioritisation and gene discovery.

Zhang X, Theotokis P, Li N, Wright C, Samocha K, Whiffin N. Genome Med. 2024; 16(1):88.

PMID: 38992748 PMC: 11238507. DOI: 10.1186/s13073-024-01358-9.


Genotypic and phenotypic characteristics of sodium channel-associated epilepsy in Chinese population.

Dong R, Jin R, Zhang H, Zhang H, Xue M, Li Y. J Hum Genet. 2024; 69(9):441-453.

PMID: 38880818 DOI: 10.1038/s10038-024-01257-2.

