PMID: 38178268

Using Multi-scale Genomics to Associate Poorly Annotated Genes with Rare Diseases

Abstract

Background: Next-generation sequencing (NGS) has transformed the identification of disease-causing genes in genetic disorders. However, a substantial portion of sequenced patients remains undiagnosed. This may be attributed not only to harder-to-detect variants, such as non-coding and structural variants, but also to variants in genes not previously associated with the patient's clinical phenotype. This study introduces EvORanker, an algorithm that integrates unbiased data from 1,028 eukaryotic genomes to link mutated genes to clinical phenotypes.

Methods: EvORanker uses clinical data, multi-scale phylogenetic profiling, and other omics data to prioritize disease-associated genes. It was evaluated on solved exomes and simulated genomes, compared with existing methods, and applied to 6,260 genes that have mouse knockout phenotypes but no known human disease association. EvORanker was also made accessible as a user-friendly web tool.
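To illustrate the general idea behind phylogenetic profiling, the sketch below ranks candidate genes by how closely their presence/absence pattern across species matches that of a gene already linked to the phenotype. It is a minimal toy version, not EvORanker's actual scoring: the gene names, the binary profiles, the `rank_candidates` helper, and the use of the maximum Pearson correlation are all assumptions made for illustration. The optional `species_idx` argument restricts the comparison to a subset of species, which is one simple way to picture a clade-level (multi-scale) comparison.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-evolution signal
    return cov / sqrt(vx * vy)

def rank_candidates(profiles, candidates, phenotype_genes, species_idx=None):
    """Rank candidates by their best correlation with known phenotype genes.

    species_idx (hypothetical) selects a subset of species columns,
    e.g. the members of one clade; None means all species.
    """
    def view(gene):
        p = profiles[gene]
        return p if species_idx is None else [p[i] for i in species_idx]

    scores = {}
    for gene in candidates:
        others = [g for g in phenotype_genes if g != gene]
        scores[gene] = max(
            (pearson(view(gene), view(g)) for g in others), default=0.0
        )
    return sorted(candidates, key=lambda g: scores[g], reverse=True)

# Toy data: 1 = ortholog present in that species, 0 = absent (all hypothetical).
profiles = {
    "GENE_A": [1, 1, 0, 0, 1, 0],
    "GENE_B": [1, 1, 0, 0, 1, 0],  # assumed to be already linked to the phenotype
    "GENE_C": [0, 0, 1, 1, 0, 1],
    "GENE_D": [1, 0, 1, 0, 1, 0],
}
ranking = rank_candidates(profiles, ["GENE_C", "GENE_A", "GENE_D"], ["GENE_B"])
print(ranking)
```

In this toy example GENE_A shares GENE_B's profile exactly and is ranked first, while GENE_C, whose profile is the mirror image, is ranked last.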

Results: In the analyzed exomic cohort, EvORanker accurately identified the "true" disease gene as the top candidate in 69% of cases and within the top 5 candidates in 95% of cases, consistent with results from the simulated dataset. Notably, EvORanker outperformed existing methods, particularly for poorly annotated genes. Among the 6,260 knockout genes with mouse phenotypes, EvORanker linked 41% to observed human disease phenotypes. Furthermore, in two unsolved cases, EvORanker identified DLGAP2 and LPCAT3 as disease candidates for previously uncharacterized genetic syndromes.
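The top-1 and top-5 figures above are top-k hit rates: the fraction of cases in which the true disease gene appears among the first k ranked candidates. A minimal sketch of that calculation follows. The `top_k_rate` helper and the four toy cases are hypothetical and are not data from the study.

```python
def top_k_rate(ranked_lists, true_genes, k):
    """Fraction of cases whose true gene is within the first k candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_genes)
        if truth in ranked[:k]
    )
    return hits / len(true_genes)

# Four hypothetical cases: each list is a ranked candidate list for one patient.
ranked_lists = [
    ["G1", "G2", "G3"],
    ["G5", "G4", "G6"],
    ["G7", "G8", "G9"],
    ["G10", "G11", "G12"],
]
true_genes = ["G1", "G4", "G9", "G13"]  # G13 is never ranked in its case

top1 = top_k_rate(ranked_lists, true_genes, k=1)
top3 = top_k_rate(ranked_lists, true_genes, k=3)
print(top1, top3)
```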

Conclusions: We highlight clade-based phylogenetic profiling as a powerful systematic approach for prioritizing potential disease genes. Our study showcases the efficacy of EvORanker in associating poorly annotated genes with disease phenotypes observed in patients. The EvORanker server is freely available at https://ccanavati.shinyapps.io/EvORanker/.

Citing Articles

The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics.

Matentzoglu N, Bello S, Stefancsik R, Alghamdi S, Anagnostopoulos A, Balhoff J. bioRxiv. 2024.

PMID: 39345458 PMC: 11429889. DOI: 10.1101/2024.09.18.613276.
