PMID: 40060867

XGBMUT: Predicting the Functional Impact of Missense Mutations Using an Extreme Gradient Boost Classifier

Abstract

Millions of new mutations have been discovered, largely owing to advances in genome projects, but characterizing their effects through traditional wet-lab experiments remains labor-intensive and time-consuming. Functional prediction algorithms address this problem by enabling efficient computational screening of mutations, saving time and resources. The objective of this study was to develop a competitive algorithm for predicting the functional impact of missense mutations. First, a unified database and substitution matrices containing predictor variables specific to missense mutations were constructed. Values of these predictor variables were then collected for training and test sets derived from the ClinVar and HumsaVar databases. A series of supervised machine learning classifiers was trained, and their performance was evaluated on the test set. The best-performing model was further compared against ten currently available functional prediction algorithms. The resulting algorithm, XGBMut, classifies missense mutations with high accuracy and performs competitively against these existing tools. A user-friendly graphical interface was also developed to make the tool accessible to professionals in various fields. Unlike most existing methods, XGBMut neither depends on a web server nor requires the installation of third-party software, which makes it more versatile for users.
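The classification step summarized above relies on extreme gradient boosting. The sketch below is a toy illustration of that technique and is not XGBMut itself: the two predictor variables per variant (a substitution-matrix score and a conservation score), their values, and the labels are all invented, and the helper names `fit_stump`, `train` and `predict_proba` are hypothetical. It implements the second-order (Newton) boosting update that XGBoost uses for the logistic loss, restricted to depth-1 trees. Each round computes gradients g = p - y and Hessians h = p(1 - p), picks the split with the largest structure-score gain, and sets each leaf weight to -G/(H + lambda).

```python
import math

# Hypothetical toy data (invented values, not taken from ClinVar or HumsaVar):
# each row is one missense variant described by two predictor variables,
# [substitution-matrix score, conservation score]. Label 1 = pathogenic, 0 = benign.
X = [
    [-3.0, 0.95], [-2.0, 0.90], [-4.0, 0.85], [-1.0, 0.80],
    [1.0, 0.20], [2.0, 0.30], [0.0, 0.10], [3.0, 0.25],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam):
    """Return (gain, feature, threshold, w_left, w_right) for the best depth-1 split."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2  # candidate threshold between two observed values
            GL = sum(gi for row, gi in zip(X, g) if row[j] < thr)
            HL = sum(hi for row, hi in zip(X, h) if row[j] < thr)
            GR, HR = G - GL, H - HL
            # XGBoost structure-score gain (the gamma penalty is 0 in this sketch)
            gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))
            if best is None or gain > best[0]:
                # Optimal leaf weights w* = -G / (H + lambda)
                best = (gain, j, thr, -GL / (HL + lam), -GR / (HR + lam))
    if best is None:  # every feature is constant: fall back to a single leaf
        w = -G / (H + lam)
        best = (0.0, 0, math.inf, w, w)
    return best


def leaf_value(tree, row):
    j, thr, wl, wr = tree
    return wl if row[j] < thr else wr


def train(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Newton boosting of depth-1 trees on the logistic loss."""
    margins = [0.0] * len(X)  # initial log-odds (probability 0.5)
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log-loss
        h = [pi * (1.0 - pi) for pi in p]      # second derivative of log-loss
        _, j, thr, wl, wr = fit_stump(X, g, h, lam)
        tree = (j, thr, eta * wl, eta * wr)    # shrink leaf weights by eta
        trees.append(tree)
        margins = [m + leaf_value(tree, row) for m, row in zip(margins, X)]
    return trees


def predict_proba(trees, row):
    """Probability that a variant is damaging under the toy model."""
    return sigmoid(sum(leaf_value(t, row) for t in trees))


model = train(X, y)
for variant in ([-3.5, 0.9], [2.5, 0.15]):
    print(variant, round(predict_proba(model, variant), 3))
```

Depth-1 trees keep the sketch short. The actual XGBoost library (for example its `XGBClassifier` Python wrapper) grows deeper trees and adds further regularization such as `gamma`, `min_child_weight` and row or column subsampling, so a real model would be trained with that library rather than with code like this.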
