
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features

Overview
Specialty Biotechnology
Date 2019 Jul 18
PMID 31312416
Citations 13
Abstract

The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for understanding the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected eukaryotic species, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene prediction within and among species. We showed that essential gene orthologs make up only a small fraction of all orthologs shared among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our ortholog analysis, predicting essential genes across multiple (including distantly related) species is possible yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for expanding genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
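The abstract does not name the specific sequence-derived features or evaluation metric. As a minimal illustration only, the Python sketch below assumes amino acid composition (the fraction of each of the 20 standard residues in a protein) as one example of a protein sequence-derived feature, and uses the area under the ROC curve (AUC) to express "better than random guessing", where an AUC of 0.5 corresponds to a random classifier. The function names `aa_composition` and `roc_auc` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Illustrative sketch: the feature (amino acid composition) and the metric
# (ROC AUC) are assumptions for demonstration, not the paper's exact pipeline.

# The 20 standard amino acids in a fixed order, so every feature vector
# has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard residues (e.g. 'X', 'U') are ignored, and fractions are
    normalised by the number of standard residues only.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero feature vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = essential, 0 = non-essential.
    scores: higher means "more likely essential".
    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, 0 otherwise. An AUC of 0.5 corresponds to random guessing.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `aa_composition("MKKLLA")` returns a 20-element vector with 1/3 at the positions for K and L and 1/6 for M and A. If a model's essentiality scores give `roc_auc(labels, scores)` consistently above 0.5 on held-out genes, the model is doing better than chance. The pairwise AUC loop is O(P × N), which is fine for illustration but slower than a sort-based implementation on genome-scale data.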

Citing Articles

Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.

Kurmi A, Sen P, Dash M, Ray S, Satapathy S Mol Genet Genomics. 2024; 299(1):72.

PMID: 39060647 DOI: 10.1007/s00438-024-02163-0.


Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality.

Hasibi R, Michoel T, Oyarzun D NPJ Syst Biol Appl. 2024; 10(1):24.

PMID: 38448436 PMC: 10917767. DOI: 10.1038/s41540-024-00348-2.


'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data.

Ma J, Song J, Young N, Chang B, Korhonen P, Campos T Brief Bioinform. 2023; 25(1).

PMID: 38152979 PMC: 10753293. DOI: 10.1093/bib/bbad472.


Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster.

Aromolaran O, Isewon I, Adedeji E, Oswald M, Adebiyi E, Koenig R PLoS One. 2023; 18(8):e0288023.

PMID: 37556452 PMC: 10411809. DOI: 10.1371/journal.pone.0288023.


Predicting and explaining the impact of genetic disruptions and interactions on organismal viability.

Al-Anzi B, Khajah M, Fakhraldeen S Bioinformatics. 2022; 38(17):4088-4099.

PMID: 35861390 PMC: 9438956. DOI: 10.1093/bioinformatics/btac519.

