
Towards the Prediction of Essential Genes by Integration of Network Topology, Cellular Localization and Biological Process Information

Overview
Publisher BioMed Central
Specialty Biology
Date 2009 Sep 18
PMID 19758426
Citations 55
Abstract

Background: The identification of essential genes is important for understanding the minimal requirements for cellular life and for practical purposes such as drug design. However, experimental techniques for discovering essential genes are labor-intensive and time-consuming. Given these constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present a machine learning-based approach that predicts essential genes from network topological features, cellular localization and biological process information.
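
As a minimal illustration of how such attributes can be turned into classifier input, the sketch below builds one feature vector per gene from the three quantities the Results identify as most important: the number of physical interaction partners (node degree in a protein interaction network), a nuclear-localization flag and the number of regulating transcription factors. The gene names, input format and helper names (`degree_features`, `build_feature_vectors`) are hypothetical; the study's full attribute set, including its biological-process attributes, is not specified in this abstract and is not reproduced here.

```python
from collections import defaultdict


def degree_features(interactions):
    """Count distinct physical interaction partners per protein (node degree).

    `interactions` is a hypothetical list of (protein_a, protein_b) pairs;
    duplicate pairs are counted once and self-interactions are ignored.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # a self-loop adds no distinct partner
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


def build_feature_vectors(genes, interactions, nuclear, tf_regulators):
    """Return {gene: (degree, is_nuclear, n_regulating_tfs)}.

    `nuclear` is a set of genes whose products localize to the nucleus;
    `tf_regulators` maps a gene to the set of transcription factors that
    regulate it. Both inputs are illustrative assumptions.
    """
    degree = degree_features(interactions)
    rows = {}
    for g in genes:
        rows[g] = (
            degree.get(g, 0),                 # genes with no interactions get 0
            1 if g in nuclear else 0,         # binary localization attribute
            len(tf_regulators.get(g, ())),    # number of regulating TFs
        )
    return rows


if __name__ == "__main__":
    # Toy, invented inputs for illustration only.
    rows = build_feature_vectors(
        ["G1", "G2", "G3", "G4"],
        [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")],
        {"G1", "G3"},
        {"G1": {"TF_A", "TF_B"}, "G2": {"TF_A"}},
    )
    print(rows)
    # → {'G1': (3, 1, 2), 'G2': (2, 0, 1), 'G3': (2, 1, 0), 'G4': (1, 0, 0)}
```

Each tuple is one row of a tabular dataset; grouped-attribute datasets of the kind described in the Results would correspond to keeping different subsets of these columns.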

Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the better-performing predictors were those generated from datasets with integrated attributes. The predictor trained on all attributes, i.e., network topological features, cellular compartments and biological processes, was the best predictor of essential genes, and we then used it to classify yeast genes of unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of physical protein interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality.
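
J48 is Weka's implementation of the C4.5 decision-tree algorithm, which ranks candidate splits by gain ratio (information gain divided by split information). The sketch below applies that criterion to candidate thresholds on a single numeric attribute, such as the number of physical interactions. It is a simplification, not a reimplementation of J48: thresholds are taken as midpoints between consecutive distinct values, and C4.5's additional rules for continuous attributes, pruning and recursive tree growth are omitted. The function names and the toy degrees and labels are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = (entropy(labels)
                 - weights[0] * entropy(left)
                 - weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Pick the midpoint threshold with the highest gain ratio (assumes >= 2 distinct values)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


if __name__ == "__main__":
    # Toy, invented data: interaction counts and essentiality labels.
    degrees = [1, 2, 2, 3, 5, 8]
    labels = ["non", "non", "non", "ess", "ess", "ess"]
    t = best_threshold(degrees, labels)
    print(t, gain_ratio(degrees, labels, t))  # → 2.5 1.0
```

In this toy case the threshold 2.5 separates the two classes perfectly, so it wins; a learned tree would read as a rule such as "degree > 2.5 → essential", which is the kind of interpretable rule the Results describe extracting from J48 trees.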

Conclusion: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality.

Citing Articles

Essential genes identification model based on sequence feature map and graph convolutional neural network.

Hu W, Li M, Xiao H, Guan L. BMC Genomics. 2024; 25(1):47.

PMID: 38200437 PMC: 10777564. DOI: 10.1186/s12864-024-09958-w.


Essential proteins discovery based on dominance relationship and neighborhood similarity centrality.

Li G, Luo X, Hu Z, Wu J, Peng W, Liu J. Health Inf Sci Syst. 2023; 11(1):55.

PMID: 37981988 PMC: 10654316. DOI: 10.1007/s13755-023-00252-9.


An unsupervised deep learning framework for predicting human essential genes from population and functional genomic data.

LaPolice T, Huang Y. BMC Bioinformatics. 2023; 24(1):347.

PMID: 37723435 PMC: 10506225. DOI: 10.1186/s12859-023-05481-z.


Machine learning on large scale perturbation screens for SARS-CoV-2 host factors identifies β-catenin/CBP inhibitor PRI-724 as a potent antiviral.

Kelch M, Vera-Guapi A, Beder T, Oswald M, Hiemisch A, Beil N. Front Microbiol. 2023; 14:1193320.

PMID: 37342561 PMC: 10277617. DOI: 10.3389/fmicb.2023.1193320.


Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins.

Xue X, Zhang W, Fan A. PLoS One. 2023; 18(4):e0284274.

PMID: 37083829 PMC: 10121005. DOI: 10.1371/journal.pone.0284274.

