Gene Index Analysis of the Human Genome Estimates Approximately 120,000 Genes
Although sequencing of the human genome will soon be completed, gene identification and annotation remain a challenge. Early estimates suggested that there might be 60,000-100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome, although their data suggest there may be additional genes. The nearly 2,000,000 human ESTs in dbEST provide an important resource for gene identification and genome annotation, but these single-pass sequences must be carefully analysed to remove contaminating sequences, including those from genomic DNA, spurious transcription, and vector and bacterial sequences. We have developed a highly refined and rigorously tested protocol for cleaning, clustering and assembling EST sequences to produce high-fidelity consensus sequences for the represented genes (F.L. et al., manuscript submitted) and used this to create the TIGR Gene Indices, databases of expressed genes for human, mouse, rat and other species (http://www.tigr.org/tdb/tgi.html). Using these algorithms for EST analysis, we have arrived at two independent estimates indicating that the human genome contains approximately 120,000 genes.