Multiple Evidence Strands Suggest That There May Be As Few As 19,000 Human Protein-coding Genes

Overview
Journal Hum Mol Genet
Date 2014 Jun 19
PMID 24939910
Citations 205
Abstract

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
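The comparison described in the abstract, the share of genes with at least one detected peptide in each gene-age or conservation group, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `detection_rate_by_group`, the gene identifiers, the group labels, and the toy data are all hypothetical, and the sketch assumes the peptide-to-gene mapping has already been done.

```python
from collections import defaultdict


def detection_rate_by_group(gene_groups, detected_genes):
    """Return {group: fraction of its genes with at least one detected peptide}.

    gene_groups    -- dict mapping gene ID to a group label (hypothetical labels)
    detected_genes -- set of gene IDs to which at least one peptide mapped
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gene, group in gene_groups.items():
        totals[group] += 1
        if gene in detected_genes:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy data, invented for illustration only (not taken from the study).
gene_groups = {
    "GENE_A": "pre-bilateria",
    "GENE_B": "pre-bilateria",
    "GENE_C": "pre-bilateria",
    "GENE_D": "pre-bilateria",
    "GENE_E": "primate",
    "GENE_F": "primate",
}
detected_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

rates = detection_rate_by_group(gene_groups, detected_genes)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of genes have peptide evidence")
```

A per-group fraction like this is the kind of summary behind statements such as ">96% of genes that evolved before bilateria"; the real analysis would additionally need peptide identification thresholds and a defined gene-age assignment, which are outside this sketch.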

Citing Articles

Three- and four-stranded nucleic acid structures and their ligands.

Hashimoto Y, Shil S, Tsuruta M, Kawauchi K, Miyoshi D. RSC Chem Biol. 2025; .

PMID: 40007865 PMC: 11848209. DOI: 10.1039/d4cb00287c.


More than 2,500 coding genes in the human reference gene set still have unsettled status.

Maquedano M, Cerdan-Velez D, Tress M. bioRxiv. 2024; .

PMID: 39713347 PMC: 11661123. DOI: 10.1101/2024.12.05.626965.


A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation.

Rodriguez J, Maquedano M, Cerdan-Velez D, Calvo E, Vazquez J, Tress M. bioRxiv. 2024; .

PMID: 39605392 PMC: 11601488. DOI: 10.1101/2024.11.14.623419.


Advancements in Single-Cell Proteomics and Mass Spectrometry-Based Techniques for Unmasking Cellular Diversity in Triple Negative Breast Cancer.

Nalla L, Kanukolanu A, Yeduvaka M, Gajula S. Proteomics Clin Appl. 2024; 19(1):e202400101.

PMID: 39568435 PMC: 11726282. DOI: 10.1002/prca.202400101.


The influence of lifestyle and environmental factors on host resilience through a homeostatic skin microbiota: An EAACI Task Force Report.

Kortekaas Krohn I, Callewaert C, Belasri H, De Pessemier B, Lopez C, Mortz C. Allergy. 2024; 79(12):3269-3284.

PMID: 39485000 PMC: 11657040. DOI: 10.1111/all.16378.

