Discovery and Revision of Arabidopsis Genes by Proteogenomics

Overview
Specialty Science
Date 2008 Dec 23
PMID 19098097
Citations 128
Abstract

Gene annotation underpins genome science. Most often, protein-coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene-finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high-quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.
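The 6-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. It translates a DNA string in all three reading frames on both strands using the standard genetic code, then reports which frames contain a given peptide. This is not the authors' pipeline: the function names (`six_frame_translation`, `frames_containing`) and the toy sequence are hypothetical, and plain substring matching stands in for the tandem mass spectrometry database search used in the study. The exon splice-graph search is not covered.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def frames_containing(peptide, dna):
    """List the frames whose translation contains the peptide as a substring."""
    return [
        name
        for name, protein in six_frame_translation(dna).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    toy_dna = "ATGGCCAAGTAA"  # hypothetical sequence, not from the study
    print(six_frame_translation(toy_dna))
    print(frames_containing("MAK", toy_dna))
```

A peptide that appears in one of these frames but in no annotated protein is the kind of "novel peptide" the abstract describes. In practice, stop codons (`*`) split each frame into candidate open reading frames before searching.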

Citing Articles

Plant genome information facilitates plant functional genomics.

Bernal-Gallardo J, de Folter S Planta. 2024; 259(5):117.

PMID: 38592421 PMC: 11004055. DOI: 10.1007/s00425-024-04397-z.


The role of the AP-1 adaptor complex in outgoing and incoming membrane traffic.

Robinson M, Antrobus R, Sanger A, Davies A, Gershlick D J Cell Biol. 2024; 223(7).

PMID: 38578286 PMC: 10996651. DOI: 10.1083/jcb.202310071.


Protein nonadditive expression and solubility contribute to heterosis in hybrids and allotetraploids.

June V, Xu D, Papoulas O, Boutz D, Marcotte E, Chen Z Front Plant Sci. 2023; 14:1252564.

PMID: 37780492 PMC: 10538547. DOI: 10.3389/fpls.2023.1252564.


Deep Proteogenomics of a Photosynthetic Cyanobacterium.

Spat P, Krauspe V, Hess W, Macek B, Nalpas N J Proteome Res. 2023; 22(6):1969-1983.

PMID: 37146978 PMC: 10243305. DOI: 10.1021/acs.jproteome.3c00065.


PepQuery2 democratizes public MS proteomics data for rapid peptide searching.

Wen B, Zhang B Nat Commun. 2023; 14(1):2213.

PMID: 37072382 PMC: 10113256. DOI: 10.1038/s41467-023-37462-4.

