
Genome Annotation: From Human Genetics to Biodiversity Genomics

Overview
Journal Cell Genom
Date 2023 Aug 21
PMID 37601977
Abstract

Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.

Citing Articles

Hookworm genes encoding intestinal excreted-secreted proteins are transcriptionally upregulated in response to the host's immune system.

Schwarz E, Noon J, Chicca J, Garceau C, Li H, Antoshechkin I bioRxiv. 2025.

PMID: 39975173 PMC: 11838427. DOI: 10.1101/2025.02.01.636063.


GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing.

Kaur G, Perteghella T, Carbonell-Sala S, Gonzalez-Martinez J, Hunt T, Madry T bioRxiv. 2024.

PMID: 39554180 PMC: 11565817. DOI: 10.1101/2024.10.29.620654.


Quest for Orthologs in the Era of Biodiversity Genomics.

Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M Genome Biol Evol. 2024; 16(10).

PMID: 39404012 PMC: 11523110. DOI: 10.1093/gbe/evae224.


The Catalan initiative for the Earth BioGenome Project: contributing local data to global biodiversity genomics.

Corominas M, Marques-Bonet T, Arnedo M, Bayes M, Belmonte J, Escriva H NAR Genom Bioinform. 2024; 6(3):lqae075.

PMID: 39022326 PMC: 11252852. DOI: 10.1093/nargab/lqae075.


BRD2 and BRD3 genes independently evolved RNA structures to control unproductive splicing.

Petrova M, Margasyuk S, Vorobeva M, Skvortsov D, Dontsova O, Pervouchine D NAR Genom Bioinform. 2024; 6(1):lqad113.

PMID: 38226395 PMC: 10789245. DOI: 10.1093/nargab/lqad113.
