Mixing Genome Annotation Methods in a Comparative Analysis Inflates the Apparent Number of Lineage-specific Genes

Overview
Journal Curr Biol
Publisher Cell Press
Specialty Biology
Date 2022 May 19
PMID 35588743
Abstract

Comparisons of genomes of different species are used to identify lineage-specific genes, those genes that appear unique to one species or clade. Lineage-specific genes are often thought to represent genetic novelty that underlies unique adaptations. Identification of these genes depends not only on genome sequences, but also on inferred gene annotations. Comparative analyses typically use available genomes that have been annotated using different methods, increasing the risk that orthologous DNA sequences may be erroneously annotated as a gene in one species but not another, appearing lineage specific as a result. To evaluate the impact of such "annotation heterogeneity," we identified four clades of species whose sequenced genomes have more than one publicly available gene annotation. This allowed us to compare the number of lineage-specific genes inferred when different annotation methods are mixed with the number inferred when a single annotation method is used across the clade. In these case studies, annotation heterogeneity increases the apparent number of lineage-specific genes by up to 15-fold, suggesting that annotation heterogeneity is a substantial source of potential artifact.
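The mechanism described in the abstract can be illustrated with a toy model. This is a hypothetical sketch, not the authors' pipeline. It assumes that orthology is already known and is encoded as shared gene IDs, whereas real analyses detect homology by sequence search. The species names, gene IDs, and "methods X and Y" are all invented for illustration. A gene counts as lineage specific if it is annotated in the focal species and in no other species. When the other genomes come from an annotation method that skips some orthologous loci, those loci appear unique to the focal species.

```python
from typing import Dict, Set


def lineage_specific(focal: str, annotations: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated in `focal` with no annotated counterpart in any other species."""
    others: Set[str] = set()
    for species, genes in annotations.items():
        if species != focal:
            others |= genes
    return annotations[focal] - others


# Hypothetical toy data: each ID names one orthologous DNA region present in all
# three species, except "g5", a true novelty found only in sp1.
# Uniform: one hypothetical method X annotates every genome, so it calls the
# loci g6 and g7 wherever the orthologous DNA exists.
uniform = {
    "sp1": {"g1", "g2", "g3", "g4", "g5", "g6", "g7"},
    "sp2": {"g1", "g2", "g3", "g4", "g6", "g7"},
    "sp3": {"g1", "g2", "g3", "g4", "g6", "g7"},
}
# Heterogeneous: sp1 keeps method X, but sp2 and sp3 come from a hypothetical
# method Y that does not call g6 or g7, even though the same DNA is present.
heterogeneous = {
    "sp1": uniform["sp1"],
    "sp2": {"g1", "g2", "g3", "g4"},
    "sp3": {"g1", "g2", "g3", "g4"},
}

n_uniform = len(lineage_specific("sp1", uniform))
n_mixed = len(lineage_specific("sp1", heterogeneous))
print(sorted(lineage_specific("sp1", heterogeneous)))  # ['g5', 'g6', 'g7']
print(f"inflation: {n_mixed / n_uniform:.0f}-fold")     # inflation: 3-fold
```

In this toy case, uniform annotation recovers only the true novelty g5. Mixing methods adds g6 and g7 as artifacts, which triples the apparent count. The study reports inflation of up to 15-fold in real clades.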

Citing Articles

Gene novelty and gene family expansion in the early evolution of Lepidoptera.

Hoile A, Holland P, Mulhair P BMC Genomics. 2025; 26(1):161.

PMID: 39966712 PMC: 11837612. DOI: 10.1186/s12864-025-11338-x.


Convergent Evolution and Predictability of Gene Copy Numbers Associated with Diets in Mammals.

Wilhoit K, Yamanouchi S, Chen B, Yamasaki Y, Ishikawa A, Inoue J Genome Biol Evol. 2025; 17(2).

PMID: 39849899 PMC: 11797053. DOI: 10.1093/gbe/evaf008.


MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies.

Martinez-Redondo G, Vargas-Chavez C, Eleftheriadi K, Benitez-Alvarez L, Vazquez-Valls M, Fernandez R Genome Biol Evol. 2024; 16(11).

PMID: 39540856 PMC: 11534026. DOI: 10.1093/gbe/evae235.


Orphan genes are not a distinct biological entity.

Pereira A, Marano M, Bathala R, Zaragoza R, Neira A, Samano A Bioessays. 2024; 47(1):e2400146.

PMID: 39491810 PMC: 11662153. DOI: 10.1002/bies.202400146.


The Highly Repetitive Genome of Myxobolus rasmusseni, an Emerging Myxozoan Parasite of Fathead Minnows.

Muthye V, Leon Coria A, Liu H, Goater C, Finney C, Wasmuth J Genome Biol Evol. 2024; 16(11).

PMID: 39403974 PMC: 11557904. DOI: 10.1093/gbe/evae220.

