De Novo Emerged Gene Search in Eukaryotes with DENSE

Overview
Date 2024 Aug 30
PMID 39212967
Abstract

The discovery of de novo emerged genes, originating from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, the hypothesis of neutrally evolving sequences giving rise to functional proteins is highly unlikely. This conundrum has sparked numerous studies to quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet, no fully automated pipeline for their identification is available. Therefore, we introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, while the second step is accessible through a web server upon providing a list of TRGs. Highly flexible, DENSE provides various strategy and parameter combinations, enabling users to adapt to specific configurations or define their own strategy through a rational framework, facilitating protocol communication, and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics for users to define input datasets, identify favorable/unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, predictions made for the seven model organisms are compiled into a requestable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific criteria combinations.
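
The two-step logic described in the abstract can be illustrated with a minimal sketch. This is not DENSE's implementation: the `GeneEvidence` record, its field names, the `min_outgroups` threshold, and the species labels are all hypothetical, and the homology and synteny searches are assumed to have been run beforehand so that only their summarized results are classified here.

```python
from dataclasses import dataclass, field


@dataclass
class GeneEvidence:
    """Hypothetical per-gene summary; field names are illustrative, not DENSE outputs."""
    gene_id: str
    # Species (other than the focal one) where a protein homolog was detected.
    protein_hits: set = field(default_factory=set)
    # Species where the gene matches a noncoding region in a syntenic position.
    syntenic_noncoding_hits: set = field(default_factory=set)


def is_trg(gene, focal_clade):
    """Step 1 (phylostratigraphy-like): no protein homolog outside the focal clade."""
    return gene.protein_hits <= focal_clade


def is_de_novo_candidate(gene, focal_clade, min_outgroups=1):
    """Step 2: a TRG supported by syntenic noncoding matches in enough outgroups."""
    if not is_trg(gene, focal_clade):
        return False
    outgroup_support = gene.syntenic_noncoding_hits - focal_clade
    return len(outgroup_support) >= min_outgroups


def classify(genes, focal_clade, min_outgroups=1):
    """Label each gene as conserved, TRG only, or de novo candidate."""
    labels = {}
    for gene in genes:
        if is_de_novo_candidate(gene, focal_clade, min_outgroups):
            labels[gene.gene_id] = "de_novo_candidate"
        elif is_trg(gene, focal_clade):
            labels[gene.gene_id] = "trg_only"
        else:
            labels[gene.gene_id] = "conserved"
    return labels


# Toy data with made-up species labels (assumption, not from the paper).
focal_clade = {"sp_A", "sp_B"}
genes = [
    GeneEvidence("g1", {"sp_B"}, {"sp_C"}),  # restricted, noncoding match in outgroup
    GeneEvidence("g2", {"sp_B"}, set()),     # restricted, no outgroup support
    GeneEvidence("g3", {"sp_B", "sp_D"}),    # protein homolog outside the clade
]
print(classify(genes, focal_clade))
# {'g1': 'de_novo_candidate', 'g2': 'trg_only', 'g3': 'conserved'}
```

Raising `min_outgroups` in this sketch makes the filter stricter, which mirrors in a simplified way the kind of strategy and parameter choices the abstract says users can adjust.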

Citing Articles

Orphan genes are not a distinct biological entity.

Pereira A, Marano M, Bathala R, Zaragoza R, Neira A, Samano A Bioessays. 2024; 47(1):e2400146.

PMID: 39491810 PMC: 11662153. DOI: 10.1002/bies.202400146.
