PMID: 39404012

Quest for Orthologs in the Era of Biodiversity Genomics

Abstract

The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept pace. Here we discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for identifying orthologs of short proteins and noncoding RNAs and for functionally characterizing novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information has yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. Doing so fosters green bioinformatics by avoiding redundant computations and helps integrate the diverse scientific communities that share a need for comparative genetics and genomics information. It should also help communicate orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.

Citing Articles

EvANI benchmarking workflow for evolutionary distance estimation.

Majidian S, Hwang S, Zakeri M, Langmead B. bioRxiv. 2025.

PMID: 40027788 PMC: 11870633. DOI: 10.1101/2025.02.23.639716.
