
GALBA: Genome Annotation with Miniprot and AUGUSTUS

Overview
Journal bioRxiv
Date 2023 Apr 24
PMID 37090650
Abstract

The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Various gene annotation tools have been developed, but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a previously unannotated land plant. GALBA is fully open source and available as a Docker image for easy execution with Singularity in high-performance computing environments. Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.

