
Fully Automated Annotation of Mitochondrial Genomes Using a Cluster-based Approach with De Bruijn Graphs

Overview
Journal Front Genet
Date 2023 Aug 28
PMID 37636259
Abstract

A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these three shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method applies a clustering routine to the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while matching MITOS2 in result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
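
The general idea of annotating a query sequence by mapping it to a gene-labeled de Bruijn graph can be illustrated with a toy sketch. This is not DeGeCI's implementation: the function names, the exact-match k-mer lookup, and the simple gap-based merging used in place of the clustering routine are all assumptions made for illustration, and real mitogenomes would additionally require handling of both strands, circularity, and sequence divergence.

```python
from collections import defaultdict


def build_annotated_graph(references, k):
    """Build a toy annotated de Bruijn graph (hypothetical helper).

    references: list of (sequence, features), where features is a list of
    (gene, start, end) with 0-based, end-exclusive coordinates.
    Returns (edges, labels): edges links each (k-1)-mer to its successors,
    labels maps each k-mer to the genes that fully contain it.
    """
    edges = defaultdict(set)
    labels = {}
    for seq, features in references:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
            genes = labels.setdefault(kmer, set())
            for gene, start, end in features:
                if start <= i and i + k <= end:
                    genes.add(gene)
    return edges, labels


def map_query(query, labels, k):
    """Look up every query k-mer; return (position, genes) for exact hits."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        if kmer in labels:
            hits.append((i, labels[kmer]))
    return hits


def cluster_hits(hits, k, max_gap=0):
    """Merge nearby hits with the same gene label into predicted intervals.

    A stand-in for a real clustering routine: a hit extends an open
    interval if it starts within max_gap bases of that interval's end.
    """
    open_intervals = {}  # gene -> [start, end)
    clusters = []
    for pos, genes in hits:
        for gene in genes:
            current = open_intervals.get(gene)
            if current is not None and pos <= current[1] + max_gap:
                current[1] = pos + k
            else:
                if current is not None:
                    clusters.append((gene, current[0], current[1]))
                open_intervals[gene] = [pos, pos + k]
    for gene, (start, end) in open_intervals.items():
        clusters.append((gene, start, end))
    return sorted(clusters, key=lambda c: c[1])


def annotate(query, references, k, max_gap=0):
    """Predict gene intervals on query from annotated reference sequences."""
    _, labels = build_annotated_graph(references, k)
    return cluster_hits(map_query(query, labels, k), k, max_gap)


# Made-up reference with two made-up gene intervals.
reference = "ATGCGTACCTGAAGTCTTGGACA"
features = [("geneA", 0, 10), ("geneB", 12, 23)]
predictions = annotate("TT" + reference, [(reference, features)], k=5)
```

In this sketch, adding a newly annotated reference only requires inserting its k-mers and labels into the two dictionaries, which hints at why a graph-based database can be updated incrementally. The edge set is built to show the graph structure but is not traversed by the simple lookup used here.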

Citing Articles

DeGeCI 1.1: a web platform for gene annotation of mitochondrial genomes.

Fiedler L, Bernt M, Middendorf M. Bioinform Adv. 2024; 4(1):vbae072. PMID: 38799704. PMC: 11116735. DOI: 10.1093/bioadv/vbae072.
