HPC-T-Annotator: an HPC Tool for De Novo Transcriptome Assembly Annotation

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2024 Aug 21

PMID 39169276

Authors

Lorenzo Arcioni

Manuel Arcieri

Jessica Di Martino

Franco Liberati

Paolo Bottoni

Tiziana Castrignano

Abstract

Background: The availability of transcriptomic data for species without a reference genome enables the construction of de novo transcriptome assemblies from RNA-Seq data as alternative reference resources. A transcriptome provides direct information about a species' protein-coding genes under specific experimental conditions. The de novo assembly process produces a unigenes file in FASTA format, which is then the input for annotation. Homology-based annotation, which infers the function of sequences by estimating their similarity to sequences in a reference database, is computationally demanding.
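To make the idea of homology-based annotation concrete, the following minimal Python sketch selects the best hit per unigene from tabular alignment output. It assumes the 12-column tabular layout that BLAST (outfmt 6) and DIAMOND emit by default. The function name `best_hits`, the E-value cutoff, and the example rows are hypothetical and are not taken from HPC-T-Annotator.

```python
import csv
import io

# Default column order of BLAST/DIAMOND tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: hit} keeping the highest-bitscore hit per query.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[row["qseqid"]] = {
                "subject": row["sseqid"],
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical alignment rows, for illustration only.
example = (
    "unigene_1\thitA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200.0\n"
    "unigene_1\thitB\t99.0\t120\t1\t0\t1\t120\t1\t120\t1e-60\t250.0\n"
    "unigene_2\thitC\t40.0\t30\t18\t0\t1\t30\t1\t30\t0.5\t30.0\n"
    "unigene_3\thitD\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-20\t90.0\n"
)
hits = best_hits(io.StringIO(example))
for query, hit in sorted(hits.items()):
    print(query, hit["subject"], hit["bitscore"])
```

In this sketch the putative function of each unigene would be transferred from the description of its best-scoring subject; unigenes with no hit under the cutoff stay unannotated.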

Results: To mitigate the computational burden, we introduce HPC-T-Annotator, a tool for de novo transcriptome homology annotation on high-performance computing (HPC) infrastructures, designed for straightforward configuration via a Web interface. Once the configuration data are provided, the entire parallel computing software for annotation is automatically generated and can be launched on a supercomputer with a simple command line. The output data can then be viewed using post-processing utilities, provided as Python notebooks integrated in the software.
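The abstract does not describe the parallelization scheme in detail. One common way to parallelize homology searches, shown here only as an assumed illustration, is to split the unigenes FASTA file into chunks of similar total length and align each chunk as a separate job. The helper names `read_fasta` and `split_records` are hypothetical, and the greedy longest-first balancing is a choice made for this sketch, not necessarily what HPC-T-Annotator does.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_chunks):
    """Assign records to n_chunks, balancing total sequence length.

    Greedy heuristic: take the longest remaining record and place it in
    the chunk that currently holds the fewest residues.
    """
    chunks = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total residues, chunk index)
    heapq.heapify(heap)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)
        chunks[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return chunks


# Hypothetical unigenes, for illustration only.
fasta_text = """>unigene_1
ACGTACGT
>unigene_2
ACGT
>unigene_3
ACG
>unigene_4
A
"""
records = list(read_fasta(fasta_text.splitlines()))
chunks = split_records(records, n_chunks=2)
totals = [sum(len(seq) for _, seq in chunk) for chunk in chunks]
print(totals)
```

Each chunk could then be written to its own FASTA file and submitted as an independent job to the cluster scheduler, with the per-chunk results concatenated afterwards.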

Conclusions: HPC-T-Annotator expedites homology-based annotation in de novo transcriptome assemblies. Its efficient parallelization strategy on HPC infrastructures significantly reduces computational load and execution times, enabling large-scale transcriptome analysis and comparison projects, while its intuitive graphical interface extends accessibility to users without IT skills.

Citing Articles

De novo transcriptome assembly of the Mediterranean sea-rock pool mosquitoes Aedes mariae and Aedes zammitii.

Mastrantonio V, Porretta D, Liberati F, Bisconti R, Castrignano T, Canestrelli D Sci Data. 2025; 12(1):115.

PMID: 39833234 PMC: 11746941. DOI: 10.1038/s41597-025-04393-2.
