HPC-T-Annotator: an HPC Tool for De Novo Transcriptome Assembly Annotation

Overview
Publisher Biomed Central
Specialty Biology
Date 2024 Aug 21
PMID 39169276
Abstract

Background: For species without a reference genome, RNA-Seq data make it possible to construct de novo transcriptome assemblies that serve as alternative reference resources. A transcriptome provides direct information about a species' protein-coding genes under specific experimental conditions. The de novo assembly process produces a unigenes file in FASTA format, which is then the target of annotation. Homology-based annotation, a method that infers the function of sequences by estimating their similarity to other sequences in a reference database, is a computationally demanding procedure.

Results: To mitigate the computational burden, we introduce HPC-T-Annotator, a tool for de novo transcriptome homology annotation on high-performance computing (HPC) infrastructures, designed for straightforward configuration via a Web interface. Once the configuration data are provided, the entire parallel computing software for annotation is generated automatically and can be launched on a supercomputer using a simple command line. The output can then be inspected with post-processing utilities, provided as Python notebooks integrated into the software.

Conclusions: HPC-T-Annotator expedites homology-based annotation in de novo transcriptome assemblies. Its efficient parallelization strategy on HPC infrastructures significantly reduces computational load and execution times, enabling large-scale transcriptome analysis and comparison projects, while its intuitive graphical interface extends accessibility to users without IT skills.
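
The abstract does not spell out the parallelization strategy. A common way to parallelize homology annotation, shown here only as an illustrative sketch and not as the tool's actual implementation, is to split the unigenes FASTA file into chunks of similar total sequence length so that each chunk can be searched against the reference database by a separate job. The function names `parse_fasta` and `split_into_chunks` are hypothetical and chosen for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_into_chunks(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records over n_chunks, balancing total sequence length.

    Greedy heuristic (an assumption of this sketch): take the longest
    sequences first and always add to the currently lightest chunk.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    loads = [0] * n_chunks
    for record in sorted(records, key=lambda r: len(r[1]), reverse=True):
        lightest = loads.index(min(loads))
        chunks[lightest].append(record)
        loads[lightest] += len(record[1])
    return chunks
```

Each resulting chunk could then be written to its own FASTA file and submitted as an independent job to the cluster scheduler, with the per-chunk results concatenated afterwards. Balancing by sequence length rather than by record count is a design choice of this sketch, made because search time tends to grow with query length.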

Citing Articles

De novo transcriptome assembly of the Mediterranean sea-rock pool mosquitoes Aedes mariae and Aedes zammitii.

Mastrantonio V, Porretta D, Liberati F, Bisconti R, Castrignano T, Canestrelli D Sci Data. 2025; 12(1):115.

PMID: 39833234 PMC: 11746941. DOI: 10.1038/s41597-025-04393-2.
