A Cloud-based Training Module for Efficient De Novo Transcriptome Assembly Using Nextflow and Google Cloud

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2024 Jun 28
PMID: 38941113
Abstract

This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial "NIGMS Sandbox" at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing gives biomedical researchers access to resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of cloud services (using Google Cloud Platform). Our work highlights the reduced need for on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.

Citing Articles

NIGMS Sandbox: a learning platform toward democratizing cloud computing for biomedical research.

Lei M, Matukumalli L, Arora K, Weber N, Malashock R, Mao F. Brief Bioinform. 2024; 25(Supplement_1).

PMID: 39376084 PMC: 11458913. DOI: 10.1093/bib/bbae478.
