DDBJ Read Annotation Pipeline: a Cloud Computing-based Pipeline for High-throughput Analysis of Next-generation Sequencing Data

Overview

Journal DNA Res

Publisher Oxford University Press

Specialties Genetics, Molecular Biology

Date 2013 May 10

PMID 23657089

Citations 34

Authors

Hideki Nagasaki

Takako Mochizuki

Yuichi Kodama

Satoshi Saruhashi

Shota Morizaki

Hideaki Sugawara

Hajime Ohyanagi

Nori Kurata

Kousaku Okubo

Toshihisa Takagi

Eli Kaminuma

Yasukazu Nakamura

Abstract

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
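The workflow in the abstract (import public reads by accession number, run one basic analysis, then optionally continue to high-level annotation) can be illustrated with a minimal sketch. This is not the DDBJ Pipeline's interface, which the abstract describes only as a graphical web interface. The names `PipelineJob`, `is_sra_accession`, `BASIC_STEPS` and `HIGH_LEVEL_STEPS` are hypothetical and exist only for illustration. The accession pattern follows the general INSDC Sequence Read Archive convention (a source letter D, E or S, then `R`, an object-type letter, and at least six digits).

```python
import re
from dataclasses import dataclass, field

# INSDC SRA-style accession: source letter (D = DDBJ, E = ENA, S = NCBI),
# "R", an object-type letter (A, P, S, X, R or Z), then six or more digits.
SRA_ACCESSION = re.compile(r"[DES]R[APSXRZ]\d{6,}")

# Hypothetical step names mirroring the two components named in the abstract.
BASIC_STEPS = ("reference_mapping", "de_novo_assembly")
HIGH_LEVEL_STEPS = ("structural_annotation", "functional_annotation")


def is_sra_accession(text: str) -> bool:
    """Return True if text looks like an SRA-style accession number."""
    return SRA_ACCESSION.fullmatch(text.strip().upper()) is not None


@dataclass
class PipelineJob:
    """Illustrative model of one analysis request (not a real API)."""
    accession: str
    basic_step: str
    high_level_steps: list = field(default_factory=list)

    def __post_init__(self):
        # Normalise and validate the accession before planning anything.
        self.accession = self.accession.strip().upper()
        if not is_sra_accession(self.accession):
            raise ValueError(f"not an SRA-style accession: {self.accession!r}")
        if self.basic_step not in BASIC_STEPS:
            raise ValueError(f"unknown basic step: {self.basic_step!r}")
        for step in self.high_level_steps:
            if step not in HIGH_LEVEL_STEPS:
                raise ValueError(f"unknown high-level step: {step!r}")

    def plan(self) -> list:
        """List the stages in execution order: import, basic, high-level."""
        stages = [f"import:{self.accession}", f"basic:{self.basic_step}"]
        stages += [f"high_level:{step}" for step in self.high_level_steps]
        return stages


if __name__ == "__main__":
    job = PipelineJob("DRR000001", "de_novo_assembly", ["functional_annotation"])
    for stage in job.plan():
        print(stage)
```

The sketch keeps the basic analysis mandatory and the high-level steps optional, reflecting the abstract's statement that high-level analysis follows the basic analysis.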

Citing Articles

Relationship between the Rod complex and peptidoglycan structure in Escherichia coli.

Ago R, Tahara Y, Yamaguchi H, Saito M, Ito W, Yamasaki K. Microbiologyopen. 2023; 12(5):e1385.

PMID: 37877652 PMC: 10561026. DOI: 10.1002/mbo3.1385.

Lineage-specific, fast-evolving GATA-like gene regulates zygotic gene activation to promote endoderm specification and pattern formation in the Theridiidae spider.

Iwasaki-Yokozawa S, Nanjo R, Akiyama-Oda Y, Oda H. BMC Biol. 2022; 20(1):223.

PMID: 36203191 PMC: 9535882. DOI: 10.1186/s12915-022-01421-0.

Metatranscriptomic Analysis of Corals Inoculated With Tolerant and Non-Tolerant Symbiont Exposed to High Temperature and Light Stress.

Yuyama I, Higuchi T, Mezaki T, Tashiro H, Ikeo K. Front Physiol. 2022; 13:806171.

PMID: 35480050 PMC: 9037784. DOI: 10.3389/fphys.2022.806171.

Golgi-localized membrane protein AtTMN1/EMP12 functions in the deposition of rhamnogalacturonan II and I for cell growth in Arabidopsis.

Hiroguchi A, Sakamoto S, Mitsuda N, Miwa K. J Exp Bot. 2021; 72(10):3611-3629.

PMID: 33587102 PMC: 8096605. DOI: 10.1093/jxb/erab065.

Expression of , the Flowering Inducer of Asiatic Hybrid Lily, in the Bulb Scales.

Kurokawa K, Kobayashi J, Nemoto K, Nozawa A, Sawasaki T, Nakatsuka T. Front Plant Sci. 2020; 11:570915.

PMID: 33304361 PMC: 7693649. DOI: 10.3389/fpls.2020.570915.
