The Sequence Read Archive

Overview

Journal: Nucleic Acids Res

Publisher: Oxford University Press

Specialty: Biochemistry

Date: 2010 Nov 11

PMID: 21062823

Citations: 1418

Authors

Rasko Leinonen

Hideaki Sugawara

Martin Shumway

Abstract

The combination of significantly lower cost and increased speed of sequencing has resulted in explosive growth in the data submitted to the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and a growing number of journals and funding agencies require that next-generation sequence data be deposited in the SRA. The SRA was established as a public repository for next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.
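As an illustration of the three access points named in the abstract, the following minimal Python sketch maps a run accession to the INSDC partner that issued it and to that partner's URL. The URLs are the ones quoted above. The accession convention (a leading S, E or D for NCBI, EBI or DDBJ, followed by "RR" and digits) is not stated in the abstract and is included here as an assumption, and the names `issuing_partner` and `access_url` are hypothetical helpers for this example only.

```python
import re

# Access points quoted in the abstract, keyed by INSDC partner.
SRA_ACCESS_URLS = {
    "NCBI": "http://www.ncbi.nlm.nih.gov/Traces/sra",
    "EBI": "http://www.ebi.ac.uk/ena",
    "DDBJ": "http://trace.ddbj.nig.ac.jp",
}

# Assumption (not stated in the abstract): the first letter of an
# accession identifies the partner that issued it.
PREFIX_TO_PARTNER = {"S": "NCBI", "E": "EBI", "D": "DDBJ"}

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^([SED])RR\d+$")


def issuing_partner(accession: str) -> str:
    """Return the INSDC partner implied by a run accession prefix."""
    match = RUN_ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised run accession: {accession!r}")
    return PREFIX_TO_PARTNER[match.group(1)]


def access_url(accession: str) -> str:
    """Return the access URL of the partner that issued the accession."""
    return SRA_ACCESS_URLS[issuing_partner(accession)]
```

Because the partners exchange data within the INSDC, the issuing partner in this sketch indicates only where a record was first submitted, not the only place it can be retrieved.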

Citing Articles

Optical genome and epigenome mapping of clear cell renal cell carcinoma.

Margalit S, Tulpova Z, Michaeli Y, Zur T, Deek J, Louzoun-Zada S NAR Cancer. 2025; 7(1):zcaf008.

PMID: 40061565 PMC: 11886815. DOI: 10.1093/narcan/zcaf008.

A computational framework for extracting biological insights from SRA cancer data.

Guimaraes P, Carvalho M, Ruiz J Sci Rep. 2025; 15(1):8117.

PMID: 40057525 PMC: 11890766. DOI: 10.1038/s41598-025-91781-8.

Three-dimensional regulatory hubs support oncogenic programs in glioblastoma.

Breves S, Di Giammartino D, Nicholson J, Cirigliano S, Mahmood S, Lee U bioRxiv. 2025; .

PMID: 40034649 PMC: 11875237. DOI: 10.1101/2024.12.20.629544.

Novel AI-powered computational method using tensor decomposition for identification of common optimal bin sizes when integrating multiple Hi-C datasets.

Taguchi Y, Turki T Sci Rep. 2025; 15(1):7459.

PMID: 40033014 PMC: 11876364. DOI: 10.1038/s41598-025-91355-8.

Comparative Profiling of Regulatory Modules as a Tool for Identifying the Transcription Factor Network Linked to Leukemogenesis.

Subramanian S, Phongbunchoo Y, Cauchy P, Ramamoorthy S Methods Mol Biol. 2025; 2909:179-209.

PMID: 40029523 DOI: 10.1007/978-1-0716-4442-3_13.
