
The Sequence Read Archive: Explosive Growth of Sequencing Data

Overview
Specialty Biochemistry
Date 2011 Oct 20
PMID 22009675
Citations 501
Abstract

New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.

Citing Articles

Defining bovine CpG epigenetic diversity by analyzing RRBS data from sperm of Montbéliarde and Holstein bulls.

Capra E, Lazzari B, Cozzi P, Turri F, Negrini R, Ajmone-Marsan P. Front Cell Dev Biol. 2025; 13:1532711.

PMID: 40052148 PMC: 11882585. DOI: 10.3389/fcell.2025.1532711.


Transcriptional activation of genes associated with the matrisome is a common feature of senescent endothelial cells.

Gonzalez I, Arredondo S, Maldonado-Agurto R. Biogerontology. 2025; 26(2):59.

PMID: 39948317 PMC: 11825616. DOI: 10.1007/s10522-025-10191-5.


An Integrated Database for Exploring Alternative Promoters in Animals.

Xue F, Yan Y, Jin W, Zhu H, Yang Y, Yu Z. Sci Data. 2025; 12(1):231.

PMID: 39920194 PMC: 11805906. DOI: 10.1038/s41597-025-04548-1.


The Venus score for the assessment of the quality and trustworthiness of biomedical datasets.

Chicco D, Fabris A, Jurman G. BioData Min. 2025; 18(1):1.

PMID: 39780220 PMC: 11716409. DOI: 10.1186/s13040-024-00412-x.


Transcriptional dynamics during karyogamy in rice zygotes.

Toda E, Koshimizu S, Kinoshita A, Higashiyama T, Izawa T, Yano K. Development. 2025; 152(2).

PMID: 39777484 PMC: 11829756. DOI: 10.1242/dev.204497.

