PMID: 38969627

Exploring and Retrieving Sequence and Metadata for Species Across the Tree of Life with NCBI Datasets

Abstract

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.
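The abstract says the data is delivered as packages of sequences and metadata. As a rough illustration of what working with such a package could look like, here is a minimal Python sketch that reads a downloaded package as a zip archive. The internal layout is an assumption, not something the abstract states: the sketch expects a JSON Lines metadata report at `ncbi_dataset/data/assembly_data_report.jsonl` and sequence files with `.fna`, `.faa`, or `.gff` extensions. The function names are hypothetical.

```python
import io
import json
import zipfile

# Assumption: location of the JSON Lines metadata report inside a genome
# data package. The abstract does not specify the package layout.
REPORT_PATH = "ncbi_dataset/data/assembly_data_report.jsonl"


def read_assembly_report(package):
    """Return one dict per metadata record in the package's report.

    `package` is a path or a binary file-like object holding the zip archive.
    """
    with zipfile.ZipFile(package) as zf:
        with zf.open(REPORT_PATH) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8")
            # JSON Lines: one JSON object per non-empty line.
            return [json.loads(line) for line in text if line.strip()]


def list_sequence_files(package):
    """Return the sorted names of sequence and annotation files in the package."""
    with zipfile.ZipFile(package) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.endswith((".fna", ".faa", ".gff"))
        )
```

Calling `read_assembly_report("ncbi_dataset.zip")` on a downloaded package would then return its metadata records as dictionaries. Keeping the metadata next to the sequences in one archive is what makes the package easy to share and attribute as a unit.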

Citing Articles

Genomic and phenotypic characterisation of isolates from canine otitis externa reveals high-risk sequence types identical to those found in human nosocomial infections.

Secker B, Shaw S, Hobley L, Atterbury R. Front Microbiol. 2025; 16:1526843.

PMID: 40066269 PMC: 11891389. DOI: 10.3389/fmicb.2025.1526843.


sp. nov., sp. nov. and sp. nov.: three members of group 1 .

McKnight D, Wong-Bajracharya J, Okoh E, Snijders F, Lidbetter F, Webster J. Int J Syst Evol Microbiol. 2025; 75(3).

PMID: 40063667 PMC: 11893732. DOI: 10.1099/ijsem.0.006686.


Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics.

Benegas G, Eraslan G, Song Y. bioRxiv. 2025.

PMID: 39990426 PMC: 11844472. DOI: 10.1101/2025.02.11.637758.


Krait2: a versatile software for microsatellite investigation, visualization and marker development.

Du L, Chen J, Sun D, Zhao K, Zeng Q, Yang N. BMC Genomics. 2025; 26(1):72.

PMID: 39863857 PMC: 11762079. DOI: 10.1186/s12864-025-11252-2.


Exploring the Structural Diversity and Biotechnological Potential of the Rhodophyte Phycolectome.

Rodrigues E, Verza F, Nishimura F, Beleboni R, Hermans C, Janssens K. Mar Drugs. 2025; 23(1).

PMID: 39852510 PMC: 11766507. DOI: 10.3390/md23010008.


References
1. Fan J. Why it's worth making computational methods easy to use. Nature. 2023. DOI: 10.1038/d41586-023-01440-z.

2. Ricci M, Peona V, Boattini A, Taccioli C. Comparative analysis of bats and rodents' genomes suggests a relation between non-LTR retrotransposons, cancer incidence, and ageing. Sci Rep. 2023; 13(1):9039. PMC: 10239488. DOI: 10.1038/s41598-023-36006-6.

3. Wilkinson M, Dumontier M, Aalbersberg I, Appleton G, Axton M, Baak A. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016; 3:160018. PMC: 4792175. DOI: 10.1038/sdata.2016.18.

4. Najar F, Linde E, Murphy C, Borin V, Wang H, Haider S. Future COVID19 surges prediction based on SARS-CoV-2 mutations surveillance. Elife. 2023; 12. PMC: 9894583. DOI: 10.7554/eLife.82980.

5. Bornstein K, Gryan G, Chang E, Marchler-Bauer A, Schneider V. The NIH Comparative Genomics Resource: addressing the promises and challenges of comparative genomics on human health. BMC Genomics. 2023; 24(1):575. PMC: 10523801. DOI: 10.1186/s12864-023-09643-4.