NCBI Reference Sequences (RefSeq): Current Status, New Features and Genome Annotation Policy

Overview
Specialty Biochemistry
Date 2011 Nov 29
PMID 22121212
Citations 667
Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10^6 genomic records, 13 × 10^6 proteins and 2 × 10^6 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Citing Articles

Convergent evolution of noncoding elements associated with short tarsus length in birds.

Shakya S, Edwards S, Sackton T BMC Biol. 2025; 23(1):52.

PMID: 39984930 PMC: 11846207. DOI: 10.1186/s12915-025-02156-4.


Prokaryotic cellulase gene clusters derived from 2,305 metagenomes.

Song B, Tria F, Skejo J Sci Data. 2025; 12(1):218.

PMID: 39910055 PMC: 11799192. DOI: 10.1038/s41597-025-04524-9.


Brain Transcriptome Changes Associated With an Acute Increase of Protein O-GlcNAcylation and Implications for Neurodegenerative Disease.

Bell M, Kane M, Ouyang X, Young M, Jegga A, Chatham J J Neurochem. 2025; 169(1):e16302.

PMID: 39823370 PMC: 11741514. DOI: 10.1111/jnc.16302.


Infection in Wild Trahira () and Farmed Arapaima () in Brazil: An Interspecies Transmission in Aquatic Environments Shared with Nile Tilapia ().

Leal C, Xavier R, Queiroz G, Silva T, Teixeira J, Aburjaile F Microorganisms. 2025; 12(12.

PMID: 39770595 PMC: 11677813. DOI: 10.3390/microorganisms12122393.


A conserved pilin from uncultured gut bacterial clade TANB77 enhances cancer immunotherapy.

Kim C, Park D, Ahn B, Baek S, Hong M, Nguyen L Nat Commun. 2024; 15(1):10726.

PMID: 39730328 PMC: 11680825. DOI: 10.1038/s41467-024-55388-3.

