Patterns of Database Citation in Articles and Patents Indicate Long-term Scientific and Industry Value of Biological Data Resources
Data from open access biomolecular data resources, such as the European Nucleotide Archive and the Protein Data Bank, are extensively reused within life science research for comparative studies, method development and the derivation of new scientific insights. Indicators that estimate the extent and utility of such secondary use of research data need to reflect this complex and highly variable data usage. By linking open access scientific literature, via Europe PubMed Central (Europe PMC), to the metadata in biological data resources, we separate data citations associated with a deposition statement from citations that capture the subsequent, long-term reuse of data in academia and industry. We extend this analysis to begin investigating citations of biomolecular resources in patent documents. We find citations in more than 8,000 patents from 2014, demonstrating substantial use of data resources and an important role for them in defining biological concepts in patents granted to both academic and industrial innovators. Taken together, our results indicate that citation patterns in the biomedical literature and in patents vary not only with citation practice but also with the data resource cited. The results caution against the use of simple metrics such as citation counts and show that indicators of data use must take into account not only citations within the biomedical literature but also the reuse of data in industry and other parts of society, by including patents and other scientific and technical documents such as guidelines, reports and grant applications.
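The distinction between deposition citations and reuse citations can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes a hypothetical schema in which each text-mined mention links an article to a database record, and in which each record's metadata lists the article(s) given as its primary citation. Under this assumed rule, a mention from a primary-citation article counts as a deposition citation and any other mention counts as reuse. The names `Mention`, `classify_mentions` and `primary_citations`, as well as all identifiers in the example, are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A text-mined link from an article to a database record (hypothetical schema)."""
    article_id: str  # article identifier, e.g. a PMCID
    resource: str    # data resource name, e.g. "PDB" or "ENA"
    accession: str   # record identifier within that resource


def classify_mentions(mentions, primary_citations):
    """Count mentions per resource, split into 'deposition' and 'reuse'.

    primary_citations maps (resource, accession) to the set of article IDs
    that the resource's own metadata lists as the record's primary citation.
    Assumed rule: a mention is 'deposition' if its article is in that set,
    otherwise it is 'reuse'.
    """
    counts = Counter()
    for m in mentions:
        depositors = primary_citations.get((m.resource, m.accession), set())
        kind = "deposition" if m.article_id in depositors else "reuse"
        counts[(m.resource, kind)] += 1
    return counts


# Made-up example data: one depositing article per record, plus later reuse.
primary = {("PDB", "1ABC"): {"PMC100"}, ("ENA", "X12345"): {"PMC200"}}
mentions = [
    Mention("PMC100", "PDB", "1ABC"),    # the article that deposited the structure
    Mention("PMC300", "PDB", "1ABC"),    # a later article reusing it
    Mention("PMC400", "PDB", "1ABC"),    # another later reuse
    Mention("PMC200", "ENA", "X12345"),  # the article that deposited the sequence
]
counts = classify_mentions(mentions, primary)

for (resource, kind), n in sorted(counts.items()):
    print(resource, kind, n)
# ENA deposition 1
# PDB deposition 1
# PDB reuse 2
```

A single total of four citations would hide the fact that, in this toy example, the PDB record is mostly cited for reuse while the ENA record is cited only at deposition. This is the kind of difference that a raw citation count obscures.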