Data Leakage and Loss in Biodiversity Informatics

Overview
Date 2018 Nov 27
PMID 30473617
Citations 14
Abstract

The field of biodiversity informatics is in a massive, "grow-out" phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data "leakage" or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
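A minimal sketch of how per-dimension leakage of this kind could be tallied, assuming each occurrence record is a simple key-value mapping. The field names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) are borrowed from Darwin Core as an assumption, and the function names and toy records are hypothetical; the abstract does not state which fields or tooling the authors used.

```python
from typing import Dict, Iterable, Mapping, Optional

# Assumption: Darwin Core style field names; a dimension counts as complete
# only when every field listed for it is present in the record.
DIMENSIONS = {
    "taxon": ("scientificName",),
    "time": ("eventDate",),
    "place": ("decimalLatitude", "decimalLongitude"),
}

Record = Mapping[str, Optional[object]]


def is_present(value: Optional[object]) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def is_complete(record: Record, dimension: str) -> bool:
    """True if the record carries every field required for the dimension."""
    return all(is_present(record.get(f)) for f in DIMENSIONS[dimension])


def leakage_by_dimension(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of records lacking information in each dimension."""
    rows = list(records)
    total = len(rows)
    return {
        dim: (sum(not is_complete(r, dim) for r in rows) / total if total else 0.0)
        for dim in DIMENSIONS
    }


def usable_fraction(records: Iterable[Record]) -> float:
    """Fraction of records complete in taxon, time, and place together."""
    rows = list(records)
    if not rows:
        return 0.0
    usable = sum(all(is_complete(r, d) for d in DIMENSIONS) for r in rows)
    return usable / len(rows)


# Hypothetical toy records, invented purely for illustration.
RECORDS = [
    {"scientificName": "Turdus migratorius", "eventDate": "1998-05-02",
     "decimalLatitude": 38.9, "decimalLongitude": -95.2},
    {"scientificName": "Quercus alba", "eventDate": "1972-07-14",
     "decimalLatitude": None, "decimalLongitude": None},
    {"scientificName": "Cyanocitta cristata", "eventDate": "",
     "decimalLatitude": None, "decimalLongitude": -94.6},
    {"scientificName": "  ", "eventDate": "2004-03-30",
     "decimalLatitude": 19.4, "decimalLongitude": -99.1},
]

if __name__ == "__main__":
    print(leakage_by_dimension(RECORDS))
    print(usable_fraction(RECORDS))
```

In this toy set, place is the least complete dimension, which mirrors the pattern the abstract reports; the numbers themselves carry no empirical meaning.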

Citing Articles

expowo: An R package for mining global plant diversity and distribution data.

Zuanny D, Vilela B, Moonlight P, Sarkinen T, Cardoso D Appl Plant Sci. 2024; 12(6):e11609.

PMID: 39628545 PMC: 11610411. DOI: 10.1002/aps3.11609.


florabr: An R package to explore and spatialize species distribution using Flora e Funga do Brasil.

Trindade W Appl Plant Sci. 2024; 12(6):e11616.

PMID: 39628542 PMC: 11610413. DOI: 10.1002/aps3.11616.


A globally synthesised and flagged bee occurrence dataset and cleaning workflow.

Dorey J, Fischer E, Chesshire P, Nava-Bolanos A, O'Reilly R, Bossert S Sci Data. 2023; 10(1):747.

PMID: 37919303 PMC: 10622554. DOI: 10.1038/s41597-023-02626-w.


phylogatR: Phylogeographic data aggregation and repurposing.

Pelletier T, Parsons D, Decker S, Crouch S, Franz E, Ohrstrom J Mol Ecol Resour. 2022; 22(8):2830-2842.

PMID: 35748425 PMC: 9796472. DOI: 10.1111/1755-0998.13673.


Open Data Practices among Users of Primary Biodiversity Data.

Mandeville C, Koch W, Nilsen E, Finstad A Bioscience. 2021; 71(11):1128-1147.

PMID: 34733117 PMC: 8560312. DOI: 10.1093/biosci/biab072.

