Biodiversity Data Cleaning
Create a report summarizing the results of data quality tests
Identify records with empty event date
Remove columns with the results of data quality tests
Filter out records according to their taxonomic status
Harmonize taxon names against locally stored taxonomic databases
Identify records from doubtful source (e.g., 'fossil', MachineObservat...
Clean and parse scientific names
Identify records within a reference country
Identify records with empty geographic coordinates
Identify records lacking or with invalid coordinates but containing lo...
Identify records with out-of-range geographic coordinates
Flag low-precision geographic coordinates
Identify transposed geographic coordinates
Get country names from coordinates
Standardize country names and get country codes
Create figures reporting the results of the bdc package
Create a map of points using ggplot2
Identify records with empty scientific names
Standardize dataset columns based on metadata
Create or update the column summarizing the results of data quality te...
Extract year from eventDate
Identify records with an out-of-range year
Pipe operator
'bdc' brings together several aspects of biodiversity data cleaning in one place. It is organized in thematic modules related to different biodiversity dimensions: 1) merge datasets: standardization and integration of different datasets; 2) pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; 3) taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against locally stored taxonomic databases through exact and partial matching algorithms; 4) space: flagging of erroneous, suspect, and low-precision geographic coordinates; and 5) time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, it contains features to visualize, document, and report data quality, which is essential for making data quality assessment transparent and reproducible. The methodology is described in Ribeiro and colleagues (2022) <doi:10.1111/2041-210X.13868>.
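The flag-then-summarize idea behind the pre-filter, space, and time modules can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not use or reproduce the 'bdc' API. The function names (`flag_record`, `summarize`), the flag names, and the `min_year` default are hypothetical. The record keys follow Darwin Core term names, which is an assumption about the input data.

```python
from datetime import date


def flag_record(record, min_year=1600, max_year=None):
    """Return one boolean flag per check; True means the record passed.

    All flag names and the min_year default are illustrative assumptions,
    not values taken from the 'bdc' package.
    """
    if max_year is None:
        max_year = date.today().year

    name = record.get("scientificName")
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    year = record.get("year")

    # Coordinates count as present only when both values are given.
    has_coords = lat is not None and lon is not None
    # Valid latitude lies in [-90, 90] and valid longitude in [-180, 180].
    in_range = has_coords and -90 <= lat <= 90 and -180 <= lon <= 180

    return {
        "name_not_empty": bool(name and name.strip()),
        "coordinates_not_empty": has_coords,
        "coordinates_in_range": in_range,
        "year_in_range": year is not None and min_year <= year <= max_year,
    }


def summarize(flags):
    """Collapse the individual flags into one value: True only if all passed."""
    return all(flags.values())


# Example: a record whose latitude falls outside the valid range.
record = {
    "scientificName": "Panthera onca",
    "decimalLatitude": -95.0,
    "decimalLongitude": -50.0,
    "year": 1999,
}
flags = flag_record(record)
print(flags)
print("passes all checks:", summarize(flags))
```

Keeping each check as a separate flag, rather than dropping records immediately, lets failures be reported per test before any record is removed.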