Identify Records in the Vicinity of Biodiversity Institutions
Removes or flags records assigned to the locations of zoos, botanical gardens, herbaria, universities and museums, based on a global database of ~10,000 such biodiversity institutions. Coordinates at these locations can result from data-entry errors, faulty automated georeferencing, or individuals in captivity or horticulture.
cc_inst(
  x,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  species = "species",
  buffer = 100,
  geod = FALSE,
  ref = NULL,
  verify = FALSE,
  verify_mltpl = 10,
  value = "clean",
  verbose = TRUE
)
Arguments
x: data.frame. Containing geographical coordinates and species names.
lon: character string. The column with the longitude coordinates. Default = "decimalLongitude".
lat: character string. The column with the latitude coordinates. Default = "decimalLatitude".
species: character string. The column with the species identity. Only required if verify = TRUE.
buffer: numerical. The buffer around each institution, where records should be flagged as problematic, in decimal degrees, or in meters if geod = TRUE. Default = 100.
geod: logical. If TRUE the radius around each institution is calculated based on a sphere, buffer is in meters and independent of latitude. If FALSE the radius is calculated assuming planar coordinates and varies slightly with latitude. Default = FALSE. See https://seethedatablog.wordpress.com/ for detail and credits.
ref: SpatVector (geometry: polygons). Provides the geographic gazetteer. Can be any SpatVector (geometry: polygons), but the structure must be identical to institutions. If NULL (the default), institutions is used.
verify: logical. If TRUE, records close to institutions are flagged only if there are no other records of the same species in the greater vicinity (a radius of buffer * verify_mltpl).
verify_mltpl: numerical. The factor by which the radius for verify exceeds the radius of the initial test. Default = 10, which might be suitable if geod is TRUE, but might be too large otherwise.
value: character string. Defines the output value. See Returns.
verbose: logical. If TRUE reports the name of the test and the number of records flagged.
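The interplay of buffer, verify and verify_mltpl described above can be illustrated with a small base R sketch. This is not the package's implementation: the institution location, the records, the planar distance and the rule that a supporting record must lie outside the inner buffer are all assumptions made for illustration.

```r
# Hypothetical sketch of the verify idea; NOT the CoordinateCleaner code.
buffer <- 1                              # radius of the initial test (planar units, assumed)
verify_mltpl <- 10                       # factor for the greater vicinity
verify_radius <- buffer * verify_mltpl   # radius used by the verify step

# One hypothetical institution at the origin
inst_lon <- 0
inst_lat <- 0

# Three hypothetical records: two of species "a", one of species "b"
recs <- data.frame(
  species = c("a", "a", "b"),
  lon = c(0.5, 5, 0.2),
  lat = c(0, 0, 0)
)

# Planar distance of every record to the institution
d <- sqrt((recs$lon - inst_lon)^2 + (recs$lat - inst_lat)^2)

near <- d <= buffer                 # inside the initial buffer
in_vicinity <- d <= verify_radius   # inside the greater vicinity

# A record is "supported" if a record of the same species lies in the
# greater vicinity but outside the inner buffer (assumed rule)
supported <- vapply(seq_len(nrow(recs)), function(i) {
  any(recs$species == recs$species[i] & in_vicinity & !near)
}, logical(1))

# Same convention as the function's output: TRUE = test passed
passed <- !(near & !supported)
```

In this toy data the first record of species "a" sits inside the buffer but passes because a second "a" record lies in the greater vicinity, whereas the lone "b" record inside the buffer fails.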
Returns
Depending on the value argument, either a data.frame containing the records considered correct by the test ("clean") or a logical vector ("flagged"), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = "clean".
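As a minimal sketch of how the two output types relate, the base R snippet below uses a hand-written logical vector in place of real cc_inst output. The data and the flags are hypothetical, and the snippet assumes that the "clean" output corresponds to keeping the rows where the flag vector is TRUE.

```r
# Hypothetical records; the coordinates are made up for illustration
x <- data.frame(
  species = c("a", "b", "c", "d"),
  decimalLongitude = c(10.5, 37.5778, -60.2, 120.0),
  decimalLatitude = c(45.1, 55.7108, -3.4, 30.9)
)

# Stand-in for a logical vector as returned with value = "flagged":
# TRUE = test passed, FALSE = potentially problematic
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Keep only the records that passed the test
x_clean <- x[flags, ]

# Number of potentially problematic records
n_flagged <- sum(!flags)
```

Working with the logical vector is convenient when several tests are combined or when the flagged records should be inspected before removal.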
Details
Note: if geod = FALSE, the buffer radius is in degrees and thus differs slightly between latitudes.
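To give a sense of why a radius in degrees depends on latitude, the following base R sketch computes the ground length of one degree of longitude at several latitudes. It assumes a spherical Earth with a mean radius of 6,371 km, which is an approximation chosen for illustration and not a value taken from the package.

```r
# Assumed mean Earth radius in meters (spherical approximation)
earth_radius_m <- 6371000

# Latitudes at which to evaluate the length of one degree of longitude
lat_deg <- c(0, 30, 60)

# One degree of longitude spans an arc of radius * cos(latitude) * (pi / 180)
deg_lon_m <- earth_radius_m * cos(lat_deg * pi / 180) * pi / 180
names(deg_lon_m) <- lat_deg

# Ratio relative to the equator: the east-west extent shrinks towards the poles
ratio_to_equator <- deg_lon_m / deg_lon_m[["0"]]
```

Under this approximation a degree of longitude at 60 degrees latitude covers half the ground distance it covers at the equator, so the same buffer in degrees corresponds to a smaller east-west extent at high latitudes.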
Examples
x <- data.frame(species = letters[1:10],
                decimalLongitude = c(runif(99, -180, 180), 37.577800),
                decimalLatitude = c(runif(99, -90, 90), 55.710800))

# large buffer for demonstration, using geod = FALSE for shorter runtime
cc_inst(x, value = "flagged", buffer = 10, geod = FALSE)

## Not run:
#' cc_inst(x, value = "flagged", buffer = 50000) # geod = T
## End(Not run)