cc_dupl function

Identify Duplicated Records

Removes or flags duplicated records based on species name and coordinates, plus any additional user-defined columns. True (specimen) duplicates and repeated records of the same species at the same coordinates can make up the bulk of records in a biological collection database, but they are undesirable for many analyses. This function can flag both; identifying true specimen duplicates requires enough additional information, supplied through the additions argument.

cc_dupl(
  x,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  species = "species",
  additions = NULL,
  value = "clean",
  verbose = TRUE
)

Arguments

  • x: data.frame. Contains the geographical coordinates and species names.
  • lon: character string. The column with the longitude coordinates. Default = "decimalLongitude".
  • lat: character string. The column with the latitude coordinates. Default = "decimalLatitude".
  • species: character string. The column with the species name. Default = "species".
  • additions: a vector of character strings. Additional columns to be included in the test for duplication, for example collector name and collector number, as in the Examples section.
  • value: character string. Defines the output value. See the Returns section.
  • verbose: logical. If TRUE, reports the name of the test and the number of records flagged.
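
The effect of the additions argument can be illustrated with base R alone. The sketch below is not the package's implementation; it assumes that the duplicate test behaves like duplicated() applied to the selected columns, and the data frame and variable names (records, base_cols, extra_cols) are hypothetical.

```r
# Hypothetical records: rows 1 to 3 share species and coordinates,
# but row 2 was collected under a different collector number.
records <- data.frame(
  species          = c("a", "a", "a", "b"),
  decimalLongitude = c(10, 10, 10, 10),
  decimalLatitude  = c(5, 5, 5, 5),
  collector        = c("Bonpl", "Bonpl", "Bonpl", "Bonpl"),
  collector.number = c(1001, 354, 1001, 1001),
  stringsAsFactors = FALSE
)

base_cols  <- c("decimalLongitude", "decimalLatitude", "species")
extra_cols <- c("collector", "collector.number")  # plays the role of `additions`

# TRUE = first occurrence of a key (kept), FALSE = repeat of an earlier row.
# Assumption: this mirrors the flag convention described under Returns.
keep_base  <- !duplicated(records[, base_cols])
keep_extra <- !duplicated(records[, c(base_cols, extra_cols)])

keep_base   # TRUE FALSE FALSE TRUE
keep_extra  # TRUE TRUE FALSE TRUE
```

With only species and coordinates as the key, rows 2 and 3 both count as repeats of row 1. Adding the collector columns narrows the test so that only row 3, which matches row 1 in every column, is treated as a duplicate.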

Returns

Depending on the value argument, either a data.frame containing the records considered correct by the test ("clean") or a logical vector ("flagged"), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = "clean".
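
A logical flag vector is useful when records should be inspected rather than dropped. The following is a minimal sketch of that workflow in base R. The flags vector is a hypothetical stand-in built with duplicated(), not output from cc_dupl(), and the column name .dupl is an arbitrary choice for illustration.

```r
# Hypothetical data: rows 1 and 2 are identical in all key columns.
x <- data.frame(
  species          = c("a", "a", "b"),
  decimalLongitude = c(1, 1, 2),
  decimalLatitude  = c(3, 3, 4),
  stringsAsFactors = FALSE
)

# Stand-in for a vector in the "flagged" format: TRUE = passed, FALSE = flagged.
flags <- !duplicated(x[, c("decimalLongitude", "decimalLatitude", "species")])

clean   <- x[flags, ]   # records that passed the test
suspect <- x[!flags, ]  # flagged records, kept aside for manual review

# Alternatively, store the flags next to the data instead of subsetting.
x$.dupl <- flags
```

Subsetting with the flags gives a data.frame of passing records, which is the shape described for the "clean" output, while the flagged rows stay available for checking.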

Examples

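library(CoordinateCleaner)

# Simulate 100 records; shorter vectors are recycled to 100 rows
x <- data.frame(species = letters[1:10],
                decimalLongitude = sample(x = 0:10, size = 100, replace = TRUE),
                decimalLatitude = sample(x = 0:10, size = 100, replace = TRUE),
                collector = "Bonpl",
                collector.number = c(1001, 354),
                collection = rep(c("K", "WAG", "FR", "P", "S"), 20))

# Return a logical vector of flags instead of the cleaned data
cc_dupl(x, value = "flagged")

# Include collector and collector number in the duplicate test
cc_dupl(x, additions = c("collector", "collector.number"))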

See Also

Other Coordinates: cc_aohi(), cc_cap(), cc_cen(), cc_coun(), cc_equ(), cc_gbif(), cc_inst(), cc_iucn(), cc_outl(), cc_sea(), cc_urb(), cc_val(), cc_zero()