extract_unique_references() R function from [revtools]

Create a de-duplicated data.frame

Takes a data.frame of bibliographic information, together with the potential duplicates identified in it (e.g. as returned by find_duplicates), and returns a data.frame of unique references.


extract_unique_references(x, matches)

Arguments

x: a data.frame to be subsetted
matches: either a vector of matches, e.g. as returned from find_duplicates, or a column name (specified as a number or a string) from x showing where matches are stored

Returns

a subsetted data.frame containing one entry for each group identified in matches.

Note

This function creates a simplified version of x by extracting, from each group of 'identical' references, the single reference that contains the most text. It is assumed that this is the most 'complete' record of those available in the dataset. This function does not merge data from multiple 'identical' records, because merging could combine fields from records that are not true duplicates.
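
The selection rule described in this note can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: pick_longest_record is a hypothetical helper, and "most text" is taken here to mean the largest total character count across all columns, with NA counted as empty.

```r
# Hypothetical helper, not part of revtools: within each group of matching
# records, keep the row whose fields contain the most characters in total.
pick_longest_record <- function(x, matches) {
  # total number of characters in each row, skipping missing values
  text_length <- apply(x, 1, function(row) sum(nchar(row[!is.na(row)])))
  # for each group in matches, find the index of the row with the most text
  keep <- vapply(
    split(seq_len(nrow(x)), matches),
    function(rows) rows[which.max(text_length[rows])],
    integer(1)
  )
  x[keep, , drop = FALSE]
}

# toy data: rows 1 and 2 are flagged as the same reference
refs <- data.frame(
  title = c("Bird song", "Bird song", "Nest sites"),
  abstract = c(NA, "A study of song in passerines.", "Nest site choice."),
  stringsAsFactors = FALSE
)
groups <- c(1, 1, 2)

unique_refs <- pick_longest_record(refs, groups)
```

In this toy example the second row is kept in preference to the first because it carries an abstract. Nothing is copied between rows, which mirrors the decision not to merge records.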

Examples


# load the package
library(revtools)

# import data
file_location <- system.file(
  "extdata",
  "avian_ecology_bibliography.ris",
  package = "revtools"
)
x <- read_bibliography(file_location)

# generate duplicated references (for example purposes)
x_duplicated <- rbind(x, x[1:5, ])

# locate and extract unique references
x_check <- find_duplicates(x_duplicated)
x_unique <- extract_unique_references(x_duplicated, matches = x_check)

revtools package

Maintainer: Martin J. Westgate
License: GPL-3
Last published: 2019-12-17
