Discover Probable Duplicates in Plant Genetic Resources Collections
Add probable duplicate set fields to the PGR passport database
Clean PGR passport data
Get disjoint probable duplicate sets
'Double Metaphone' phonetic algorithm
Generate keyword counts
Create a KWIC index
Merge keyword strings
Merge two objects of class ProbDup
Parse an object of class ProbDup to a data frame.
The PGRdup Package
Print a summary of a KWIC object.
Print a summary of a ProbDup object.
Identify probable duplicates of accessions
Convert 'Darwin Core - Germplasm' zip archive to a flat file
Reconstruct an object of class ProbDup
Retrieve probable duplicate set information from PGR passport database...
Split an object of class ProbDup
Validate whether a data frame column conforms to primary key/ID constraints
Visualize the probable duplicate sets retrieved in a ProbDup object
Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records for each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of the keywords associated with sample records, and identifying nearly identical records with similar information through fuzzy, phonetic and semantic matching of keywords.
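The general workflow described above (clean the passport text, index the keywords, then link records whose keywords match) can be illustrated with a minimal Python sketch. This is not PGRdup's R API or its actual matching method: the sample records, the function names clean, build_kwic and probable_duplicates, and the 0.85 threshold are all hypothetical. Python's difflib SequenceMatcher similarity ratio stands in for fuzzy matching, and phonetic and semantic matching are omitted.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical passport records: accession ID -> free-text identifiers.
RECORDS = {
    "ACC-001": "SHULAMITH",
    "ACC-002": "Shulamit (local)",
    "ACC-003": "Chandra",
    "ACC-004": "chandra dwarf",
    "ACC-005": "Kadiri 3",
}


def clean(text: str) -> str:
    """Upper-case, turn punctuation into spaces and collapse runs of spaces."""
    text = re.sub(r"[^A-Z0-9 ]+", " ", text.upper())
    return re.sub(r" +", " ", text).strip()


def build_kwic(records, min_len=3):
    """Map each keyword (at least min_len characters) to the accessions containing it."""
    index = defaultdict(set)
    for acc, text in records.items():
        for word in clean(text).split():
            if len(word) >= min_len:
                index[word].add(acc)
    return index


def probable_duplicates(index, threshold=0.85):
    """Return accession pairs sharing an identical or fuzzily similar keyword."""
    pairs = set()
    # Exact matches: accessions listed under the same keyword.
    for accs in index.values():
        pairs.update(combinations(sorted(accs), 2))
    # Fuzzy matches: keyword pairs whose similarity ratio reaches the threshold.
    for kw1, kw2 in combinations(sorted(index), 2):
        if SequenceMatcher(None, kw1, kw2).ratio() >= threshold:
            for a in index[kw1]:
                for b in index[kw2]:
                    if a != b:
                        pairs.add(tuple(sorted((a, b))))
    return pairs


if __name__ == "__main__":
    kwic = build_kwic(RECORDS)
    print(sorted(probable_duplicates(kwic)))
    # [('ACC-001', 'ACC-002'), ('ACC-003', 'ACC-004')]
```

Here ACC-003 and ACC-004 are linked by the identical keyword CHANDRA, while ACC-001 and ACC-002 are linked only through the fuzzy match SHULAMITH/SHULAMIT (ratio 16/17, about 0.94). A real collection would also need filtering of common words, which this sketch does not attempt.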