getPatterns function

Get the full matching patterns for all matched pairs in dataset A and dataset B

getPatterns(
  matchesA,
  matchesB,
  varnames,
  stringdist.match,
  numeric.match,
  partial.match,
  stringdist.method = "jw",
  cut.a = 0.92,
  cut.p = 0.88,
  jw.weight = 0.1,
  cut.a.num = 1,
  cut.p.num = 2.5
)

Arguments

  • matchesA: A dataframe of the matched observations in dataset A, with all variables used to inform the match.
  • matchesB: A dataframe of the matched observations in dataset B, with all variables used to inform the match.
  • varnames: A vector of variable names to use for matching. Must be present in both matchesA and matchesB.
  • stringdist.match: A vector of booleans, indicating whether to use string distance matching when determining matching patterns on each variable. Must be same length as varnames.
  • numeric.match: A vector of booleans, indicating whether to use numeric pairwise distance matching when determining matching patterns on each variable. Must be same length as varnames.
  • partial.match: A vector of booleans, indicating whether to include a partial matching category for the string distances. Must be same length as varnames. Default is FALSE for all variables.
  • stringdist.method: String distance method for calculating similarity. Options are "jw" (Jaro-Winkler, the default), "jaro" (Jaro), and "lv" (Levenshtein edit distance).
  • cut.a: Lower bound for full string-distance match, ranging between 0 and 1. Default is 0.92.
  • cut.p: Lower bound for partial string-distance match, ranging between 0 and 1. Default is 0.88.
  • jw.weight: Parameter that describes the importance of the first characters of a string (only needed if stringdist.method = "jw"). Default is 0.1.
  • cut.a.num: Lower bound for full numeric match. Default is 1.
  • cut.p.num: Lower bound for partial numeric match. Default is 2.5.
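
To make the cut.a and cut.p thresholds concrete, here is a minimal base-R sketch of how a similarity score between 0 and 1 could be sorted into full, partial, and non-match categories. The helper classify_similarity and its text labels are hypothetical and are not part of the package; the package's own encoding of the categories may differ.

```r
# Hypothetical helper (not part of the package): sort similarity scores into
# categories using the documented cut.a / cut.p lower bounds.
classify_similarity <- function(sim, cut.a = 0.92, cut.p = 0.88, partial = TRUE) {
  stopifnot(cut.p <= cut.a)
  if (partial) {
    # Three categories: at or above cut.a, between cut.p and cut.a, below cut.p
    ifelse(sim >= cut.a, "full", ifelse(sim >= cut.p, "partial", "none"))
  } else {
    # Without a partial category, anything below cut.a is a non-match
    ifelse(sim >= cut.a, "full", "none")
  }
}

sims <- c(1.00, 0.95, 0.90, 0.50)
classify_similarity(sims)                   # "full" "full" "partial" "none"
classify_similarity(sims, partial = FALSE)  # "full" "full" "none" "none"
```

With the default cutoffs, a score of 0.90 counts as a partial match only when the partial category is enabled for that variable; otherwise it falls into the non-match category.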

Returns

getPatterns() returns a dataframe with a row for each matched pair, where each column indicates the matching pattern for each matching variable.
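
A call might look like the sketch below. It assumes the function is the one exported by the fastLink package and that row i of matchesA is paired with row i of matchesB; the toy data and variable names are invented for illustration. The call is guarded so the script still runs when the package is not installed.

```r
# Toy matched pairs (invented data): row i of matchesA is assumed to be
# matched to row i of matchesB.
matchesA <- data.frame(
  firstname = c("john", "mary"),
  birthyear = c(1980, 1975),
  stringsAsFactors = FALSE
)
matchesB <- data.frame(
  firstname = c("jon", "mary"),
  birthyear = c(1981, 1975),
  stringsAsFactors = FALSE
)
varnames <- c("firstname", "birthyear")

patterns <- NULL
# Assumption: getPatterns is available from the fastLink package.
if (requireNamespace("fastLink", quietly = TRUE)) {
  patterns <- fastLink::getPatterns(
    matchesA, matchesB,
    varnames         = varnames,
    stringdist.match = c(TRUE, FALSE),  # string distance for firstname only
    numeric.match    = c(FALSE, TRUE),  # numeric distance for birthyear only
    partial.match    = c(TRUE, FALSE)   # partial category for firstname only
  )
  print(patterns)  # expected shape: one row per pair, one column per variable
}
```

Each of the three logical vectors has one entry per element of varnames, in the same order, as the argument descriptions require.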

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Ben Fifield <benfifield@gmail.com>

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
