combine_rare_levels function

Combines rare levels of a categorical variable

This function takes a categorical variable and combines all levels that appear a user-specified threshold number of times or fewer into a single new level, named Combined by default.

combine_rare_levels(x,threshold=20,newname="Combined")

Arguments

  • x: a vector of categorical values
  • threshold: levels that appear a total of threshold times or fewer are combined into a single new level; defaults to 20
  • newname: the name given to the new combined level; defaults to "Combined"

Details

Returns a list of two objects:

values - The recoded values of the categorical variable. All levels that appeared threshold times or fewer are recoded as newname (Combined by default)

combined - The levels that have been combined together

If, after being combined, the newname level has threshold or fewer instances, the remaining level that appears least often is combined as well.
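The combining rule described above can be sketched in base R. This is an illustrative reconstruction only, not the package's source: the helper name combine_rare_sketch is hypothetical, and the choice to keep absorbing the least frequent remaining level until the pooled level exceeds the threshold is an assumption about how the rule repeats.

```r
# Hypothetical helper illustrating the documented behaviour;
# NOT the actual combine_rare_levels implementation.
combine_rare_sketch <- function(x, threshold = 20, newname = "Combined") {
  x <- as.character(x)
  counts <- table(x)

  # Step 1: pool every level that appears `threshold` times or fewer.
  rare <- names(counts)[counts <= threshold]

  if (length(rare) > 0) {
    x[x %in% rare] <- newname

    # Step 2 (assumed to repeat): while the pooled level is itself still
    # at or below the threshold, absorb the least frequent remaining level.
    repeat {
      counts <- table(x)
      others <- counts[names(counts) != newname]
      if (counts[[newname]] > threshold || length(others) == 0) break
      smallest <- names(others)[which.min(others)]
      rare <- c(rare, smallest)
      x[x %in% smallest] <- newname
    }
  }

  # Same shape as the documented return value: recoded values plus
  # the names of the levels that were merged.
  list(values = factor(x), combined = rare)
}

# Toy data: C and D are rare, but together they appear only 3 times,
# so with threshold = 4 the next least frequent level (B) is absorbed too.
x <- c(rep("A", 10), rep("B", 5), rep("C", 2), rep("D", 1))
res <- combine_rare_sketch(x, threshold = 4)
table(res$values)
res$combined
```

On the toy data, C and D are pooled first; because the pooled level then has only 3 instances, B is merged as well, leaving A with 10 instances and Combined with 8.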

References

Introduction to Regression and Modeling

Author(s)

Adam Petrie

Examples

data(EX6.CLICK)
x <- EX6.CLICK[,15]
table(x)

#Combine all levels which appear 700 or fewer times (AA, CC, DD)
y <- combine_rare_levels(x,700)
table( y$values )

#Combine all levels which appear 1350 or fewer times. This forces BB (which
#occurs 2422 times) into the Combined level since the three levels that appear
#fewer than 1350 times do not appear more than 1350 times combined
y <- combine_rare_levels(x,1350)
table( y$values )
  • Maintainer: Adam Petrie
  • License: GPL (>= 2)
  • Last published: 2020-02-21
