cut2 function

Cut a Numeric Variable into Intervals

cut2 works like cut, but left endpoints are inclusive and labels have the form [lower, upper), except that the last interval is [lower, upper]. If cuts are given, cut2 by default ensures that the cuts cover the entire range of x. If cuts are not given, cut2 divides x into quantile groups (when g is given) or into groups with a given minimum number of observations (m). Whereas cut creates a category object, cut2 creates a factor object. m is a target, not a guarantee.
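The interval convention described above can be illustrated with base R alone. The sketch below does not call cut2; it uses cut with right=FALSE and include.lowest=TRUE to get left-closed intervals and a closed last interval. The exact label formatting produced by cut2 may differ, and the variable names are chosen here only for illustration.

```r
# Illustration only: base R cut(), not Hmisc::cut2
x      <- c(0, 10, 15, 20, 30)
breaks <- c(0, 10, 20, 30)

# Default cut(): right-closed intervals (a,b]; the value 0 falls outside and becomes NA
default_cut <- cut(x, breaks)

# Left-closed intervals [a,b), with the last interval closed on both sides
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

levels(default_cut)
levels(left_closed)
data.frame(x, default_cut, left_closed)
```

With left-closed intervals a value that equals a cut point, such as 10, is placed in the interval that starts at that point rather than the one that ends there.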

cutGn guarantees that the grouped variable will have a minimum of m observations in every group. This is done by an exhaustive algorithm that runs quickly because it is coded in Fortran.

cut2(x, cuts, m=150, g, levels.mean=FALSE, digits, minmax=TRUE, oneval=TRUE, onlycuts=FALSE, formatfun=format, ...)

cutGn(x, m, what=c('mean', 'factor', 'summary', 'cuts', 'function'), rcode=FALSE)

Arguments

  • x: numeric vector to classify into intervals
  • cuts: cut points
  • m: desired minimum number of observations in a group. The algorithm does not guarantee that all groups will have at least m observations.
  • g: number of quantile groups
  • levels.mean: set to TRUE to make the new categorical vector have levels attribute that is the group means of x instead of interval endpoint labels
  • digits: number of significant digits to use in constructing levels. Default is 3 (5 if levels.mean=TRUE)
  • minmax: if cuts is specified but min(x)<min(cuts) or max(x)>max(cuts), augments cuts to include min and max x
  • oneval: if an interval contains only one unique value, the interval will be labeled with the formatted version of that value instead of the interval endpoints, unless oneval=FALSE
  • onlycuts: set to TRUE to only return the vector of computed cuts. This consists of the interior values plus outer ranges.
  • formatfun: formatting function, supports formula notation (if rlang is installed)
  • ...: additional arguments passed to formatfun
  • what: specifies the kind of vector values to return from cutGn, the default being like 'levels.mean' of cut2. Specify 'summary' to return a numeric 3-column matrix that summarizes the intervals satisfying the m requirement. Use what='cuts' to only return the vector of computed cutpoints. To create a function that will recode the variable in play using the same intervals as computed by cutGn, specify what='function'. This function will have a what argument to allow the user to decide later whether to recode into interval means or into a factor variable.
  • rcode: set to TRUE to run the cutgn algorithm in R. This is useful for speed comparisons with the default compiled code.
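As a rough picture of what the g and minmax arguments describe, the following base R sketch builds quantile-based break points and extends a set of user cuts to span the range of x. It is an approximation under stated assumptions, not the cut2 implementation: cut2 may choose break points and labels differently, and the names q_breaks and aug_cuts are hypothetical.

```r
# Illustration only: base R, not the Hmisc implementation
set.seed(1)
x <- runif(1000, 0, 100)

# Quantile groups (idea behind g): g + 1 break points at equally spaced probabilities
g        <- 4
q_breaks <- unique(quantile(x, probs = seq(0, 1, length.out = g + 1)))
q_groups <- cut(x, q_breaks, right = FALSE, include.lowest = TRUE)
table(q_groups)

# Range augmentation (idea behind minmax): add min(x) and max(x) to the user cuts
cuts       <- c(10, 20, 30)
aug_cuts   <- sort(unique(c(min(x), cuts, max(x))))
aug_groups <- cut(x, aug_cuts, right = FALSE, include.lowest = TRUE)
table(aug_groups)
```

Without the augmentation step, values below 10 or above 30 would fall outside every interval and be returned as NA by cut.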

Returns

A factor variable with levels of the form [a,b) or formatted means (character strings), unless onlycuts is TRUE, in which case a numeric vector is returned.

See Also

cut, quantile, combine.levels

Examples

set.seed(1)
x <- runif(1000, 0, 100)
z <- cut2(x, c(10,20,30))
table(z)
table(cut2(x, g=10))      # quantile groups
table(cut2(x, m=50))      # group x into intervals with at least 50 obs.
table(cutGn(x, m=50, what='factor'))
f <- cutGn(x, m=50, what='function')
f
f(c(-1, 2, 10), what='mean')
f(c(-1, 2, 10), what='factor')
## Not run:
x <- round(runif(200000), 3)
system.time(a <- cutGn(x, m=20))              # 0.02s
system.time(b <- cutGn(x, m=20, rcode=TRUE))  # 1.51s
identical(a, b)
## End(Not run)