cut2 works like cut, but left endpoints are inclusive and labels have the form [lower, upper), except that the last interval is [lower, upper]. If cuts are given, cut2 by default makes sure that the cuts span the entire range of x. If cuts are not given, cut2 cuts x into quantile groups (when g is given) or into groups with a target minimum number of observations (when m is given). Whereas cut creates a category object, cut2 creates a factor object. For cut2, m is a target, not a guarantee.
cutGn guarantees that every group of the grouped variable has at least m observations. It uses an exhaustive algorithm that runs fast because it is coded in Fortran.
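The interval convention described above can be illustrated with base R alone. The sketch below is not cut2 itself; it assumes only that base cut with right=FALSE and include.lowest=TRUE is an acceptable stand-in for the [lower, upper) labeling with a closed last interval. Label formatting produced by cut2 may differ.

```r
# Base R analog of the cut2 interval convention (not cut2 itself):
# left endpoints inclusive, last interval closed on both sides.
x <- c(0, 5, 10, 19.9, 20, 30)
breaks <- c(0, 10, 20, 30)

z <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

levels(z)        # "[0,10)"  "[10,20)" "[20,30]"
as.character(z)  # 10 falls in [10,20); 30 falls in the closed last interval
```

Note that the value 10 lands in the second interval rather than the first, and that the maximum, 30, is retained because the last interval is closed on the right.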
m: desired minimum number of observations in a group. For cut2 this is a target only; the algorithm does not guarantee that all groups will have at least m observations. Use cutGn when the minimum must be guaranteed.
g: number of quantile groups
levels.mean: set to TRUE to give the new categorical vector a levels attribute consisting of the group means of x instead of interval endpoint labels
digits: number of significant digits to use in constructing levels. Default is 3 (5 if levels.mean=TRUE)
minmax: if cuts is specified but min(x)<min(cuts) or max(x)>max(cuts), augments cuts to include min and max x
oneval: if an interval contains only one unique value, the interval will be labeled with the formatted version of that value instead of the interval endpoints, unless oneval=FALSE
onlycuts: set to TRUE to only return the vector of computed cuts. This consists of the interior values plus outer ranges.
formatfun: formatting function, supports formula notation (if rlang is installed)
...: additional arguments passed to formatfun
what: specifies the kind of vector values to return from cutGn, the default being like 'levels.mean' of cut2. Specify 'summary' to return a numeric 3-column matrix that summarizes the intervals satisfying the m requirement. Use what='cuts' to only return the vector of computed cutpoints. To create a function that will recode the variable in play using the same intervals as computed by cutGn, specify what='function'. This function will have a what argument to allow the user to decide later whether to recode into interval means or into a factor variable.
rcode: set to TRUE to run the cutgn algorithm in R. This is useful for speed comparisons with the default compiled code.
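The behavior described for minmax and g can be sketched in base R. Both helpers below, augment_cuts and quantile_groups, are hypothetical names introduced for illustration and are not part of Hmisc; the cutpoints that cut2 actually computes may differ from these.

```r
# Hypothetical helper (not in Hmisc): extend user-supplied cuts so that
# they cover the full range of x, as described for minmax.
augment_cuts <- function(x, cuts) {
  r <- range(x, na.rm = TRUE)
  if (r[1] < min(cuts)) cuts <- c(r[1], cuts)
  if (r[2] > max(cuts)) cuts <- c(cuts, r[2])
  sort(cuts)
}

# Hypothetical helper (not in Hmisc): split x into g quantile groups
# using base quantile() and cut(), with left-inclusive intervals.
quantile_groups <- function(x, g) {
  probs  <- seq(0, 1, length.out = g + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  cut(x, breaks, right = FALSE, include.lowest = TRUE)
}

x <- 1:100
augment_cuts(x, c(10, 20, 30))   # 1 10 20 30 100
table(quantile_groups(x, 4))     # four groups of 25 observations each
```

Calling unique on the quantiles guards against duplicated breaks when x has heavy ties, at the cost of returning fewer than g groups in that case.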
Returns
a factor variable with levels of the form [a,b) or formatted means (character strings), unless onlycuts is TRUE, in which case a numeric vector is returned
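To make the two kinds of return value concrete, the base R sketch below recodes a variable first into interval labels and then into group means. It is only an analog of levels.mean=TRUE under the assumption that the group mean is the plain arithmetic mean of x within each interval; cut2 additionally formats the means using digits.

```r
# Base R analog (not cut2 itself): interval labels versus group means.
x   <- c(1, 2, 3, 10, 11, 12)
grp <- cut(x, c(1, 10, 12), right = FALSE, include.lowest = TRUE)

group_means <- tapply(x, grp, mean)               # mean of x within each interval
recoded     <- as.numeric(group_means[as.integer(grp)])

levels(grp)   # "[1,10)"  "[10,12]"
recoded       # 2 2 2 11 11 11
```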
See Also
cut, quantile, combine.levels
Examples
set.seed(1)
x <- runif(1000, 0, 100)
z <- cut2(x, c(10,20,30))
table(z)
table(cut2(x, g=10))      # quantile groups
table(cut2(x, m=50))      # group x into intervals targeting at least 50 obs.
table(cutGn(x, m=50, what='factor'))
f <- cutGn(x, m=50, what='function')
f
f(c(-1,2,10), what='mean')
f(c(-1,2,10), what='factor')
## Not run: 
x <- round(runif(200000),3)
system.time(a <- cutGn(x, m=20))               # 0.02s
system.time(b <- cutGn(x, m=20, rcode=TRUE))   # 1.51s
identical(a, b)
## End(Not run)