ordGroupBoot function

Minimally Group an Ordinal Variable So Bootstrap Samples Will Contain All Distinct Values

When bootstrapping models for an ordinal Y that is fairly continuous, one or more bootstrap samples will frequently omit one or more of the distinct original Y values. When fitting an ordinal model (including a Cox PH model), this means that an intercept cannot be estimated, and the parameter vectors will not align over bootstrap samples. To prevent this from happening, some grouping of Y may be necessary. The ordGroupBoot function uses cutGn() to group Y so that the number of observations in every group is guaranteed to be at least a certain integer m. ordGroupBoot tries a range of m and stops at the lowest m such that either all B tested bootstrap samples contain all the original distinct values of Y (if B>0), or the probability that a given sample of size n drawn with replacement will contain all the distinct original values exceeds aprob (if B=0). This probability is computed using an approximation to the probability of complete sample coverage from the coupon collector's problem and is quite accurate for our purposes.
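To make the coverage idea concrete, the following base R sketch computes a simple approximation to the probability that one bootstrap sample contains every distinct value, and compares it with a simulation. The helper names cover_prob and cover_sim are hypothetical, and the independence approximation used here is an assumption chosen for illustration; it is not necessarily the formula that ordGroupBoot uses internally.

```r
# Approximate probability that a single bootstrap sample of size n contains
# every distinct value of y. Assumption: the events "value i is absent" are
# treated as independent; this is an illustration, not ordGroupBoot's formula.
cover_prob <- function(y) {
  y <- y[!is.na(y)]
  n <- length(y)
  f <- as.vector(table(y))      # frequency of each distinct value
  p_miss <- (1 - f / n)^n       # P(a given value is absent from one sample)
  prod(1 - p_miss)
}

# Empirical counterpart: fraction of B bootstrap samples that contain
# all distinct values of y.
cover_sim <- function(y, B = 2000) {
  y <- y[!is.na(y)]
  n <- length(y)
  k <- length(unique(y))
  hit <- replicate(B, {
    ystar <- y[sample.int(n, n, replace = TRUE)]
    length(unique(ystar)) == k
  })
  mean(hit)
}

y2 <- rep(1:2, each = 5)        # two distinct values, five observations each
cover_prob(y2)                  # close to 1: both values are rarely missed
set.seed(1)
cover_sim(y2)

cover_prob(1:20)                # all values distinct: coverage is very unlikely
```

The last call shows why grouping is needed: when every observation has its own distinct value, almost no bootstrap sample reproduces all of them.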

ordGroupBoot(y, B = 0, m = 7:min(15, floor(n/3)), what = c("mean", "factor", "m"), aprob = 0.9999, pr = TRUE)

Arguments

  • y: a numeric vector
  • B: number of bootstrap samples to test, or zero to use a coverage probability approximation
  • m: range of minimum group sizes to test; the default range is usually adequate
  • what: specifies that either the mean y in each group should be returned, a factor version of this with interval endpoints in the levels, or the computed value of m should be returned
  • aprob: minimum coverage probability sought
  • pr: set to FALSE to not print the computed value of the minimum m satisfying the needed condition

Returns

a numeric vector corresponding to y but grouped, containing the mean of y in each group; or a factor variable representing grouped y; or the minimum m that satisfied the required sample coverage
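As a rough picture of what a grouped return value with group means looks like, here is a base R sketch that groups sorted distinct values greedily until each group holds at least m observations, then replaces each value with its group mean. The function group_min_m is hypothetical and its greedy rule is an assumption made for illustration; cutGn() may form groups differently.

```r
# Hypothetical greedy grouping: walk through the sorted distinct values and
# close a group once it holds at least m observations. Not cutGn()'s algorithm.
group_min_m <- function(y, m) {
  ok   <- !is.na(y)
  vals <- sort(unique(y[ok]))
  cnt  <- tabulate(match(y[ok], vals), nbins = length(vals))
  g    <- integer(length(vals))
  id   <- 1L
  size <- 0L
  for (i in seq_along(vals)) {
    g[i] <- id
    size <- size + cnt[i]
    if (size >= m) {            # group is large enough: start a new one
      id   <- id + 1L
      size <- 0L
    }
  }
  # A trailing group smaller than m is merged into the previous group
  if (size > 0L && id > 1L) g[g == id] <- id - 1L
  grp <- rep(NA_integer_, length(y))
  grp[ok] <- g[match(y[ok], vals)]
  grp                           # group index per observation; NA stays NA
}

x   <- c(1:6, NA, 7:22)
grp <- group_min_m(x, m = 5)
table(grp)                      # every group has at least 5 observations

# Replace each non-missing value by the mean of its group
ok <- !is.na(x)
xmean <- x
xmean[ok] <- ave(x[ok], grp[ok])
xmean
```

Merging the undersized trailing group backward keeps the minimum group size at m or above, at the cost of one larger final group.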

Examples

set.seed(1)
x <- c(1:6, NA, 7:22)
ordGroupBoot(x, m=5:10)
ordGroupBoot(x, m=5:10, B=5000, what='factor')

See Also

cutGn()

Author(s)

Frank Harrell