gendata function

Generate Data Frame with Predictor Combinations

Generate Data Frame with Predictor Combinations

If nobs is not specified, allows user to specify predictor settings by e.g. age=50, sex="male", and any omitted predictors are set to reference values (default=median for continuous variables, first level for categorical ones - see datadist). If any predictor has more than one value given, expand.grid is called to generate all possible combinations of values, unless expand=FALSE. If nobs is given, a data frame is first generated which has nobs of adjust-to values duplicated. Then an editor window is opened which allows the user to subset the variable names down to ones which she intends to vary (this streamlines the data.ed step). Then, if any predictors kept are discrete and viewvals=TRUE, a window (using page) is opened defining the possible values of this subset, to facilitate data editing. Then the data.ed function is invoked to allow interactive overriding of predictor settings in the nobs rows. The subset of variables are combined with the other predictors which were not displayed with data.ed, and a final full data frame is returned. gendata is most useful for creating a newdata data frame to pass to predict.

gendata(fit, ..., nobs, viewvals=FALSE, expand=TRUE, factors)

Arguments

  • fit: a fit object created with rms in effect

  • ...: predictor settings, if nobs is not given.

  • nobs: number of observations to create if doing it interactively using X-windows

  • viewvals: if nobs is given, set viewvals=TRUE to open a window displaying the possible value of categorical predictors

  • expand: set to FALSE to prevent expand.grid from being called, and to instead just convert to a data frame.

  • factors: a list containing predictor settings with their names. This is an alternative to specifying the variables separately in .... Unlike the usage of ..., variables getting default ranges in factors

    should have NA as their value.

Returns

a data frame with all predictors, and an attribute names.subset if nobs is specified. This attribute contains the vector of variable names for predictors which were passed to de and hence were allowed to vary. If neither nobs nor any predictor settings were given, returns a data frame with adjust-to values.

Side Effects

optionally writes to the terminal, opens X-windows, and generates a temporary file using sink.

Details

if you have a variable in ... that is named n, no, nob, nob, add nobs=FALSE to the invocation to prevent that variable from being misrecognized as nobs

Author(s)

Frank Harrell

Department of Biostatistics

Vanderbilt University

fh@fharrell.com

See Also

predict.rms, survest.cph, survest.psm, rmsMisc, expand.grid, de, page, print.datadist, Predict

Examples

set.seed(1) age <- rnorm(200, 50, 10) sex <- factor(sample(c('female','male'),200,TRUE)) race <- factor(sample(c('a','b','c','d'),200,TRUE)) y <- sample(0:1, 200, TRUE) dd <- datadist(age,sex,race) options(datadist="dd") f <- lrm(y ~ age*sex + race) gendata(f) gendata(f, age=50) d <- gendata(f, age=50, sex="female") # leave race=reference category d <- gendata(f, age=c(50,60), race=c("b","a")) # 4 obs. d$Predicted <- predict(f, d, type="fitted") d # Predicted column prints at the far right options(datadist=NULL) ## Not run: d <- gendata(f, nobs=5, view=TRUE) # 5 interactively defined obs. d[,attr(d,"names.subset")] # print variables which varied predict(f, d) ## End(Not run)
  • Maintainer: Frank E Harrell Jr
  • License: GPL (>= 2)
  • Last published: 2025-01-17