dummyVars creates a full set of dummy variables (i.e. less than full rank parameterization)
dummyVars(formula, ...)

## Default S3 method:
dummyVars(formula, data, sep = ".", levelsOnly = FALSE, fullRank = FALSE, ...)

## S3 method for class 'dummyVars'
print(x, ...)

## S3 method for class 'dummyVars'
predict(object, newdata, na.action = na.pass, ...)

contr.ltfr(n, contrasts = TRUE, sparse = FALSE)

class2ind(x, drop2nd = FALSE)
Arguments
formula: An appropriate R model formula, see References
...: additional arguments to be passed to other methods
data: A data frame with the predictors of interest
sep: An optional separator between factor variable names and their levels. Use sep = NULL for no separator (i.e. normal behavior of model.matrix as shown in the Details section)
levelsOnly: A logical; TRUE means to completely remove the variable names from the column names
fullRank: A logical; should a full rank or less than full rank parameterization be used? If TRUE, factors are encoded to be consistent with model.matrix, so no linear dependencies are induced between the resulting columns.
x: A factor vector.
object: An object of class dummyVars
newdata: A data frame with the required columns
na.action: A function determining what should be done with missing values in newdata. The default is to predict NA.
n: A vector of levels for a factor, or the number of levels.
contrasts: A logical indicating whether contrasts should be computed.
sparse: A logical indicating if the result should be sparse.
drop2nd: A logical: if the factor has two levels, should a single binary vector be returned?
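The effect of levelsOnly and fullRank is easiest to see by comparing column names. The following is a minimal sketch, assuming the caret package is installed; the data frame toy is hypothetical and not part of the package.

```r
library(caret)

# Hypothetical toy data: one factor with three levels.
toy <- data.frame(day = factor(c("Mon", "Tue", "Wed", "Mon"),
                               levels = c("Mon", "Tue", "Wed")))

# Default settings: one column per level (less than full rank),
# with names built from the variable name, sep and the level.
default_enc <- predict(dummyVars(~ day, data = toy), newdata = toy)

# levelsOnly = TRUE removes the variable name from the column names.
levels_enc <- predict(dummyVars(~ day, data = toy, levelsOnly = TRUE),
                      newdata = toy)

# fullRank = TRUE drops one level, as model.matrix would.
full_enc <- predict(dummyVars(~ day, data = toy, fullRank = TRUE),
                    newdata = toy)

colnames(default_enc)
colnames(levels_enc)
colnames(full_enc)
```

With the default encoding each row contains exactly one 1 per factor, so the row sums of default_enc are all 1; the full rank encoding has one column fewer.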
Returns
The output of dummyVars is a list of class 'dummyVars' with elements
- call: the function call
- form: the model formula
- vars: names of all the variables in the model
- facVars: names of all the factor variables in the model
- lvls: levels of any factor variables
- sep: NULL or a character separator
- terms: the terms.formula object
- levelsOnly: a logical
The predict function produces a matrix of dummy variables; wrap it in as.data.frame if a data frame is needed.
class2ind returns a matrix (or a vector if drop2nd = TRUE).
contr.ltfr generates a design matrix.
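The elements listed above can be inspected directly on a fitted object. This is a small sketch, assuming caret is installed; toy and enc are hypothetical names used only for illustration.

```r
library(caret)

# Hypothetical toy data: two factors and one numeric predictor.
toy <- data.frame(day  = factor(c("Mon", "Wed", "Mon")),
                  time = factor(c("am", "pm", "pm")),
                  n    = c(1, 2, 3))

enc <- dummyVars(~ day + time + n, data = toy)

class(enc)      # the object's class
enc$vars        # all variables named in the formula
enc$facVars     # the subset that are factors
enc$lvls        # factor levels recorded from the data

# Apply the stored encoding; as.data.frame gives a data frame result.
out <- as.data.frame(predict(enc, newdata = toy))
dim(out)
```

Because the levels are stored in the object, the same encoding can later be applied to new data that contains only some of the levels.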
Details
Most of the contrasts functions in R produce full rank parameterizations of the predictor data. For example, contr.treatment creates a reference cell in the data and defines dummy variables for all factor levels except those in the reference cell. If a factor with 5 levels is used alone in a model formula, contr.treatment creates columns for the intercept and for every factor level except the first. For the data in the Examples section below, model.matrix(~day, when) would produce the columns (Intercept), dayTue, dayWed, dayThu, dayFri, daySat and daySun, with Mon as the reference level. In some situations a dummy variable is needed for every level of the factor, including the first.
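The difference between the two parameterizations can be checked in base R alone. The sketch below uses a hypothetical 5-level factor and turns contrasts off through contrasts.arg; it illustrates the idea and is not how dummyVars is implemented.

```r
# Hypothetical factor with 5 levels; level "a" appears twice.
d <- data.frame(f = factor(c("a", "b", "c", "d", "e", "a"),
                           levels = c("a", "b", "c", "d", "e")))

# Default treatment contrasts: intercept plus 4 columns ("a" is the reference).
full_rank <- model.matrix(~ f, data = d)

# Contrasts switched off: intercept plus one indicator column per level.
all_levels <- model.matrix(
  ~ f, data = d,
  contrasts.arg = list(f = contrasts(d$f, contrasts = FALSE))
)

colnames(full_rank)
colnames(all_levels)

# The first matrix has full column rank; the second does not, because
# the five indicator columns sum to the intercept column.
qr(full_rank)$rank == ncol(full_rank)
qr(all_levels)$rank < ncol(all_levels)
```

This linear dependency is the reason the full set of dummy variables is described as a less than full rank parameterization.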
Given a formula and initial data set, the class dummyVars gathers all
the information needed to produce a full set of dummy variables for any data
set. It uses contr.ltfr as the base function to do this.
class2ind is most useful for converting a factor outcome vector to a
matrix (or vector) of dummy variables.
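As a brief illustration of the outcome-vector use case, the sketch below converts hypothetical class labels to indicators, assuming caret is installed. The names outcome, ind, binary and binary_ind are illustrative only.

```r
library(caret)

# Hypothetical multi-class outcome.
outcome <- factor(c("setosa", "virginica", "versicolor", "setosa"))

# One indicator column per class level.
ind <- class2ind(outcome)
colnames(ind)
rowSums(ind)  # each observation belongs to exactly one class

# Hypothetical two-class outcome: drop2nd = TRUE returns a single
# binary vector instead of a two-column matrix.
binary <- factor(c("yes", "no", "yes"), levels = c("no", "yes"))
binary_ind <- class2ind(binary, drop2nd = TRUE)
binary_ind
```

The single-vector form is convenient when a model expects a 0/1 response; check which level the vector codes as 1 before using it.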
Examples
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)

levels(when$time) <- list(morning = "morning",
                          afternoon = "afternoon",
                          night = "night")
levels(when$day) <- list(Mon = "Mon", Tue = "Tue", Wed = "Wed", Thu = "Thu",
                         Fri = "Fri", Sat = "Sat", Sun = "Sun")

## Default behavior:
model.matrix(~day, when)

mainEffects <- dummyVars(~ day + time, data = when)
mainEffects
predict(mainEffects, when[1:3,])

when2 <- when
when2[1, 1] <- NA
predict(mainEffects, when2[1:3,])
predict(mainEffects, when2[1:3,], na.action = na.omit)

interactionModel <- dummyVars(~ day + time + day:time,
                              data = when,
                              sep = ".")
predict(interactionModel, when[1:3,])

noNames <- dummyVars(~ day + time + day:time,
                     data = when,
                     levelsOnly = TRUE)
predict(noNames, when)

head(class2ind(iris$Species))

two_levels <- factor(rep(letters[1:2], each = 5))
class2ind(two_levels)
class2ind(two_levels, drop2nd = TRUE)