Group Mean Center: Generate group summaries and individual deviations within groups
Multilevel modelers often need to include predictors like the within-group mean and the deviations of individuals around the mean. This function makes it easy (almost foolproof) to calculate those variables.
x: A variable name or a vector of variable names. Do NOT supply a variable like dat$x1. Instead, supply a quoted variable name "x1" or a vector c("x1", "x2").
by: A grouping variable name or a vector of grouping variable names. Do NOT supply a variable like dat$xfactor. Instead, supply a name "xfactor" or a vector c("xfac1", "xfac2").
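FUN: The function used to calculate the group summary. Defaults to mean; alternatives have not been tested.
suffix: A vector of two suffixes appended to each variable name, the first for the group summary column and the second for the deviation column. Defaults are "_mn" and "_dev".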
fulldataframe: Default TRUE. The original data frame is returned with the new columns added (which I would call "Stata style"). If FALSE, only the newly created columns are returned: the variables with suffix[1] and suffix[2] appended to their names. TRUE is easier (maybe safer), but also wastes memory.
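To make the argument conventions above concrete, here is a minimal base R sketch of how quoted names, suffixes, and fulldataframe could fit together. The function gmc_sketch is a hypothetical stand-in written for this illustration; it is not the source code of gmc, and the real function may differ in details such as how it treats missing group values.

```r
## Hypothetical stand-in written for illustration; NOT the source of gmc().
gmc_sketch <- function(dframe, x, by, FUN = mean,
                       suffix = c("_mn", "_dev"), fulldataframe = TRUE) {
    ## x and by must be quoted column names, not vectors such as dat$x1
    stopifnot(is.character(x), is.character(by),
              all(c(x, by) %in% names(dframe)), length(suffix) == 2)
    ## Combine all grouping columns into a single factor
    grp <- interaction(dframe[by], drop = TRUE)
    newcols <- list()
    for (v in x) {
        ## Group summary, repeated for every row in the group
        center <- ave(dframe[[v]], grp, FUN = FUN)
        ## ave() leaves rows with a missing group untouched; set them to NA
        ## (an assumption of this sketch, not documented gmc behavior)
        center[is.na(grp)] <- NA
        newcols[[paste0(v, suffix[1])]] <- center
        newcols[[paste0(v, suffix[2])]] <- dframe[[v]] - center
    }
    newdf <- as.data.frame(newcols)
    ## Keep the original row names so misalignment can be detected later
    rownames(newdf) <- rownames(dframe)
    if (fulldataframe) cbind(dframe, newdf) else newdf
}

## Usage with the built-in state data
dat <- as.data.frame(state.x77)
dat$region <- state.region
res <- gmc_sketch(dat, c("Income", "Population"), by = "region",
                  fulldataframe = FALSE)
head(res)
```

Looking columns up by name is what lets one call handle several variables and several grouping factors at once, which is presumably why a vector like dat$x1 is not accepted.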
Returns
Depending on fulldataframe, either a new data frame containing only the center and deviation columns, or the original data frame with "x_mn" and "x_dev" variables appended (Stata style).
Details
This was originally written just for "group mean-centered" data, but it is now more general. It can accept functions like median to calculate the center, and then the deviations about that center value within the group.
Similar to Stata egen, except more versatile and fun! Will create 2 new columns for each variable, with suffixes for the summary and the deviations (default suffixes are "_mn" and "_dev"). Rows will match the rows of the original data frame, so it will be easy to merge or cbind them back together.
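The generalization to other centers can be illustrated without gmc itself. The following is a small base R sketch, using made-up data and the median as the center, that shows the two quantities described above: a within-group center and the deviations about it. The variable names are assumptions chosen for the illustration.

```r
## Illustration with simulated data; does not call gmc()
set.seed(1234)
dat <- data.frame(g = gl(4, 5), y = rnorm(20))

## Within-group median, repeated for each row of the group
y_center <- ave(dat$y, dat$g, FUN = median)
## Deviation of each observation about its group median
y_dev <- dat$y - y_center

## The center plus the deviation recovers the original variable
stopifnot(isTRUE(all.equal(y_center + y_dev, dat$y)))

## Within each group, the deviations are centered at zero
dev_medians <- tapply(y_dev, dat$g, median)
dev_medians
```

Whatever function is used for the center, the deviations are always computed about that same center, so adding the two new columns back together returns the original values.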
Examples
## Make a data frame out of the state data collection (see ?state)
data(state)
statenew <- as.data.frame(state.x77)
statenew$region <- state.region
statenew$state <- rownames(statenew)
head(statenew.gmc1 <- gmc(statenew, c("Income", "Population"), by = "region"))
head(statenew.gmc2 <- gmc(statenew, c("Income", "Population"), by = "region",
                          fulldataframe = FALSE))
## Note dangerous step: assumes row alignment is correct.
## return has rownames from original set to identify danger
head(statenew2 <- cbind(statenew, statenew.gmc2))
## all.equal() returns a character vector on mismatch, so wrap it in isTRUE()
if (!isTRUE(all.equal(rownames(statenew), rownames(statenew.gmc2)))) {
    warning("Data row-alignment probable error")
}
## The following box plots should be identical
boxplot(Income ~ region, statenew.gmc1)
boxplot((Income_mn + Income_dev) ~ region, statenew.gmc1)
## Multiple by variables
fakedat <- data.frame(i = 1:200, j = gl(4, 50), k = gl(20, 10),
                      y1 = rnorm(200), y2 = rnorm(200))
head(gmc(fakedat, "y1", by = "k"), 20)
head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40)
head(gmc(fakedat, c("y1", "y2"), by = c("j", "k"), fulldataframe = FALSE))
## Check missing value management
fakedat[2, "k"] <- NA
fakedat[4, "j"] <- NA
## head(gmc(fakedat, "y1", by = "k"), 20)
head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40)