Function to generate data that can be used to test forward stagewise and penalized regression techniques. Currently, marginally Gaussian and marginally Poisson responses are supported.
This function gives users a simple way to generate the kind of data the sgee functions were designed for. Parameters controlling the response correlation, the covariate group structure, the marginal response distribution, and the signal-to-noise ratio (for marginally Gaussian responses) give fine-grained control over the data that is generated.
clusterCorstr: String indicating the within-cluster correlation structure of the response. It is passed to genCorMat, so any structure accepted by genCorMat is allowed.
yVariance: Optional scalar value specifying the marginal response variance; overrides SNR.
xVariance: Scalar value indicating the marginal variance of the covariates.
numGroups: Number of covariate groups to generate. By default, groups of size 1 are generated (effectively no grouping). If covariate groups are desired, numGroups and groupSize must be chosen so that length(beta) equals numGroups * groupSize.
groupSize: Size of each group.
groupRho: Within-group correlation parameter.
beta: Vector of coefficient values used to generate the response.
numMainEffects: An integer indicating that the first numMainEffects terms in beta are main effects and the remaining terms are pairwise interaction effects, ordered as model.matrix generates them. The default value of NULL indicates that no interaction terms are included. Specifying numMainEffects overrides any covariate grouping structure provided by the user.
family: Marginal response family; currently gaussian() and poisson() are accepted.
SNR: Scalar value that fixes the signal-to-noise ratio, defined as the ratio of the (observed) variance of the linear predictor to the variance of the response conditional on the covariates.
intercept: Scalar value indicating the true intercept value.
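The examples below use clusterCorstr = "exchangeable", under which every pair of responses in the same cluster shares a single correlation, clusterRho. As a rough illustration, the matrix such a structure describes can be built in base R as follows; exchangeableCor is a hypothetical helper written for this sketch, not genCorMat, whose argument list is not reproduced here.

```r
# Hypothetical helper (not genCorMat): exchangeable correlation matrix.
# Every off-diagonal entry equals rho; the diagonal is 1.
exchangeableCor <- function(clusterSize, rho) {
  R <- matrix(rho, nrow = clusterSize, ncol = clusterSize)
  diag(R) <- 1
  R
}

# Same cluster size and correlation as in the examples
R <- exchangeableCor(clusterSize = 4, rho = 0.5)
print(R)
```

An exchangeable matrix of dimension clusterSize is positive definite only when -1/(clusterSize - 1) < rho < 1, which bounds the values of clusterRho that make sense for this structure.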
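One way numGroups, groupSize, groupRho and xVariance can combine into a covariate covariance matrix is sketched below. It assumes an exchangeable correlation within each group and independence between groups; the documentation only describes the within-group parameter, so the between-group independence is an assumption of this sketch, and all names other than the parameter names are hypothetical.

```r
set.seed(42)

numGroups <- 5
groupSize <- 4
groupRho  <- 0.5
xVariance <- 1
n <- 40  # number of observations, e.g. numClusters * clusterSize

# Within-group block: exchangeable correlation scaled by the covariate variance
block <- matrix(groupRho, groupSize, groupSize)
diag(block) <- 1
block <- xVariance * block

# Block-diagonal covariance: groups assumed independent of each other
p <- numGroups * groupSize
Sigma <- kronecker(diag(numGroups), block)

# Rows of x drawn from N(0, Sigma) using the Cholesky factor (Sigma = t(U) %*% U)
U <- chol(Sigma)
x <- matrix(rnorm(n * p), nrow = n) %*% U

# Group labels: covariates 1-4 form group 1, 5-8 form group 2, and so on
groupID <- rep(seq_len(numGroups), each = groupSize)
```

With beta = c(rep(1, 8), rep(0, 12)), as in the examples, this layout places the nonzero coefficients in the first two groups.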
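The interaction ordering referred to under numMainEffects can be checked directly with model.matrix. The sketch below uses three hypothetical main effects x1, x2 and x3. Assuming beta excludes the intercept (which is supplied separately through intercept), beta would then have numMainEffects + choose(numMainEffects, 2) entries.

```r
set.seed(1)
# Three hypothetical main effects
df <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))

# Main effects plus all pairwise interactions, in model.matrix's column order
mm <- model.matrix(~ .^2, data = df)
colnames(mm)
# returns "(Intercept)" "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# Length of beta implied by numMainEffects = 3 (intercept excluded)
numMainEffects <- 3
betaLength <- numMainEffects + choose(numMainEffects, 2)
```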
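Under the definition given for SNR, fixing the ratio for a Gaussian response amounts to setting the conditional noise variance to the observed variance of the linear predictor divided by SNR. The sketch below shows only that arithmetic. For brevity it draws independent covariates and independent errors, which are simplifying assumptions (genData also imposes cluster and group correlation); beta, intercept and SNR are taken from the examples.

```r
set.seed(7)
n <- 40
beta <- c(rep(1, 8), rep(0, 12))
intercept <- 1
SNR <- 10

# Independent covariates (simplification) and the linear predictor
x <- matrix(rnorm(n * length(beta)), nrow = n)
eta <- intercept + drop(x %*% beta)

# Noise variance chosen so that var(eta) / noiseVar equals the target SNR
noiseVar <- var(eta) / SNR

# Gaussian response: linear predictor plus noise with that variance
y <- eta + rnorm(n, sd = sqrt(noiseVar))
```

The intercept shifts eta but does not change its variance, so it has no effect on the noise variance implied by a given SNR.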
Returns
List containing the generated response, y; the generated covariates, x; a vector identifying the response clusters, clusterID; and a vector identifying the covariate groups, groupID.
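Because clusterID and groupID are plain index vectors, base R tools such as split and logical indexing are enough to work with the returned list. In the sketch below, dat is a hypothetical stand-in built by hand with the documented element names and random placeholder values; the assumption that each cluster's responses are stored contiguously is made only for this illustration.

```r
set.seed(1)
numClusters <- 3
clusterSize <- 4
n <- numClusters * clusterSize

# Hypothetical stand-in with the documented element names
dat <- list(
  y         = rnorm(n),
  x         = matrix(rnorm(n * 6), nrow = n),
  clusterID = rep(seq_len(numClusters), each = clusterSize),  # assumed layout
  groupID   = rep(1:3, each = 2)                               # 3 groups of 2 covariates
)

# Responses split into one vector per cluster
yByCluster <- split(dat$y, dat$clusterID)

# Covariate columns belonging to group 2
xGroup2 <- dat$x[, dat$groupID == 2, drop = FALSE]

# Observations per cluster and covariates per group
table(dat$clusterID)
table(dat$groupID)
```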
Note
The function generates both the desired covariate structure and the desired response structure. To generate Poisson responses, functions from the R package copula are used.
The current implementation of interactions overwrites any previous grouping structure; that is, the number of groups becomes p and all group sizes are set to 1.
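Correlated Poisson responses are generated with the copula package. The idea behind a Gaussian copula can be sketched in base R without that package: draw correlated normals, map them to uniforms with pnorm, then to counts with qpois. This illustrates the technique only and is not the package's code; the Poisson means and the number of clusters below are arbitrary choices for the sketch.

```r
set.seed(3)
numClusters <- 500
clusterSize <- 4
clusterRho  <- 0.5

# Exchangeable correlation for the latent normal variables
R <- matrix(clusterRho, clusterSize, clusterSize)
diag(R) <- 1

# Latent normals: one row per cluster, correlated within the row
z <- matrix(rnorm(numClusters * clusterSize), ncol = clusterSize) %*% chol(R)

# Arbitrary Poisson means, one per position in the cluster (log link)
lambda <- exp(c(0.5, 1, 1.5, 2))

# Probability integral transform, then the Poisson quantile function;
# rep(..., each = numClusters) aligns lambda[j] with column j
u <- pnorm(z)
y <- matrix(qpois(u, lambda = rep(lambda, each = numClusters)),
            ncol = clusterSize)
```

Each column is exactly Poisson with the chosen mean, while the dependence comes from the latent normals. In this sketch the correlation of the counts is typically somewhat below clusterRho, because monotone transforms of correlated normals cannot increase their correlation.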
Examples
## A response variance can be given,
dat1 <- genData(numClusters = 10, clusterSize = 4, clusterRho = .5,
                clusterCorstr = "exchangeable", yVariance = 1, xVariance = 1,
                numGroups = 5, groupSize = 4, groupRho = .5,
                beta = c(rep(1, 8), rep(0, 12)), family = gaussian(),
                intercept = 1)
## or the signal-to-noise ratio can be fixed
dat2 <- genData(numClusters = 10, clusterSize = 4, clusterRho = .5,
                clusterCorstr = "exchangeable", xVariance = 1,
                numGroups = 5, groupSize = 4, groupRho = .5,
                beta = c(rep(1, 8), rep(0, 12)), family = poisson(),
                SNR = 10, intercept = 1)