samplingDistCalculation function

Internal function that sets up the subsampling distribution used to execute the stochastic version of a stagewise approach. The subsampling is conducted at the cluster level, not the individual observation level. Sampling probabilities are first calculated or provided for each observation individually, and the sampling probability for each cluster is then taken to be the average probability across all observations in that cluster.
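The cluster-level averaging described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's internal code: the vectors `obsProb` and `clusterID` are made-up data, and both the normalization step and the draw without replacement are assumptions about how the resulting distribution might be used.

```r
# Hypothetical per-observation probabilities and cluster labels
obsProb   <- c(0.2, 0.4, 0.1, 0.3, 0.5, 0.5)
clusterID <- c(1, 1, 2, 2, 2, 3)

# Average the observation-level values within each cluster
clusterProb <- tapply(obsProb, clusterID, mean)

# Assumption: rescale so the cluster-level values sum to one
clusterProb <- clusterProb / sum(clusterProb)

# Assumption: draw a subsample of two clusters without replacement
set.seed(1)
sampledClusters <- sample(names(clusterProb), size = 2, prob = clusterProb)
```

The result `clusterProb` has one entry per cluster, matching the shape of the return value documented below.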

samplingDistCalculation(sampleProb, y, x, clusterID, waves, beta, beta0, phi, alpha, offset, meanLinkInv, varianceLink, corstr, mu.eta)

Arguments

  • sampleProb: A user-provided value for the sampling probability associated with each observation. sampleProb can be provided as 1) a vector of fixed values of length equal to the response vector y, 2) a function that takes in a list of values (the full list is given in the Note section) and returns a vector of length equal to the response vector y, or 3) the default value of NULL, which results in a uniform distribution.

  • y: The vector of the response values provided to the original stagewise function

  • x: The covariate matrix provided to the original stagewise function

  • clusterID: The vector of cluster ID numbers provided to the original stagewise function

  • waves: The waves parameter identifying the order of observations within the clusters that is provided to the original stagewise function

  • beta: The vector of the current estimates of the coefficients

  • beta0: The current estimate of the intercept

  • phi: Current estimate of the scale parameter

  • alpha: Current estimate of the parameter affecting the within cluster correlation

  • offset: The offset in the linear predictor provided to the original stagewise function

  • meanLinkInv: The inverse link function from the family object provided to the original stagewise function, indicating what family of mean and variance structure is assumed

  • varianceLink: The variance link function from the family object provided to the original stagewise function, indicating what family of mean and variance structure is assumed

  • corstr: The structure of the working correlation matrix that was provided to the original stagewise function

  • mu.eta: Derivative function of mu, the conditional mean of the response, with respect to eta, the linear predictor, from the family object provided to the original stagewise function, indicating what family of mean and variance structure is assumed

Returns

The sampling distribution probabilities to be used for the subsampling. The distribution is provided as a vector with length equal to the number of clusters.

Note

Internal function.

The function provided to sampleProb (through the sgee.control function) needs to calculate a probability for each observation in the response vector y. How these calculations are done is up to the user. The following values are provided to the sampleProb function as a list called values: y, x, clusterID, waves, beta, beta0, phi, alpha, offset, meanLinkInv, varianceLink, corstr, mu.eta. Additionally, all of the values produced by sampleProb need to be non-negative.
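As an illustration of the contract described in this note, the sketch below defines a hypothetical sampleProb function, `residualProb`, that weights each observation by its absolute residual under the current fit. The function name, the weighting rule, and the toy `values` list (which carries only the entries this function reads) are assumptions made for the example; they are not part of the package.

```r
# Hypothetical sampleProb function: one non-negative weight per observation,
# larger where the current fit is worse. `values` is the list described above.
residualProb <- function(values) {
  # Linear predictor from the current intercept and coefficient estimates
  eta <- values$beta0 + as.vector(values$x %*% values$beta)
  if (!is.null(values$offset)) eta <- eta + values$offset

  # Conditional mean via the inverse link, then absolute residuals
  mu <- values$meanLinkInv(eta)
  abs(values$y - mu) + 1e-8  # small constant avoids an all-zero weight vector
}

# Toy stand-in for the list the function would receive (assumed data)
fam <- gaussian()
set.seed(42)
values <- list(
  y           = rnorm(6),
  x           = matrix(rnorm(12), nrow = 6, ncol = 2),
  beta        = c(0.5, -0.25),
  beta0       = 0.1,
  offset      = rep(0, 6),
  meanLinkInv = fam$linkinv
)

p <- residualProb(values)
```

Here `p` has the same length as `y` and contains no negative values, which are the two requirements stated in this note. Such a function would then be supplied through sgee.control as described above; the exact call should be checked against that function's documentation.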

Author(s)

Gregory Vaughan

  • Maintainer: Gregory Vaughan
  • License: GPL (>= 3)
  • Last published: 2018-01-08
