shapereg function

Shape-Restricted Regression

Shape-Restricted Regression

The regression model yi=f(ti)+xiβ+εi,i=1,,ny_i = f(t_i) + x_i'\beta + \varepsilon_i, i = 1,\ldots,n is considered, where the only assumptions about ff concern its shape. The vector expression for the model is y=θ+Xβ+εy = \theta + X\beta + \varepsilon. XX represents a parametrically modelled covariate, which could be a categorical covariate or a linear term. The shapereg function allows eight shapes: increasing, decreasing, convex, concave, increasing-convex, increasing-concave, decreasing-convex, and decreasing-concave. This routine employs a single cone projection to find θ\theta and β\beta simultaneously.

shapereg(formula, data = NULL, weights = NULL, test = FALSE, nloop = 1e+4)

Arguments

  • formula: A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length nn. A predictor can be a non-parametrically modelled variable with a shape restriction or a parametrically modelled unconstrained covariate. In terms of a non-parametrically modelled predictor, the user is supposed to indicate the relationship between E(y)E(y) and a predictor tt in the following way:

    • incr(t):: E(y)E(y) is increasing in tt. See incr for more details.
    • decr(t):: E(y)E(y) is decreasing in tt. See decr for more details.
    • conc(t):: E(y)E(y) is concave in tt. See conc for more details.
    • conv(t):: E(y)E(y) is convex in tt. See conv for more details.
    • incr.conc(t):: E(y)E(y) is increasing and concave in tt. See incr.conc for more details.
    • decr.conc(t):: E(y)E(y) is decreasing and concave in tt. See decr.conc for more details.
    • incr.conv(t):: E(y)E(y) is increasing and convex in tt. See incr.conv for more details.
    • decr.conv(t):: E(y)E(y) is decreasing and convex in tt. See decr.conv for more details.
  • data: An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

  • weights: An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.

  • test: The test parameter given by the user.

  • nloop: The number of simulations used to get the p-value for the E01E_{01} test. The default is 1e+4.

Details

This routine constrains θ\theta in the equation y=θ+Xβ+εy = \theta + X\beta + \varepsilon by a shape parameter.

The constraint cone CC has the form {ϕ:ϕ=v+biδi,i=1,,m,b1,,bm0}\{\phi: \phi = v + \sum b_i\delta_i, i = 1,\ldots,m, b_1,\ldots, b_m \ge 0 \}, vv is in VV. The column vectors of XX are in VV, i.e., the linear space contained in the constraint cone.

The hypothesis test H0:ϕH_0: \phi is in VV versus H1:ϕH_1: \phi is in CC is an exact one-sided test, and the test statistic is E01=(SSE0SSE1)/(SSE0)E_{01} = (SSE_0 - SSE_1)/(SSE_0), which has a mixture-of-betas distribution when H0H_0 is true and ε\varepsilon is a vector following a standard multivariate normal distribution with mean 0. The mixing parameters are found through simulations. The number of simulations used to obtain the mixing distribution parameters for the test is 10,000. Such simulations usually take some time. For the "feet" data set used as an example in this section, whose sample size is 39, the time to get a p-value is roughly between 4 seconds.

This routine calls coneB for the cone projection part.

Returns

  • coefs: The estimated coefficients for XX, i.e., the estimation for the vector β\beta. Note that even if the user does not provide a constant vector in XX, the coefficient for the intercept will be returned.

  • constr.fit: The shape-restricted fit over the constraint cone CC of the form c("\\{\\phi: \\phi = v + \\sum b_i\\delta_i, \n", "i = 1,\\ldots,m, b_1,\\ldots, b_m \\ge 0 \\}"), vv is in VV.

  • linear.fit: The least-squares regression of yy on VV, i.e., the linear space contained in the constraint cone. If shape is 3 or shape is 4, VV is spanned by XX and tt. Otherwise, it is spanned by XX. XX must be full column rank, and the matrix formed by combining XX and tt must also be full column rank.

  • se.beta: The standard errors for the estimation of the vector β\beta. The degree of freedom is returned by coneB and is multiplied by 1.5. Note that even if the user does not provide a constant vector in XX, the standard error for the intercept will be returned.

  • pval: The p-value for the hypothesis test H0:ϕH_0: \phi is in VV versus H1:ϕH_1: \phi is in CC. CC is the constraint cone of the form {ϕ:ϕ=v+biδi,i=1,,m,b1,,bm0}\{\phi: \phi = v + \sum b_i\delta_i, i = 1,\ldots,m, b_1,\ldots, b_m \ge 0 \}, vv is in VV, and VV is the linear space contained in the constraint cone. If test == TRUE, a p-value is returned. Otherwise, the test is skipped and no p-value is returned.

  • pvals.beta: The approximate p-values for the estimation of the vector β\beta. A t-distribution is used as the approximate distribution. Note that even if the user does not provide a constant vector in XX, the approximate p-value for the intercept will be returned.

  • test: The test parameter given by the user.

  • SSE0: The sum of squared residuals for the linear part.

  • SSE1: The sum of squared residuals for the full model.

  • shape: A number showing the shape constraint given by the user in a shapereg fit.

  • tms: The terms objects extracted by the generic function terms from a shapereg fit.

  • zid: A vector keeping track of the position of the parametrically modelled covariate.

  • vals: A vector storing the levels of each variable used as a factor.

  • zid1: A vector keeping track of the beginning position of the levels of each variable used as a factor.

  • zid2: A vector keeping track of the end position of the levels of each variable used as a factor.

  • tnm: The name of the shape-restricted predictor.

  • ynm: The name of the response variable.

  • znms: A vector storing the name of the parametrically modelled covariate.

  • is_param: A logical scalar showing if or not a variable is a parametrically modelled covariate, which could be a factor or a linear term.

  • is_fac: A logical scalar showing if or not a variable is a factor.

  • xmat: A matrix whose columns represent the parametrically modelled covariate.

  • call: The matched call.

References

Raubertas, R. F., C.-I. C. Lee, and E. V. Nordheim (1986) Hypothesis tests for normals means constrained by linear inequalities. Communications in Statistics - Theory and Methods 15 (9), 2809--2833.

Robertson, T., F. Wright, and R. Dykstra (1988) Order Restricted Statistical Inference

New York: John Wiley and Sons.

Fraser, D. A. S. and H. Massam (1989) A mixed primal-dual bases algorithm for regression under inequality constraints application to concave regression. Scandinavian Journal of Statistics 16, 65--74.

Meyer, M. C. (2003) A test for linear vs convex regression function using shape-restricted regression. Biometrika 90(1), 223--232.

Cheng, G.(2009) Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference 139, 1980--1991.

Meyer, M.C.(2013a) Semiparametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715--743.

Liao, X. and M. C. Meyer (2014) coneproj: An R package for the primal or dual cone projections with routines for constrained regression. Journal of Statistical Software 61(12), 1--22.

Author(s)

Mary C. Meyer and Xiyue Liao

See Also

coneB

Examples

# load the feet data set data(feet) # extract the continuous and constrained predictor l <- feet$length # extract the continuous response w <- feet$width # extract the categorical covariate: sex s <- feet$sex # make an increasing fit with test set as FALSE ans <- shapereg(w ~ incr(l) + factor(s)) # check the summary table summary(ans) # make an increasing fit with test set as TRUE ans <- shapereg(w ~ incr(l) + factor(s), test = TRUE, nloop = 1e+3) # check the summary table summary(ans) # make a plot comparing the unconstrained fit and the constrained fit par(mar = c(4, 4, 1, 1)) ord <- order(l) plot(sort(l), w[ord], type = "n", xlab = "foot length (cm)", ylab = "foot width (cm)") title("Shapereg Example Plot") # sort l according to sex ord1 <- order(l[s == "G"]) ord2 <- order(l[s == "B"]) # make the scatterplot of l vs w for boys and girls points(sort(l[s == "G"]), w[s == "G"][ord1], pch = 21, col = 1) points(sort(l[s == "B"]), w[s == "B"][ord2], pch = 24, col = 2) # make an unconstrained fit to boys and girls fit <- lm(w ~ l + factor(s)) # plot the unconstrained fit lines(sort(l), (coef(fit)[1] + coef(fit)[2] * l + coef(fit)[3])[ord], lty = 2) lines(sort(l), (coef(fit)[1] + coef(fit)[2] * l)[ord], lty = 2, col = 2) legend(21.5, 9.8, c("boy","girl"), pch = c(24, 21), col = c(2, 1)) # plot the constrained fit lines(sort(l), (ans$constr.fit - ans$linear.fit + coef(ans)[1])[ord], col = 1) lines(sort(l), (ans$constr.fit - ans$linear.fit + coef(ans)[1] + coef(ans)[2])[ord], col = 2)
  • Maintainer: Xiyue Liao
  • License: GPL (>= 2)
  • Last published: 2025-02-19

Useful links