Constrained base-learners for fitting effects of scalar covariates in models with functional response
bbsc(..., by = NULL, index = NULL, knots = 10, boundary.knots = NULL, degree = 3, differences = 2, df = 4, lambda = NULL, center = FALSE, cyclic = FALSE)
bolsc(..., by = NULL, index = NULL, intercept = TRUE, df = NULL, lambda = 0, K = NULL, weights = NULL, contrasts.arg = "contr.treatment")
brandomc(..., contrasts.arg = "contr.dummy", df = 4)
Arguments
...: one or more predictor variables or one matrix or data frame of predictor variables.
by: an optional variable defining varying coefficients, either a factor or numeric variable.
index: a vector of integers for expanding the variables in ....
knots: either the number of knots or a vector of the positions of the interior knots (for more details see bbs).
boundary.knots: boundary points at which to anchor the B-spline basis (default the range of the data). A vector (of length 2) for the lower and the upper boundary knot can be specified.
degree: degree of the regression spline.
differences: a non-negative integer, typically 1, 2 or 3. If differences = k, k-th-order differences are used as a penalty (0-th order differences specify a ridge penalty).
df: trace of the hat matrix for the base-learner defining the base-learner complexity. Low values of df correspond to a large amount of smoothing and thus to "weaker" base-learners.
lambda: smoothing parameter of the penalty, computed from df when df is specified.
center: See bbs.
cyclic: if cyclic = TRUE the fitted values coincide at the boundaries (useful for cyclic covariates such as time of day).
intercept: if intercept = TRUE an intercept is added to the design matrix of a linear base-learner.
K: in bolsc it is possible to specify the penalty matrix K.
weights: experimental! Weights that are used for the computation of the transformation matrix Z.
contrasts.arg: Note that a special contrasts.arg exists in package mboost, namely "contr.dummy". This contrast is used by default in brandomc. It leads to a dummy coding as returned by model.matrix(~ x - 1), where the intercept is implicitly included but each factor level gets a separate effect estimate (for more details see brandom).
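The relation between df and lambda described above can be illustrated with a small base R sketch. It computes the trace of the hat matrix of a penalized least-squares fit for two values of lambda. The polynomial design matrix X, the difference penalty K and the helper hat_df are assumptions made for this illustration only; they are not the code used inside the base-learners, and mboost may use a different definition of the degrees of freedom depending on its options.

```r
# Hypothetical illustration: df as trace of the hat matrix of a penalized fit
set.seed(1)
x <- sort(runif(50))
X <- cbind(1, x, x^2, x^3)                 # assumed design matrix (cubic polynomial)
D <- diff(diag(ncol(X)), differences = 2)  # second-order difference matrix
K <- crossprod(D)                          # penalty matrix K = D'D

# trace of S = X (X'X + lambda * K)^{-1} X'
hat_df <- function(lambda) {
  S <- X %*% solve(crossprod(X) + lambda * K, t(X))
  sum(diag(S))
}

df_small <- hat_df(0.01)  # weak penalty: df close to the number of columns
df_large <- hat_df(100)   # strong penalty: fewer effective degrees of freedom
c(df_small, df_large)
```

A larger lambda gives a smaller trace, which is the sense in which low values of df correspond to more smoothing and a "weaker" base-learner.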
Returns
As for the base-learners of package mboost:
An object of class blg (base-learner generator) with a dpp function (data pre-processing) and other functions.
The call to dpp returns an object of class bl (base-learner) with a fit function. The call to fit finally returns an object of class bm (base-model).
Details
The base-learners bbsc, bolsc and brandomc are the base-learners bbs, bols and brandom with additional identifiability constraints. The constraints enforce that $\sum_i \hat{h}(x_i, t) = 0$ for all $t$, so that effects varying over t can be interpreted as deviations from the global functional intercept, see Web Appendix A of Scheipl et al. (2015). The constraint is enforced by a basis transformation of the design and penalty matrix. In particular, it is sufficient to apply the constraint on the covariate-part of the design and penalty matrix and thus, it is not necessary to change the basis in t-direction. See Appendix A of Brockhaus et al. (2015) for technical details on how to enforce this sum-to-zero constraint.
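The basis transformation can be sketched in base R. The following is a minimal illustration of one standard way to impose a sum-to-zero constraint via a QR decomposition; it is an assumption that this mirrors the approach in the cited appendix, and the design matrix X, penalty K and all object names are hypothetical, not taken from the package internals.

```r
# Hypothetical sketch: sum-to-zero constraint via a basis transformation
set.seed(1)
n <- 60
x <- runif(n)
X <- cbind(1, x, x^2, x^3)                 # assumed unconstrained covariate design
p <- ncol(X)
K <- crossprod(diff(diag(p), differences = 2))  # assumed difference penalty

# constraint: sum_i h(x_i) = 1' X theta = C theta = 0
C <- matrix(colSums(X), nrow = 1)

# QR decomposition of t(C); the last p - 1 columns of Q span the null space of C
Q <- qr.Q(qr(t(C)), complete = TRUE)
Z <- Q[, -1, drop = FALSE]                 # transformation matrix Z, p x (p - 1)

Xc <- X %*% Z                              # transformed design matrix
Kc <- t(Z) %*% K %*% Z                     # transformed penalty matrix

# any coefficient vector in the new basis yields an effect that sums to zero
theta <- rnorm(p - 1)
sum(Xc %*% theta)                          # zero up to numerical error
```

Because every column of Xc sums to zero over the observations, the constraint holds for any coefficient vector, and only the covariate part of the design needs to be transformed.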
These base-learners cannot handle missing values in the covariates.
Examples
library(FDboost)

#### simulate data with functional response and scalar covariate (functional ANOVA)
n <- 60   ## number of cases
Gy <- 27  ## number of observation points per response curve
dat <- list()
dat$t <- (1:Gy - 1)^2 / (Gy - 1)^2
set.seed(123)
dat$z1 <- rep(c(-1, 1), length = n)
dat$z1_fac <- factor(dat$z1, levels = c(-1, 1), labels = c("1", "2"))
# dat$z1 <- runif(n)
# dat$z1 <- dat$z1 - mean(dat$z1)

# mean and standard deviation for the functional response
mut <- matrix(2 * sin(pi * dat$t), ncol = Gy, nrow = n, byrow = TRUE) +
  outer(dat$z1, dat$t, function(z1, t) z1 * cos(pi * t))  # true linear predictor
sigma <- 0.1

# draw response y_i(t) ~ N(mu_i(t), sigma)
dat$y <- apply(mut, 2, function(x) rnorm(mean = x, sd = sigma, n = n))

## fit function-on-scalar model with a linear effect of z1
m1 <- FDboost(y ~ 1 + bolsc(z1_fac, df = 1), timeformula = ~ bbs(t, df = 6), data = dat)

# look for optimal mSTOP using cvrisk() or validateFDboost()
cvm <- cvrisk(m1, grid = 1:500)
m1[mstop(cvm)]

m1[200]  # use 200 boosting iterations

# plot true and estimated coefficients
plot(dat$t, 2 * sin(pi * dat$t), col = 2, type = "l", main = "intercept")
plot(m1, which = 1, lty = 2, add = TRUE)

plot(dat$t, 1 * cos(pi * dat$t), col = 2, type = "l", main = "effect of z1")
lines(dat$t, -1 * cos(pi * dat$t), col = 2, type = "l")
plot(m1, which = 2, lty = 2, col = 1, add = TRUE)
References
Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015): The functional linear array model. Statistical Modelling, 15(3), 279-300.
Scheipl, F., Staicu, A.-M. and Greven, S. (2015): Functional Additive Mixed Models, Journal of Computational and Graphical Statistics, 24(2), 477-501.
See Also
FDboost for the model fit. bbs, bols and brandom for the corresponding base-learners in mboost.