Base fitting functions in R will seek variables in the environment where the formula was defined (i.e., typically in the global environment), if they are not in the data. This increases the memory size of fit objects (as the formula and attached environment are part of such objects). This also easily leads to errors (see example in the discussion of update.HLfit). Indeed Chambers (2008, p.221), after describing how the environment is defined, comments that Where clear and trustworthy software is a priority, I would personallyavoid such tricks. Ideally, all the variables in the model frame shouldcome from an explicit, verifiable data source... . Fitting functions in spaMM try to adhere to such a principle, as they assume by default that all variables from the formula should be in the data argument (and then, one never needs to specify “‘data’”inthe‘formula’..‘spaMM‘implementsthisbydefaultbystrippingtheformulaenvironmentfromanyvariable.Itisalsopossibletoassignagivenenvironmenttotheformula,throughthecontrol‘control.HLfitformula_env`: see Examples.
The variables defining the prior.weights should also be in the data. However, the implementation of the prior.weights argument has limitations that can be overcome by using the more recently introduced weights.formula argument of spaMM fitting functions (see Examples, where this is also compared with stats::lm's handling of its weights argument).
However, variables used in other arguments such as ranFix are looked up neither in the data nor in the formula environment, but in the calling environment as usual.
References
Chambers J.M. (2008) Software for data analysis: Programming with R. Springer-Verlag New York
Examples
####### Controlling the formula environmentset.seed(123)d2 <- data.frame(y = seq(10)/2+rnorm(5)[gl(5,2)], x1 = sample(10), grp=gl(5,2), seq10=seq(10))# Using only variables in the data: basic usage# HLfit(y ~ x1 + seq10+(1|grp), data = d2)# is practically equivalent toHLfit(y ~ x1 + seq10+(1|grp), data = d2, control.HLfit = list(formula_env=list2env(list(data=d2))))# # The 'formula_env' avoids the need for the 'seq10' variable:HLfit(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2, control.HLfit = list(formula_env=list2env(list(data=d2))))## Internal imprementation exploits partial matching of argument names# so that this can also be in 'control' if 'control.HLfit' is absent: fitme(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2, control = list(formula_env=list2env(list(data=d2))))####### Prior-weights miserydata("hills", package="MASS")(fit <- lm(time ~ dist + climb, data = hills, weights=1/dist^2))# same as(fit <- fitme(time ~ dist + climb, data = hills, prior.weights=1/dist^2, method="REML"))# possible calls:(fit <- fitme(time ~ dist + climb, data = hills, prior.weights=quote(1/dist^2)))(fit <- fitme(time ~ dist + climb, data = hills, prior.weights=1/hills$dist^2))(fit <- fitme(time ~ dist + climb, data = hills, weights.form=~1/dist^2))(fit <- fitme(time ~ dist + climb, data = hills, weights.form=~ I(1/dist^2)))# Also syntactically correct since 'dist' is found in the data:(fit <- fitme(time ~ dist + climb, data = hills, weights.form=~ rep(2,length(dist))))#### Programming with prior weights:## Different ways of passing prior weights to fitme() from another function:wrap_as_form <-function(weights.form){ fitme(time ~ dist + climb, data = hills, weights.form=weights.form)}wrap_as_pw <-function(prior.weights){ fitme(time ~ dist + climb, data = hills, prior.weights=prior.weights)}wrap_as_dots <-function(...){ fitme(time ~ dist + climb, data = hills,...)}## Similarly for lm:wrap_lm_as_dots <-function(...){ lm(time ~ dist + climb, data = hills,...)}wrap_lm_as_arg <-function(weights){ lm(time ~ dist + climb, data = hills,weights=weights)}## Programming errors with stats::lm():pw <- rep(1e-6,35)# or even NULL(fit <- wrap_lm_as_arg(weights=pw))# catches weights from global envir!(fit <- lm(time ~ dist + climb, data = hills, weights=pw))# idem!(fit <- lm(time ~ dist + climb, data = hills, weights=hills$pw))# fails silently - no $pw in 'hills'(fit <- wrap_lm_as_dots(weights=hills$pw))# idem!(fit <- wrap_lm_as_arg(weights=hills$pw))# idem!## Safer spaMM results:try(fit <- wrap_as_pw(prior.weights= pw))# correctly catches problemtry(fit <- wrap_as_dots(prior.weights=hills$pw))# correctly catches problem(fit <- wrap_as_dots(prior.weights=1/dist^2))# correct(fit <- wrap_as_dots(prior.weights=quote(1/dist^2)))# correct## But 'prior.weights' limitations: try(fit <- wrap_as_pw(prior.weights=1/hills$dist^2))# fails (stop)try(fit <- wrap_as_pw(prior.weights=1/dist^2))# fails (stop)try(fit <- wrap_as_pw(prior.weights= quote(1/dist^2)))# fails (stop)## Problems all solved by using 'weights.form':try(fit <- wrap_as_form(weights.form=~ pw))# correctly catches problem(fit <- wrap_as_form(weights.form=~1/dist^2))# correct(fit <- wrap_as_form(weights.form=~1/hills$dist^2))# correct(fit <- wrap_as_dots(weights.form=~1/dist^2))# correctrm("pw")