Description
This function carries out a functional regression analysis, where either the dependent variable or one or more independent variables are functional. Non-functional variables may be used on either side of the equation. In a simple problem where there is a single scalar independent covariate with values $z_i, i = 1, \ldots, N$ and a single functional covariate with values $x_i(t)$, the two versions of the model fit by fRegress are the scalar dependent variable model
$$y_i = \beta_1 z_i + \int x_i(t) \beta_2(t) \, dt + e_i$$
and the concurrent functional dependent variable model
$$y_i(t) = \beta_1(t) z_i + \beta_2(t) x_i(t) + e_i(t).$$
In these models, the final term $e_i$ or $e_i(t)$ is a residual, lack of fit or error term.
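To make the scalar dependent variable model concrete, here is a minimal base-R sketch that simulates data from it, approximates the integral by trapezoidal quadrature, and recovers the coefficients by ordinary least squares after expanding the coefficient function in a small basis. It does not use fRegress or the fda package: the simulated data, the three-function basis `Phi`, and all variable names are assumptions made for illustration only.

```r
# Base-R sketch of  y_i = beta1 * z_i + integral x_i(t) beta2(t) dt + e_i
# All data and names below are hypothetical.
set.seed(1)
N     <- 60                               # number of observations
tgrid <- seq(0, 1, length.out = 101)      # argument values t
h     <- tgrid[2] - tgrid[1]
w     <- rep(h, length(tgrid))            # trapezoidal quadrature weights
w[c(1, length(w))] <- h / 2

z <- rnorm(N)                             # scalar covariate z_i
# functional covariate x_i(t): one row per observation, one column per t
X <- t(replicate(N, rnorm(1) * sin(2 * pi * tgrid) +
                    rnorm(1) * cos(2 * pi * tgrid) + rnorm(1)))

beta1_true <- 2
beta2_true <- 3 * sin(2 * pi * tgrid)     # coefficient function beta2(t)

# integral of x_i(t) * beta2(t) dt, approximated by quadrature
y <- beta1_true * z + as.vector(X %*% (w * beta2_true)) + rnorm(N, sd = 0.05)

# Expand beta2(t) = sum_k b_k phi_k(t); column k of Zfun then holds
# the integral of x_i(t) * phi_k(t) dt for each observation i
Phi  <- cbind(1, sin(2 * pi * tgrid), cos(2 * pi * tgrid))
Zfun <- X %*% (w * Phi)

fit       <- lm(y ~ 0 + z + Zfun)
beta1_hat <- unname(coef(fit)[1])
beta2_hat <- as.vector(Phi %*% coef(fit)[-1])   # estimated beta2(t) on tgrid
```

Once the coefficient function is written in a basis, the functional term reduces to ordinary regressors, which is why a scalar covariate and a functional covariate can share one linear model.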
In the concurrent functional linear model for a functional dependent variable, all functional variables are evaluated at a common time or argument value t. That is, the fit is defined in terms of the behavior of all variables at a fixed time, or in terms of "now" behavior.
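The "now" behavior of the concurrent model can be sketched in base R by fitting a separate least-squares regression at each argument value. This pointwise fit is only an illustration of the model structure; it is an assumption of this sketch, not a description of how fRegress computes its estimates (fRegress represents each coefficient function in a basis and can penalize its roughness). The data and names are hypothetical.

```r
# Pointwise sketch of  y_i(t) = beta1(t) z_i + beta2(t) x_i(t) + e_i(t)
# All data and names below are hypothetical.
set.seed(2)
N     <- 80
tgrid <- seq(0, 1, length.out = 25)
nt    <- length(tgrid)

z <- rnorm(N)                         # scalar covariate z_i
X <- matrix(rnorm(N * nt), N, nt)     # functional covariate values x_i(t_j)

beta1_true <- 1 + tgrid               # beta1(t)
beta2_true <- sin(2 * pi * tgrid)     # beta2(t)

# Response curves: column j uses only covariate values at the same t_j
Y <- outer(z, beta1_true) + sweep(X, 2, beta2_true, `*`) +
     matrix(rnorm(N * nt, sd = 0.1), N, nt)

# One ordinary least-squares fit per argument value t_j
beta_hat <- t(sapply(seq_len(nt), function(j) {
  coef(lm(Y[, j] ~ 0 + z + X[, j]))
}))
colnames(beta_hat) <- c("beta1", "beta2")
```

Each row of `beta_hat` estimates the pair of coefficient values at one argument value, using no information from other values of t.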
All regression coefficient functions $\beta_j(t)$ are considered to be functional. In the case of a scalar dependent variable, the regression coefficient for a scalar covariate is converted to a functional variable with a constant basis. All regression coefficient functions can be forced to be smooth through the use of roughness penalties, and consequently are specified in the argument list as functional parameter objects.
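As a hedged illustration of what a roughness penalty does in the scalar dependent variable model, the penalized least-squares criterion described in Ramsay and Silverman (2005) can be written in the notation used here as shown below. The smoothing parameter $\lambda$ and the linear differential operator $L$ are assumed to correspond to the lambda and Lfdobj arguments of fdPar; the criterion actually minimized by fRegress may include further terms, such as observation weights.

```latex
\mathrm{PENSSE}_{\lambda}(\beta_1, \beta_2)
  = \sum_{i=1}^{N} \Bigl[ y_i - \beta_1 z_i - \int x_i(t)\,\beta_2(t)\,dt \Bigr]^2
  + \lambda \int \bigl[ L\beta_2(t) \bigr]^2 \, dt
```

Larger values of $\lambda$ push $\beta_2(t)$ toward functions for which $L\beta_2$ is zero, and $\lambda = 0$ gives unpenalized least squares.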
Usage
fRegress(y, ...)
## S3 method for class 'fd'
fRegress(y, xfdlist, betalist, wt=NULL, y2cMap=NULL, SigmaE=NULL, returnMatrix=FALSE, method=c('fRegress', 'model'), sep='.', ...)
## S3 method for class 'double'
fRegress(y, xfdlist, betalist, wt=NULL, y2cMap=NULL, SigmaE=NULL, returnMatrix=FALSE, ...)
## S3 method for class 'formula'
fRegress(y, data=NULL, betalist=NULL, wt=NULL, y2cMap=NULL, SigmaE=NULL, method='fRegress', sep='.', ...)
## S3 method for class 'character'
fRegress(y, data=NULL, betalist=NULL, wt=NULL, y2cMap=NULL, SigmaE=NULL, method='fRegress', sep='.', ...)
Arguments
y: the dependent variable object. It may be an object of five possible classes or attributes:
character or formula: a formula object or a character object that can be coerced into a formula providing a symbolic description of the model to be fitted satisfying the following rules:
The left hand side, `formula` `y`, must be either a numeric vector or a univariate object of class `fd`.
All objects named on the right hand side must be either `numeric` or `fd` (functional data). The number of replications of `fd` object(s) must match each other and the number of observations of `numeric` objects named, as well as the number of replications of the dependent variable object. The right hand side of this `formula` is translated into `xfdlist`, then passed to another method for fitting (unless `method = 'model'`). Multivariate independent variables are allowed in a `formula` and are split into univariate independent variables in the resulting `xfdlist`. Similarly, categorical independent variables with $k$ levels are translated into $k-1$ contrasts in `xfdlist`. Any smoothing information is passed to the corresponding component of `betalist`.
numeric: a numeric vector object or a matrix object if the dependent variable is numeric or a matrix.
fd: a functional data object or an fdPar object if the dependent variable is functional.
data: an optional list or data.frame containing names of objects identified in the formula or character y.
xfdlist: a list of length equal to the number of independent variables (including any intercept). Members of this list are the independent variables. They can be objects of either of these two classes:
scalar: a numeric vector if the independent variable is scalar.
fd: a (univariate) functional data object.
In either case, the object must have the same number of replications as the dependent variable object. That is, if it is a scalar, it must be of the same length as the dependent variable, and if it is functional, it must have the same number of replications as the dependent variable. (Only univariate independent variables are currently allowed in xfdlist.)
betalist: For the fd, fdPar, and numeric methods, betalist must be a list of length equal to length(xfdlist). Members of this list are functional parameter objects (class fdPar) defining the regression functions to be estimated. Even if a corresponding independent variable is scalar, its regression coefficient must be functional if the dependent variable is functional. (If the dependent variable is a scalar, the coefficients of scalar independent variables, including the intercept, must be constants, but the coefficients of functional independent variables must be functional.) Each of these functional parameter objects defines a single functional data object, that is, with only one replication.
For the formula and character methods, betalist can be either a list, as for the other methods, or NULL, in which case a list is created. If betalist is created, it will use the bases from the corresponding component of xfdlist if it is functional, or otherwise from the response variable. Smoothing information (arguments Lfdobj, lambda, estimate, and penmat of function fdPar) will come from the corresponding component of xfdlist if it is of class fdPar (or, for scalar independent variables, from the response variable if it is of class fdPar) or from optional ... arguments if the reference variable is not of class fdPar.
wt: weights for weighted least squares
y2cMap: the matrix mapping from the vector of observed values to the coefficients for the dependent variable. This is output by function smooth.basis. If this is supplied, confidence limits are computed, otherwise not.
SigmaE: Estimate of the covariances among the residuals. This can only be estimated after a preliminary analysis with fRegress.
method: a character string matching either 'fRegress', for functional regression estimation, or 'model', to create the argument lists for functional regression estimation without running it.
sep: separator for creating names for multiple variables for fRegress.fdPar or fRegress.numeric created from single variables on the right hand side of the formula y. This happens with multidimensional fd objects as well as with categorical variables.
returnMatrix: logical: If TRUE, a two-dimensional matrix is returned using a special class from the Matrix package.
...: optional arguments
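The roles of the y2cMap and SigmaE arguments in computing confidence limits can be illustrated with a small base-R sketch. For an unpenalized least-squares smooth with basis matrix `Phi`, the coefficients are a linear function of the observed values, so the matrix of that linear map plays the part of y2cMap, and a residual covariance can be propagated through it. The polynomial basis, the simulated data, and the omission of any roughness penalty are assumptions of this sketch; the matrix returned by smooth.basis will generally differ when a penalty is used.

```r
# Hypothetical illustration of a map from observed values to coefficients
set.seed(3)
tgrid <- seq(0, 1, length.out = 50)
Phi   <- cbind(1, tgrid, tgrid^2)        # assumed basis, evaluated at tgrid

# Least-squares map: coefficients = y2cMap %*% observed values
y2cMap <- solve(crossprod(Phi), t(Phi))  # 3 x 50 matrix

y     <- 1 + 2 * tgrid - tgrid^2 + rnorm(50, sd = 0.01)
coefs <- as.vector(y2cMap %*% y)

# Given a covariance matrix for the residuals, the sampling covariance
# of the coefficients follows by linear propagation
SigmaE  <- diag(0.01^2, 50)
coefVar <- y2cMap %*% SigmaE %*% t(y2cMap)
coefSE  <- sqrt(diag(coefVar))
```

This is why confidence limits need both pieces: the map says how noise in the observations reaches the coefficients, and SigmaE says how large that noise is.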
Details
Alternative forms of functional regression can be categorized with traditional least squares using the following 2 x 2 table:
| response \ explanatory variable | scalar | function |
|---|---|---|
| scalar | lm | fRegress.numeric |
| function | fRegress.fd or fRegress.fdPar | fRegress.fd or fRegress.fdPar or linmod |
For fRegress.numeric, the numeric response is assumed to be the sum of integrals of xfd * beta for all functional xfd terms.
fRegress.fd or .fdPar produces a concurrent regression with each beta being also a (univariate) function.
linmod predicts a functional response from a convolution integral, estimating a bivariate regression function.
In the computation of regression function estimates in fRegress, all independent variables are treated as if they are functional. If argument xfdlist contains one or more vectors, these are converted to functional data objects having the constant basis with coefficients equal to the elements of the vector.
Needless to say, if all the variables in the model are scalar, do NOT use this function. Instead, use either lm or lsfit.
These functions provide a partial implementation of Ramsay and Silverman (2005, chapters 12-20).
Returns
These functions return either a standard fRegress fit object or a model specification:
The fRegress fit object case: A list of class fRegress with the following components:
- **y:** The first argument in the call to `fRegress`. This argument is coerced to `class` `fd` in fda version 5.1.9. Prior versions of the package converted it to an `fdPar`, but the extra structures in that class were not used in any of the `fRegress` codes.
- **xfdlist:** The second argument in the call to `fRegress`.
- **betalist:** The third argument in the call to `fRegress`.
- **betaestlist:** A list of length equal to the number of independent variables and with members having the same functional parameter structure as the corresponding members of `betalist`. These are the estimated regression coefficient functions.
- **yhatfdobj:** A functional parameter object (class `fdPar`) if the dependent variable is functional or a vector if the dependent variable is scalar. This is the set of values predicted by the functional regression model for the dependent variable.
- **Cmatinv:** A matrix containing the inverse of the coefficient matrix for the linear equations that define the solution to the regression problem. This matrix is required for function `fRegress.stderr` that estimates confidence regions for the regression coefficient function estimates.
- **wt:** The vector of weights input or inferred.
If `class(y)` is numeric, the `fRegress` object also includes:
- **df:** The equivalent degrees of freedom for the fit.
- **OCV:** The leave-one-out cross validation score for the model.
- **GCV:** The generalized cross validation score.
If `class(y)` is `fd` or `fdPar`, the `fRegress` object returned also includes 5 other components:
- **y2cMap:** An input `y2cMap`.
- **SigmaE:** An input `SigmaE`.
- **betastderrlist:** An `fd` object estimating the standard errors of `betaestlist`.
- **bvar:** A covariance matrix for regression coefficient estimates.
- **c2bMap:** A mapping matrix that maps variation in Cmat to variation in regression coefficients.
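For readers unfamiliar with the equivalent degrees of freedom and the two cross-validation scores listed above for a numeric response, the following base-R sketch computes the textbook versions of these quantities for a penalized least-squares fit. It borrows the names `Cmat` and `Cmatinv` only by analogy; the design matrix, the penalty matrix `R`, and the normalization of `GCV` are assumptions of this sketch, and the values stored by fRegress may be scaled differently.

```r
# Hypothetical penalized least-squares fit; all names are illustrative
set.seed(4)
N <- 40
Z <- cbind(1, matrix(rnorm(N * 3), N, 3))      # design matrix
y <- as.vector(Z %*% c(1, 2, 0, -1) + rnorm(N, sd = 0.5))

lambda <- 0.5
R <- diag(c(0, 1, 1, 1))                       # penalty matrix, intercept unpenalized

Cmat    <- crossprod(Z) + lambda * R           # coefficient matrix of the normal equations
Cmatinv <- solve(Cmat)
H       <- Z %*% Cmatinv %*% t(Z)              # hat (smoother) matrix
res     <- y - as.vector(H %*% y)              # residuals

df  <- sum(diag(H))                            # equivalent degrees of freedom
OCV <- sum((res / (1 - diag(H)))^2)            # leave-one-out score without refitting
GCV <- N * sum(res^2) / (N - df)^2             # one common GCV normalization
```

The leave-one-out shortcut in `OCV` is exact for a linear fit of this form, so no model has to be refitted N times.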
The model specification object case: The fRegress.formula and fRegress.character functions translate the formula into the argument list required by fRegress.fdPar or fRegress.numeric. With the default value 'fRegress' for the argument method, this list is then used to call the appropriate other fRegress function. Alternatively, to see how the formula is translated, use the alternative 'model' value for the argument method. In that case, the function returns a list with the arguments otherwise passed to these other functions plus the following additional components:
- **xfdlist0:** A list of the objects named on the right hand side of formula. This will differ from xfdlist for any categorical or multivariate right hand side object.
- **type:** The type component of any fd object on the right hand side of formula.
- **nbasis:** A vector containing the nbasis components of variables named in formula having such components.
- **xVars:** An integer vector, named by the variables on the right hand side of formula, giving the corresponding number of variables in xfdlist. This can exceed 1 for any multivariate object on the right hand side of class either numeric or fd as well as any categorical variable.
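The way a categorical right hand side variable with $k$ levels expands into $k-1$ variables can be sketched in base R with model.matrix. The region labels below are hypothetical stand-ins, and the column naming applied by fRegress.formula (which uses the sep argument) is not reproduced here.

```r
# Hypothetical factor with k = 4 levels
region <- factor(c("Arctic", "Atlantic", "Continental", "Pacific",
                   "Atlantic", "Pacific"))
k <- nlevels(region)

# Treatment contrasts: an intercept column plus k - 1 indicator columns
M <- model.matrix(~ region)
contrast_cols <- M[, -1, drop = FALSE]

ncol(contrast_cols)      # k - 1 columns, one per non-baseline level
colnames(contrast_cols)
```

With the default treatment contrasts, the first level acts as the baseline and is absorbed into the intercept, so a single name on the right hand side of the formula corresponds to several columns.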
Author(s)
J. O. Ramsay, Giles Hooker, and Spencer Graves
References
Ramsay, James O., Hooker, Giles, and Graves, Spencer (2009), Functional data analysis with R and Matlab, Springer, New York.
Ramsay, James O., and Silverman, Bernard W. (2005), Functional Data Analysis, 2nd ed., Springer, New York.
Ramsay, James O., and Silverman, Bernard W. (2002), Applied Functional Data Analysis, Springer, New York.
Examples
library(fda)
oldpar <- par(no.readonly=TRUE)

##
## vector response with functional explanatory variable
##

# data are in Canadian Weather object
# print the names of the data
print(names(CanadianWeather))

# set up log10 of annual precipitation for 35 weather stations
annualprec <- log10(apply(CanadianWeather$dailyAv[,,"Precipitation.mm"], 2, sum))

# The simplest 'fRegress' call is singular with more bases
# than observations, so we use only 25 basis functions, for this example
smallbasis <- create.fourier.basis(c(0, 365), 25)

# The covariate is the temperature curve for each station.
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], smallbasis)$fd

##
## formula interface: specify the model by a formula, the method
## fRegress.formula automatically sets up the regression coefficient functions,
## a constant function for the intercept,
## and a higher dimensional function
## for the inner product with temperature
##
precip.Temp1 <- fRegress(annualprec ~ tempfd, method="fRegress")

# the output is a list with class name fRegress, display names
names(precip.Temp1)
# [1] "yvec" "xfdlist" "betalist" "betaestlist" "yhatfdobj"
# [6] "Cmat" "Dmat" "Cmatinv" "wt" "df"
#[11] "GCV" "OCV" "y2cMap" "SigmaE" "betastderrlist"
#[16] "bvar" "c2bMap"

# the vector of fits to the data is object precip.Temp1$yhatfdobj,
# but since the dependent variable is a vector, so is the fit
annualprec.fit1 <- precip.Temp1$yhatfdobj

# plot the data and the fit
plot(annualprec.fit1, annualprec, type="p", pch="o")
lines(annualprec.fit1, annualprec.fit1, lty=2)

# print root mean squared error
RMSE <- round(sqrt(mean((annualprec-annualprec.fit1)^2)), 3)
print(paste("RMSE =", RMSE))

# plot the estimated regression function
plot(precip.Temp1$betaestlist[[2]])
# This isn't helpful either, the coefficient function is too
# complicated to interpret.
# display the number of basis functions used:
print(precip.Temp1$betaestlist[[2]]$fd$basis$nbasis)
# 25 basis functions to fit 35 values, no wonder we over-fit the data

##
## Get the default setup and modify it
## the "model" value of the method argument causes the analysis
## to produce a list vector of arguments for calling the
## fRegress function
##
precip.Temp.mdl1 <- fRegress(annualprec ~ tempfd, method="model")

# First confirm we get the same answer as above by calling
# function fRegress() with these arguments:
precip.Temp.m <- do.call('fRegress', precip.Temp.mdl1)
all.equal(precip.Temp.m, precip.Temp1)

# set up a smaller basis for beta2 than for temperature so that we
# get a more parsimonious fit to the data
nbetabasis2 <- 21  # not much less, but we add some roughness penalization
betabasis2 <- create.fourier.basis(c(0, 365), nbetabasis2)
betafd2 <- fd(rep(0, nbetabasis2), betabasis2)

# add smoothing
betafdPar2 <- fdPar(betafd2, lambda=10)

# replace the regression coefficient function with this fdPar object
precip.Temp.mdl2 <- precip.Temp.mdl1
precip.Temp.mdl2[['betalist']][['tempfd']] <- betafdPar2

# Now re-fit the data
precip.Temp2 <- do.call('fRegress', precip.Temp.mdl2)

# Compare the two fits:
# degrees of freedom
precip.Temp1[['df']]  # 26
precip.Temp2[['df']]  # 22

# root-mean-squared errors:
RMSE1 <- round(sqrt(mean(with(precip.Temp1, (yhatfdobj-yvec)^2))), 3)
RMSE2 <- round(sqrt(mean(with(precip.Temp2, (yhatfdobj-yvec)^2))), 3)
print(c(RMSE1, RMSE2))

# display further results for the more parsimonious model
annualprec.fit2 <- precip.Temp2$yhatfdobj
plot(annualprec.fit2, annualprec, type="p", pch="o")
lines(annualprec.fit2, annualprec.fit2, lty=2)

# plot the estimated regression function
plot(precip.Temp2$betaestlist[[2]])
# now we see that it is primarily the temperatures in the
# early winter that provide the fit to log precipitation by temperature

##
## Manual construction of xfdlist and betalist
##
xfdlist <- list(const=rep(1, 35), tempfd=tempfd)

# The intercept must be constant for a scalar response
betabasis1 <- create.constant.basis(c(0, 365))
betafd1 <- fd(0, betabasis1)
betafdPar1 <- fdPar(betafd1)

betafd2 <- fd(matrix(0, 7, 1), create.bspline.basis(c(0, 365), 7))
# convert to an fdPar object
betafdPar2 <- fdPar(betafd2)

betalist <- list(const=betafdPar1, tempfd=betafdPar2)

precip.Temp3 <- fRegress(annualprec, xfdlist, betalist)
annualprec.fit3 <- precip.Temp3$yhatfdobj

# plot the data and the fit
plot(annualprec.fit3, annualprec, type="p", pch="o")
lines(annualprec.fit3, annualprec.fit3)
plot(precip.Temp3$betaestlist[[2]])

##
## functional response with vector explanatory variables
##

##
## simplest: formula interface
##
daybasis65 <- create.fourier.basis(rangeval=c(0, 365), nbasis=65,
                                   axes=list('axesIntervals'))
Temp.fd <- with(CanadianWeather, smooth.basisPar(day.5,
                dailyAv[,,'Temperature.C'], daybasis65)$fd)
TempRgn.f <- fRegress(Temp.fd ~ region, CanadianWeather)

##
## Get the default setup and possibly modify it
##
TempRgn.mdl <- fRegress(Temp.fd ~ region, CanadianWeather, method='model')

# make desired modifications here
# then run
TempRgn.m <- do.call('fRegress', TempRgn.mdl)

# no change, so match the first run
all.equal(TempRgn.m, TempRgn.f)

##
## More detailed set up
##
region.contrasts <- model.matrix(~factor(CanadianWeather$region))
rgnContr3 <- region.contrasts
dim(rgnContr3) <- c(1, 35, 4)
dimnames(rgnContr3) <- list('', CanadianWeather$place, c('const',
    paste('region', c('Atlantic', 'Continental', 'Pacific'), sep='.')))

const365 <- create.constant.basis(c(0, 365))
region.fd.Atlantic <- fd(matrix(rgnContr3[,,2], 1), const365)
# str(region.fd.Atlantic)
region.fd.Continental <- fd(matrix(rgnContr3[,,3], 1), const365)
region.fd.Pacific <- fd(matrix(rgnContr3[,,4], 1), const365)
region.fdlist <- list(const=rep(1, 35),
                      region.Atlantic=region.fd.Atlantic,
                      region.Continental=region.fd.Continental,
                      region.Pacific=region.fd.Pacific)
# str(TempRgn.mdl$betalist)

##
## functional response with functional explanatory variable
##

##
## predict knee angle from hip angle;
## from demo('gait', package='fda')
##

##
## formula interface
##
gaittime <- as.matrix((1:20)/21)
gaitrange <- c(0, 20)
gaitbasis <- create.fourier.basis(gaitrange, nbasis=21)
gaitnbasis <- gaitbasis$nbasis
gaitcoef <- matrix(0, gaitnbasis, dim(gait)[2])
harmaccelLfd <- vec2Lfd(c(0, (2*pi/20)^2, 0), rangeval=gaitrange)
gaitfd <- smooth.basisPar(gaittime, gait, gaitbasis,
                          Lfdobj=harmaccelLfd, lambda=1e-2)$fd
hipfd <- gaitfd[,1]
kneefd <- gaitfd[,2]

knee.hip.f <- fRegress(kneefd ~ hipfd)

##
## manual set-up
##

# set up the list of covariate objects
const <- rep(1, dim(kneefd$coef)[2])
xfdlist <- list(const=const, hipfd=hipfd)

beta0 <- with(kneefd, fd(gaitcoef, gaitbasis, fdnames))
beta1 <- with(hipfd, fd(gaitcoef, gaitbasis, fdnames))
betalist <- list(const=fdPar(beta0), hipfd=fdPar(beta1))

fRegressout <- fRegress(kneefd, xfdlist, betalist)

par(oldpar)