res_cal calculates linear regression residuals in an efficient way: handling several dependent variables at a time, using Matrix::TsparseMatrix capabilities and allowing for pre-calculation of the matrix inverse.
res_cal(y = NULL, x, w = NULL, by = NULL, precalc = NULL, id = NULL)
Arguments
y: A (sparse) numerical matrix of dependent variable(s).
x: A (sparse) numerical matrix of independent variable(s).
w: An optional numerical vector of row weights.
by: An optional categorical vector (factor or character) when residuals calculation is to be conducted within by-groups (see Details).
precalc: A list of pre-calculated results (see Details).
id: A vector of identifiers of the units used in the calculation. Useful when precalc is supplied, in order to check that the ordering of the y data matrix matches the one used at the pre-calculation step.
Returns
If y is not NULL (calculation step): a numerical matrix with the same structure (regular base::matrix or Matrix::TsparseMatrix) and dimensions as y.
If y is NULL (pre-calculation step): a list containing pre-calculated data.
Details
In the context of the gustave package, linear regression residual calculation is solely used to take into account the effect of calibration on variance estimation. Independent variables are therefore most likely to be the same from one variance estimation to another, hence the inversion of the matrix t(x) %*% Diagonal(x = w) %*% x can be done once and for all at a pre-calculation step.
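The reasoning above can be illustrated with a minimal base R sketch. It assumes the residuals are y - x %*% beta, where beta is the weighted least-squares coefficient; the variable names (inv, beta, res, ref) are illustrative and are not taken from the package. The inverse depends only on x and w, so it is computed once and then applied to every column of y.

```r
# Weighted least-squares residuals with a reusable inverse (base R only).
# Assumption: residuals are y - x %*% beta, beta being the weighted LS coefficient.
set.seed(1)
n <- 50
x <- matrix(rnorm(3 * n), nrow = n)   # independent variables
y <- matrix(rnorm(2 * n), nrow = n)   # two dependent variables
w <- runif(n, 1, 3)                   # row weights

# Pre-calculation step: depends on x and w only, not on y.
# crossprod(x, w * x) equals t(x) %*% diag(w) %*% x.
inv <- solve(crossprod(x, w * x))

# Calculation step: all columns of y are handled in one pass.
beta <- inv %*% crossprod(x, w * y)
res <- y - x %*% beta

# Cross-check against base R's weighted least-squares fit.
ref <- lm.wfit(x, y, w)$residuals
all.equal(res, ref, check.attributes = FALSE)
```

Because inv never involves y, it could be stored and reused for any number of later variance estimations that share the same x and w.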
The parameters y and precalc determine whether a list of pre-calculated data should be used in order to speed up the regression residuals computation at execution time:
if y is not NULL and precalc is NULL: on-the-fly calculation of the matrix inverse and the regression residuals (no pre-calculation).
if y is NULL and precalc is NULL: pre-calculation of the matrix inverse, which is stored in a list of pre-calculated data.
if y is not NULL and precalc is not NULL: calculation of the regression residuals using the list of pre-calculated data.
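The three cases above can be mimicked with a small stand-in function. This is only a sketch: res_sketch is a hypothetical name, and the contents of its pre-calculated list (x, w, inv) are an assumption, since this page does not describe what the real list holds.

```r
# Hypothetical stand-in for the y / precalc logic (not gustave's actual code).
res_sketch <- function(y = NULL, x = NULL, w = NULL, precalc = NULL) {
  if (is.null(precalc)) {
    # No pre-calculated data supplied: invert t(x) %*% diag(w) %*% x now.
    if (is.null(w)) w <- rep(1, nrow(x))
    precalc <- list(x = x, w = w, inv = solve(crossprod(x, w * x)))
  }
  # y is NULL: pre-calculation step, return the list for later reuse.
  if (is.null(y)) return(precalc)
  # y is not NULL: calculation step, using the (possibly pre-calculated) inverse.
  beta <- precalc$inv %*% crossprod(precalc$x, precalc$w * y)
  y - precalc$x %*% beta
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 20)
y <- matrix(rnorm(60), nrow = 20)

direct <- res_sketch(y, x)              # y given, no precalc: on-the-fly
pre    <- res_sketch(y = NULL, x = x)   # y NULL: pre-calculation only
reused <- res_sketch(y, precalc = pre)  # y given, precalc given: reuse

all.equal(direct, reused)
```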
The by parameter allows for calculation within by-groups: all calculations are made separately for each by-group (when calibration was conducted separately on several subsamples), but in an efficient way using Matrix::TsparseMatrix capabilities (especially when the matrix inverse is pre-calculated).
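One way to see why by-group calculation can be done in a single pass is the following sketch. It builds a block-structured design matrix in which each by-group gets its own copy of x, zeroed outside the group. Whether gustave builds its block structure exactly this way is not stated here; the names x_by, res_block and res_split are illustrative, and dense base R matrices are used only to keep the example self-contained.

```r
# By-group residuals through a single block-structured regression (base R only).
set.seed(1)
n <- 60
x <- matrix(rnorm(2 * n), nrow = n)
y <- matrix(rnorm(3 * n), nrow = n)
by <- sample(c("a", "b", "c"), n, replace = TRUE)

# One copy of x per by-group, with rows outside the group set to zero.
groups <- sort(unique(by))
x_by <- do.call(cbind, lapply(groups, function(g) x * (by == g)))

# A single regression on the block design...
res_block <- y - x_by %*% solve(crossprod(x_by), crossprod(x_by, y))

# ...compared with separate regressions run within each by-group.
res_split <- y
for (g in groups) {
  i <- by == g
  res_split[i, ] <- lm.fit(x[i, , drop = FALSE], y[i, , drop = FALSE])$residuals
}
all.equal(res_block, res_split, check.attributes = FALSE)
```

Most entries of x_by are zero, which is the kind of structure a sparse representation such as Matrix::TsparseMatrix stores compactly.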
Examples
# Generating random data
set.seed(1)
n <- 100
H <- 5
y <- matrix(rnorm(2*n), nrow = n)
x <- matrix(rnorm(10*n), nrow = n)
by <- letters[sample(1:H, n, replace = TRUE)]

# Direct calculation
res_cal(y, x)

# Calculation with pre-calculated data
precalc <- res_cal(y = NULL, x)
res_cal(y, precalc = precalc)
identical(res_cal(y, x), res_cal(y, precalc = precalc))

# Matrix::TsparseMatrix capability
require(Matrix)
X <- as(x, "TsparseMatrix")
Y <- as(y, "TsparseMatrix")
identical(res_cal(y, x), as.matrix(res_cal(Y, X)))

# by parameter for within by-groups calculation
res_cal(Y, X, by = by)
all.equal(
  res_cal(Y, X, by = by)[by == "a", ],
  res_cal(Y[by == "a", ], X[by == "a", ])
)