oLBFGS() R function from stochQN

oLBFGS guided optimizer

Optimizes an empirical (convex) loss function over batches of sample data.


oLBFGS(x0, grad_fun, pred_fun = NULL, initial_step = 0.01,
  step_fun = function(iter) 1/sqrt((iter/10) + 1),
  callback_iter = NULL, args_cb = NULL, verbose = TRUE,
  mem_size = 10, hess_init = NULL, min_curvature = 1e-04,
  y_reg = NULL, check_nan = TRUE, nthreads = -1)
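With the defaults in the signature above, the step size at a given iteration is initial_step * step_fun(iter). The short base-R snippet below evaluates that default schedule directly. It assumes iter is the iteration counter that oLBFGS passes to step_fun; whether that counter starts at 0 or 1 is not stated here.

```r
# Default step-size schedule from the usage signature.
# Assumption: `iter` is the iteration counter that oLBFGS passes to step_fun.
initial_step <- 0.01
step_fun <- function(iter) 1/sqrt((iter/10) + 1)

iters <- c(0, 10, 30, 90)
# Step size used at each iteration = initial_step * step_fun(iter)
steps <- initial_step * step_fun(iters)
print(data.frame(iter = iters, step = steps))
```

With these defaults the step is halved by iteration 30 and falls to roughly a third of initial_step by iteration 90. Passing step_fun = function(iter) 1 would keep the step fixed at initial_step.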

Arguments

x0: Initial values for the variables to optimize.
grad_fun: Function taking as unnamed arguments x_curr (variable values), X (covariates), y (target variable), and w (weights), plus additional arguments (...), and producing the expected value of the gradient when evaluated on that data.
pred_fun: Function taking the data as its first unnamed argument and the variable values as its second, plus optional extra arguments (...). It is called when using predict on the object returned by this function.
initial_step: Initial step size.
step_fun: Function accepting the iteration number as an unnamed argument and returning the factor by which initial_step is multiplied to obtain the step size for that iteration.
callback_iter: Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and args_cb. Pass NULL if there is no need to call a callback function.
args_cb: Extra argument to pass to the callback function.
verbose: Whether to print information about iteration statuses when something goes wrong.
mem_size: Number of correction pairs to store for approximation of Hessian-vector products.
hess_init: Value to which to initialize the diagonal of H0. If passing NULL, will use the same initialization as for SQN ((s_last * y_last) / (y_last * y_last)).
min_curvature: Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass NULL for no minimum.
y_reg: Regularizer for the 'y' vector (y_reg * s is added to y). Pass NULL for no regularization.
check_nan: Whether to check for variables becoming NA after each iteration and to revert the step if they do (this also resets the BFGS memory).
nthreads: Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.
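The hess_init, min_curvature and y_reg entries describe simple formulas on a correction pair (s, y), where s is the change in the variables and y the change in the gradient between two points. The helper correction_pair below is a hypothetical illustration of those formulas, reading each product such as s * y as an inner product. It is not part of stochQN, and treating a ratio exactly equal to min_curvature as accepted is an assumption.

```r
# Hypothetical helper (not part of stochQN) illustrating the formulas in the
# hess_init, min_curvature and y_reg entries.
correction_pair <- function(s, y, min_curvature = 1e-4, y_reg = NULL) {
  # y_reg: y_reg * s is added to the gradient difference y
  if (!is.null(y_reg)) y <- y + y_reg * s
  sy <- sum(s * y)
  ss <- sum(s * s)
  # min_curvature: keep the pair only if (s * y) / (s * s) reaches the minimum
  accept <- ss > 0 && (is.null(min_curvature) || sy / ss >= min_curvature)
  # hess_init = NULL: diagonal of H0 set to (s * y) / (y * y)
  h0 <- sy / sum(y * y)
  list(y = y, accept = accept, h0 = h0)
}

p1 <- correction_pair(c(1, 0), c(2, 0))                # positive curvature
p2 <- correction_pair(c(1, 0), c(-1, 0))               # negative curvature
p3 <- correction_pair(c(1, 0), c(0, 0), y_reg = 0.5)   # accepted only thanks to y_reg
str(list(p1 = p1$accept, p2 = p2$accept, p3 = p3$accept))
```

The third call shows the role of y_reg: without it, y = (0, 0) gives zero curvature and the pair is rejected; adding y_reg * s makes (s * y) / (s * s) equal to 0.5.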

Returns

An oLBFGS object with the user-supplied functions, which can be fit to batches of data through function partial_fit, and can produce predictions on new data through function predict.
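As a rough picture of the kind of batch-by-batch update that partial_fit performs, here is a minimal base-R sketch of the online L-BFGS scheme of Schraudolph et al. (2007). It is not stochQN's implementation; the names two_loop, olbfgs_sketch, make_batch and grad_ls are hypothetical. Assumptions: a fixed step size, H0 = 1 before any pair is stored and (s_last * y_last) / (y_last * y_last) afterwards, the gradient evaluated twice on the same batch, and y regularized by adding y_reg * s.

```r
# Hypothetical sketch of online L-BFGS (Schraudolph et al., 2007) in base R.
# Not stochQN's implementation; names and defaults are illustrative.

# Two-loop recursion: returns H %*% g from stored pairs (oldest first),
# starting from H0 = h0 * I.
two_loop <- function(g, S, Y, h0) {
  m <- length(S)
  q <- g
  alpha <- numeric(m)
  rho <- numeric(m)
  for (i in rev(seq_len(m))) {
    rho[i] <- 1 / sum(S[[i]] * Y[[i]])
    alpha[i] <- rho[i] * sum(S[[i]] * q)
    q <- q - alpha[i] * Y[[i]]
  }
  r <- h0 * q
  for (i in seq_len(m)) {
    beta <- rho[i] * sum(Y[[i]] * r)
    r <- r + S[[i]] * (alpha[i] - beta)
  }
  r
}

olbfgs_sketch <- function(x0, grad, batches, step = 0.1, mem_size = 10,
                          y_reg = 1e-4, min_curvature = 1e-4) {
  x <- x0
  S <- list()
  Y <- list()
  for (b in batches) {
    g <- grad(x, b$X, b$y)
    # H0 scaling: 1 before any pair is stored, else (s*y)/(y*y) of the last pair
    m <- length(S)
    h0 <- if (m == 0) 1 else sum(S[[m]] * Y[[m]]) / sum(Y[[m]] * Y[[m]])
    x_new <- x - step * two_loop(g, S, Y, h0)
    # Second gradient evaluation on the SAME batch, at the new point
    g_new <- grad(x_new, b$X, b$y)
    s <- x_new - x
    y <- g_new - g + y_reg * s
    ss <- sum(s * s)
    # Store the pair only if it passes the curvature check; keep mem_size pairs
    if (ss > 0 && sum(s * y) / ss >= min_curvature) {
      S <- c(S, list(s))
      Y <- c(Y, list(y))
      if (length(S) > mem_size) {
        S <- S[-1]
        Y <- Y[-1]
      }
    }
    x <- x_new
  }
  x
}

# Usage on the same kind of synthetic regression as in the Examples section
set.seed(1)
true_coefs <- c(1.12, 5.34, -6.123)
make_batch <- function(n = 100) {
  X <- matrix(rnorm(length(true_coefs) * n), nrow = n)
  list(X = X, y = as.numeric(X %*% true_coefs + rnorm(n)))
}
# Gradient of 0.5 * MSE for a linear model
grad_ls <- function(x, X, y) colMeans(X * as.numeric(X %*% x - y))
batches <- replicate(200, make_batch(), simplify = FALSE)
x_hat <- olbfgs_sketch(c(1, 1, 1), grad_ls, batches)
print(round(x_hat, 2))
```

Evaluating both gradients on the same batch is what keeps s * y positive here: for a linear model, y is exactly the batch Hessian times s plus y_reg * s, so every pair passes the curvature check.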

Examples


### Example regression with randomly-generated data
library(stochQN)

### Will sample data y ~ Ax + epsilon
true_coefs <- c(1.12, 5.34, -6.123)

generate_data_batch <- function(true_coefs, n = 100) {
  X <- matrix(
    rnorm(length(true_coefs) * n),
    nrow=n, ncol=length(true_coefs))
  y <- X %*% true_coefs + rnorm(n)
  return(list(X = X, y = y))
}

### Evaluation function: RMSE plus an L2 penalty (used only to report progress)
eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- as.numeric(X %*% coefs)
  RMSE <- sqrt(mean((pred - y)^2))
  reg  <- lambda * as.numeric(coefs %*% coefs)
  return(RMSE + reg)
}

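### Gradient of 0.5 * MSE + lambda * ||coefs||^2 (passed to oLBFGS)
eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- X %*% coefs
  grad <- colMeans(X * as.numeric(pred - y))
  # derivative of lambda * sum(coefs^2) is 2 * lambda * coefs
  grad <- grad + 2 * lambda * as.numeric(coefs)
  return(grad)
}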

pred_fun <- function(X, coefs, ...) {
  return(as.numeric(X %*% coefs))
}

### Initialize optimizer from arbitrary values
set.seed(1)
x0 <- c(1, 1, 1)
optimizer <- oLBFGS(x0, grad_fun=eval_grad,
  pred_fun=pred_fun, initial_step=1e-1)
val_data <- generate_data_batch(true_coefs, n=100)

### Fit to 50 batches of data, 100 observations each
for (i in 1:50) {
  new_batch <- generate_data_batch(true_coefs, n=100)
  partial_fit(
    optimizer,
    new_batch$X, new_batch$y,
    lambda=1e-5)
  x_curr <- get_curr_x(optimizer)
  i_curr <- get_iteration_number(optimizer)
  if ((i_curr %% 10)  == 0) {
    cat(sprintf(
      "Iteration %d - E[f(x)]: %f - values of x: [%f, %f, %f]\n",
      i_curr,
      eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5),
      x_curr[1], x_curr[2], x_curr[3]))
  }
}

### Predict for new data
new_batch <- generate_data_batch(true_coefs, n=10)
yhat <- predict(optimizer, new_batch$X)
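In this example, eval_grad is the gradient of 0.5 * MSE plus lambda * sum(coefs^2), so its penalty term must be 2 * lambda * coefs; writing it as 2 * lambda * coefs^2 is only correct where coefs is 0 or 1. A central finite-difference check catches this kind of mistake before a gradient is handed to oLBFGS. The sketch below uses hypothetical helper names (obj, grad_fixed, grad_orig, num_grad); the objective is an assumption inferred from the form of the gradient, and a larger lambda than in the example is used so the penalty term is visible.

```r
# Objective that eval_grad differentiates (assumption: 0.5 * MSE + L2 penalty)
obj <- function(coefs, X, y, lambda) {
  0.5 * mean((as.numeric(X %*% coefs) - y)^2) + lambda * sum(coefs^2)
}
# Corrected gradient: the penalty term is 2 * lambda * coefs
grad_fixed <- function(coefs, X, y, lambda) {
  colMeans(X * as.numeric(X %*% coefs - y)) + 2 * lambda * coefs
}
# Penalty term written as 2 * lambda * coefs^2, for comparison
grad_orig <- function(coefs, X, y, lambda) {
  colMeans(X * as.numeric(X %*% coefs - y)) + 2 * lambda * coefs^2
}
# Central finite differences of a scalar function f at x
num_grad <- function(f, x, h = 1e-6) {
  vapply(seq_along(x), function(j) {
    e <- numeric(length(x))
    e[j] <- h
    (f(x + e) - f(x - e)) / (2 * h)
  }, numeric(1))
}

set.seed(2)
X <- matrix(rnorm(300), nrow = 100)
y <- as.numeric(X %*% c(1.12, 5.34, -6.123) + rnorm(100))
b <- c(2, -1, 0.5)   # avoid coefs = 1, where coefs^2 == coefs
lambda <- 0.1        # large enough to make the penalty visible
ng <- num_grad(function(cf) obj(cf, X, y, lambda), b)
err_fixed <- max(abs(grad_fixed(b, X, y, lambda) - ng))
err_orig  <- max(abs(grad_orig(b, X, y, lambda) - ng))
print(c(fixed = err_fixed, original = err_orig))
```

Because the objective is quadratic, the central difference is exact up to rounding, so err_fixed sits near machine precision while err_orig reflects the 2 * lambda * (coefs^2 - coefs) discrepancy.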

References

Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).
Nocedal, J. and Wright, S., 1999. "Numerical optimization." (ch. 7). Springer.
