oLBFGS_free function

oLBFGS Free-Mode Optimizer

Optimizes an empirical (convex) loss function over batches of sample data. Compared to the function/class 'oLBFGS', this version lets the user perform all the calculations externally, interacting with the object only through a function that returns the type of calculation being requested, and a method 'update_gradient' through which the requested gradient is supplied.

Order in which requests are made:

========== loop ===========

  • calc_grad

  • calc_grad_same_batch (might skip if using check_nan)

===========================

After constructing the object with this function, call 'run_oLBFGS_free' on it to obtain the first requested piece of information.
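As a minimal sketch of the request loop above (assuming the 'stochQN' package is installed; the quadratic objective here is purely illustrative):

```r
library(stochQN)

## Hypothetical objective f(x) = sum(x^2), whose gradient is 2*x
grad_fn <- function(x) 2 * x

x <- c(1, 1)               ## current parameter values, updated in-place
opt <- oLBFGS_free()
for (i in 1:20) {
  req <- run_oLBFGS_free(opt, x, step_size = 1e-1)
  ## Both request types are answered the same way: a gradient
  ## evaluated at 'req$requested_on' (on the same batch for
  ## 'calc_grad_same_batch')
  if (req$task %in% c("calc_grad", "calc_grad_same_batch")) {
    update_gradient(opt, grad_fn(req$requested_on))
  }
}
```

Note that each iteration issues two requests ('calc_grad' followed by 'calc_grad_same_batch'), so a loop of 2n calls performs n optimizer iterations.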

oLBFGS_free(mem_size = 10, hess_init = NULL, min_curvature = 1e-04, y_reg = NULL, check_nan = TRUE, nthreads = -1)

Arguments

  • mem_size: Number of correction pairs to store for approximation of Hessian-vector products.
  • hess_init: Value to which to initialize the diagonal of H0. If passing NULL, will use the same initialization as for SQN ((s_last * y_last) / (y_last * y_last)).
  • min_curvature: Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass NULL for no minimum.
  • y_reg: Regularizer for 'y' vector (gets added y_reg * s). Pass NULL for no regularization.
  • check_nan: Whether to check for variables becoming NA after each iteration, and to revert the step if they do (this will also reset the BFGS memory).
  • nthreads: Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Returns

An oLBFGS_free object, which can be used through the functions 'update_gradient' and 'run_oLBFGS_free'.

Examples

### Example optimizing Rosenbrock 2D function
### Note that this example is not stochastic, as the
### function is not evaluated in expectation based on
### batches of data, but rather it has a given absolute
### form that never varies.
### Warning: this optimizer is meant for convex functions
### (Rosenbrock's is not convex)
library(stochQN)

fr <- function(x) { ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) { ## Gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 * (x2 - x1 * x1))
}

### Initial values of x
x_opt = as.numeric(c(0, 2))
cat(sprintf("Initial values of x: [%.3f, %.3f]\n",
            x_opt[1], x_opt[2]))

### Will use a constant step size throughout
### (not recommended)
step_size <- 1e-1

### Initialize the optimizer
optimizer <- oLBFGS_free()

### Keep track of the iteration number
curr_iter <- 0

### Run a loop for 100 iterations
### (Note that each iteration requires 2 calculations,
###  hence the 200)
for (i in 1:200) {
  req <- run_oLBFGS_free(optimizer, x_opt, step_size)
  if (req$task == "calc_grad") {
    update_gradient(optimizer, grr(req$requested_on))
  } else if (req$task == "calc_grad_same_batch") {
    update_gradient(optimizer, grr(req$requested_on))
  }

  ### Track progress every 10 iterations
  if (req$info$iteration_number > curr_iter) {
    curr_iter <- req$info$iteration_number
    if ((curr_iter %% 10) == 0) {
      cat(sprintf(
        "Iteration %3d - Current function value: %.3f\n",
        req$info$iteration_number, fr(x_opt)))
    }
  }
}
cat(sprintf("Current values of x: [%.3f, %.3f]\n",
            x_opt[1], x_opt[2]))

References

  • Schraudolph, N.N., Yu, J. and Guenter, S., 2007, March. "A stochastic quasi-Newton method for online convex optimization." In Artificial Intelligence and Statistics (pp. 436-443).
  • Wright, S. and Nocedal, J., 1999. "Numerical Optimization." (ch. 7) Springer, New York.

See Also

update_gradient, run_oLBFGS_free

  • Maintainer: David Cortes
  • License: BSD_2_clause + file LICENSE
  • Last published: 2021-09-26