ridge_muhat_lfo_pai function

Leave-future-out ridge-based estimates for arm expected rewards.

Computes leave-future-out ridge-based estimates of arm expected rewards from the provided data.

ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes, alpha = 1)

Arguments

  • xs: Matrix. Covariates of shape [A, p], where A is the number of observations and p is the number of features. Must not contain NA values.
  • ws: Integer vector. Indicates which arm was chosen for observations at each time t. Length A. Must not contain NA values.
  • yobs: Numeric vector. Observed outcomes, length A. Must not contain NA values.
  • K: Integer. Number of arms. Must be a positive integer.
  • batch_sizes: Integer vector. Sizes of batches in which data is processed. Must be positive integers.
  • alpha: Numeric. Ridge regression regularization parameter. Default is 1.
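The constraints listed above can be checked before calling the function. The following is a minimal sketch in base R; `check_lfo_inputs` is a hypothetical helper, not part of the package. Two of its checks are assumptions that the argument descriptions do not state: that arms are coded 1 to K, and that the batch sizes sum to A.

```r
# Hypothetical helper (not part of the package): verifies the documented
# argument constraints and stops with an error if any of them fails.
check_lfo_inputs <- function(xs, ws, yobs, K, batch_sizes) {
  A <- nrow(xs)
  stopifnot(
    is.matrix(xs), !anyNA(xs),
    length(ws) == A, !anyNA(ws),
    is.numeric(yobs), length(yobs) == A, !anyNA(yobs),
    length(K) == 1, K >= 1, K == round(K),
    all(batch_sizes > 0), all(batch_sizes == round(batch_sizes)),
    all(ws %in% seq_len(K)),   # assumption: arms are coded 1..K
    sum(batch_sizes) == A      # assumption: batches cover all A observations
  )
  invisible(TRUE)
}
```

Calling `check_lfo_inputs(xs, ws, yobs, K, batch_sizes)` on valid inputs returns `TRUE` invisibly; otherwise it raises an error naming the failed condition.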

Returns

A 3D array containing the expected reward estimates for each arm and each time t, of shape [A, A, K].
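To illustrate what "leave-future-out" means for an array of this shape, here is a simplified sketch in base R. It is not the package's implementation. `lfo_ridge_sketch` is a hypothetical name, and the sketch rests on several assumptions: the first index is the time t at which the estimate is formed, the second is the context the estimate is evaluated at, the third is the arm; estimates for a batch use only observations from earlier batches; the intercept is penalized along with the other coefficients; and entries stay at zero until an arm has past data.

```r
# Hypothetical sketch (not the package's code): per-arm closed-form ridge fits
# that only ever use observations from batches before the current one.
lfo_ridge_sketch <- function(xs, ws, yobs, K, batch_sizes, alpha = 1) {
  A <- nrow(xs)
  X <- cbind(1, xs)                        # add an intercept column
  muhat <- array(0, dim = c(A, A, K))      # zeros until an arm has past data
  ends <- cumsum(batch_sizes)
  starts <- c(1, head(ends, -1) + 1)
  for (b in seq_along(batch_sizes)) {
    past <- seq_len(starts[b] - 1)         # observations before this batch
    rows <- starts[b]:ends[b]              # time steps in this batch
    for (k in seq_len(K)) {
      idx <- past[ws[past] == k]           # past observations assigned to arm k
      if (length(idx) == 0) next
      Xk <- X[idx, , drop = FALSE]
      # Ridge solution: (Xk'Xk + alpha * I)^{-1} Xk'y
      theta <- solve(crossprod(Xk) + alpha * diag(ncol(X)),
                     crossprod(Xk, yobs[idx]))
      pred <- as.vector(X %*% theta)       # predictions for all A contexts
      muhat[rows, , k] <- matrix(pred, nrow = length(rows), ncol = A, byrow = TRUE)
    }
  }
  muhat
}
```

Under these assumptions, `muhat[t, i, k]` is the estimated reward of arm `k` at context `xs[i, ]` using only data available before the batch containing time `t`, so changing outcomes in later batches leaves earlier slices unchanged.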

Examples

set.seed(123)
p <- 3
K <- 5
A <- 100
xs <- matrix(runif(A * p), nrow = A, ncol = p)
ws <- sample(1:K, A, replace = TRUE)
yobs <- runif(A)
batch_sizes <- c(25, 25, 25, 25)
muhat <- ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes)
print(muhat)