Leave-future-out ridge-based estimates for arm expected rewards.
Computes leave-future-out ridge-based estimates of arm expected rewards based on the provided data.
ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes, alpha = 1)
xs
: Matrix. Covariates of shape [A, p], where A is the number of observations and p is the number of features. Must not contain NA values.
ws
: Integer vector. Indicates which arm was chosen for observations at each time t. Length A. Must not contain NA values.
yobs
: Numeric vector. Observed outcomes, length A. Must not contain NA values.
K
: Integer. Number of arms. Must be a positive integer.
batch_sizes
: Integer vector. Sizes of batches in which data is processed. Must be positive integers.
alpha
: Numeric. Ridge regression regularization parameter. Default is 1.

A 3D array containing the expected reward estimates for each arm and each time t, of shape [A, A, K].
set.seed(123)
p <- 3
K <- 5
A <- 100
xs <- matrix(runif(A * p), nrow = A, ncol = p)
ws <- sample(1:K, A, replace = TRUE)
yobs <- runif(A)
batch_sizes <- c(25, 25, 25, 25)
muhat <- ridge_muhat_lfo_pai(xs, ws, yobs, K, batch_sizes)
print(muhat)
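To make the "leave-future-out" idea concrete, the following is a minimal sketch and not the package's implementation. It assumes that, for every time t in a batch, each arm's ridge model is fit only on observations from earlier batches and then used to predict the reward at every row of xs. The function name `lfo_ridge_sketch`, the [time, context, arm] index order, the penalized intercept, and the choice to leave the first batch (and arms with no earlier observations) as NA are all assumptions made for illustration.

```r
# Hypothetical illustration only; not part of the package.
lfo_ridge_sketch <- function(xs, ws, yobs, K, batch_sizes, alpha = 1) {
  A <- nrow(xs)
  p <- ncol(xs)
  stopifnot(sum(batch_sizes) == A, length(ws) == A, length(yobs) == A)

  ends <- cumsum(batch_sizes)        # last time index of each batch
  starts <- ends - batch_sizes + 1   # first time index of each batch
  X_all <- cbind(1, xs)              # design matrix with an intercept column

  # Assumed layout: muhat[t, i, k] is the estimate for arm k at context xs[i, ],
  # using only data from batches that ended before time t.
  muhat <- array(NA_real_, dim = c(A, A, K))

  for (b in seq_along(batch_sizes)[-1]) {  # first batch has no past data
    past <- seq_len(ends[b - 1])
    for (k in seq_len(K)) {
      idx <- past[ws[past] == k]
      if (length(idx) == 0) next           # arm k not yet observed: keep NA
      Xk <- X_all[idx, , drop = FALSE]
      # Closed-form ridge solution; the intercept is penalized too (simplification).
      beta <- solve(crossprod(Xk) + alpha * diag(p + 1),
                    crossprod(Xk, yobs[idx]))
      pred <- as.vector(X_all %*% beta)
      for (t in starts[b]:ends[b]) muhat[t, , k] <- pred
    }
  }
  muhat
}
```

Under these assumptions, changing outcomes in the final batch leaves every estimate unchanged, because data from a batch is never used for times inside that batch or earlier.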