Computes augmented inverse probability weighted (AIPW), i.e. doubly robust, scores from observed rewards, pulled arms, and inverse probability scores. If mu_hat is provided, AIPW scores are computed; otherwise, IPW scores are computed.
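With Y_t the observed reward, W_t the pulled arm, and e_t(w) the probability of pulling arm w at step t, the usual textbook forms of the two scores can be written as below. This is a sketch that assumes the function follows the standard definitions; it is not taken from the package source, and \hat\mu_t(w) denotes the plug-in estimate passed as mu_hat.

```latex
\begin{align*}
\hat\Gamma_t^{\mathrm{IPW}}(w) &= \frac{\mathbf{1}[W_t = w]}{e_t(w)}\, Y_t, \\
\hat\Gamma_t^{\mathrm{AIPW}}(w) &= \hat\mu_t(w) + \frac{\mathbf{1}[W_t = w]}{e_t(w)} \bigl(Y_t - \hat\mu_t(w)\bigr).
\end{align*}
```

Setting \hat\mu_t(w) = 0 in the second line recovers the first, which matches the behaviour described for mu_hat = NULL.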
aw_scores(yobs, ws, balwts, K, mu_hat = NULL)
Arguments
yobs: Numeric vector. Observed rewards. Must not contain NA values.
ws: Integer vector. Pulled arms. Must not contain NA values. Length must match yobs.
balwts: Numeric matrix. Inverse probability scores 1[W_t = w] / e_t(w) of pulling arms, shape [A, K], where A is the number of observations and K is the number of arms. Must not contain NA values.
K: Integer. Number of arms. Must be a positive integer.
mu_hat: Optional numeric matrix. Plug-in estimator of arm outcomes, shape [A, K], or NULL. Must not contain NA values if provided.
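The following base R sketch illustrates how the arguments above might fit together. It is a hypothetical re-implementation (the name aw_scores_sketch and the toy data are assumptions for illustration), written from the standard IPW/AIPW definitions rather than from the package source, so the real aw_scores may differ in validation and edge-case handling.

```r
# Hypothetical illustration only; not the package's implementation.
aw_scores_sketch <- function(yobs, ws, balwts, K, mu_hat = NULL) {
  A <- length(yobs)
  stopifnot(length(ws) == A, all(dim(balwts) == c(A, K)))

  # Indicator matrix: pulled[t, w] is 1 if arm w was pulled at step t, else 0.
  pulled <- matrix(0, nrow = A, ncol = K)
  pulled[cbind(seq_len(A), ws)] <- 1

  # Keep the inverse probability weight only in the pulled arm's column.
  # If balwts already carries the indicator, this multiplication changes nothing.
  ipw_weight <- pulled * balwts

  # IPW score: 1[W_t = w] / e_t(w) * Y_t (yobs is recycled down each column).
  scores <- ipw_weight * yobs

  if (!is.null(mu_hat)) {
    stopifnot(all(dim(mu_hat) == c(A, K)))
    # AIPW augmentation: add (1 - 1[W_t = w] / e_t(w)) * mu_hat_t(w).
    scores <- scores + (1 - ipw_weight) * mu_hat
  }
  scores
}

# Toy data (assumed): 3 observations, 2 arms, uniform assignment probabilities.
yobs   <- c(1, 0, 2)
ws     <- c(1L, 2L, 1L)
balwts <- 1 / matrix(0.5, nrow = 3, ncol = 2)
mu_hat <- matrix(0.5, nrow = 3, ncol = 2)

ipw_scores  <- aw_scores_sketch(yobs, ws, balwts, K = 2)
aipw_scores <- aw_scores_sketch(yobs, ws, balwts, K = 2, mu_hat = mu_hat)
```

Both results are [A, K] matrices. In the IPW case every column other than the pulled arm's is zero, whereas the AIPW case fills those columns with the plug-in estimate from mu_hat.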