Bandit-Based Experiments and Policy Evaluation
Estimate policy value via non-contextual adaptive weighting.
Compute AIPW/doubly robust scores.
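A minimal sketch of the doubly robust construction, assuming a T x K matrix probs of assignment probabilities and a T x K matrix muhat of model-based reward estimates; the function name and signature here are illustrative, not the package's API:

```r
# Illustrative AIPW score:
# Gamma_t(w) = muhat_t(w) + 1{W_t = w} / e_t(w) * (Y_t - muhat_t(w))
aipw_scores <- function(w, y, probs, muhat) {
  Tt <- length(y)
  K  <- ncol(probs)
  obs <- matrix(0, Tt, K)
  obs[cbind(seq_len(Tt), w)] <- 1          # indicator 1{W_t = w}
  muhat + obs / probs * (y - muhat[cbind(seq_len(Tt), w)])
}
```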
Variance of the policy value estimator via non-contextual adaptive weighting.
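Following Hadad et al. (2021), the non-contextual adaptively weighted estimator averages the per-observation AIPW scores with variance-stabilizing weights h_t. A hedged sketch of the point estimate and its plug-in variance (names are illustrative):

```r
# Q_hat = sum(h_t * Gamma_t) / sum(h_t)
# V_hat = sum(h_t^2 * (Gamma_t - Q_hat)^2) / (sum(h_t))^2
aw_estimate <- function(scores, h) {
  q <- sum(h * scores) / sum(h)
  v <- sum(h^2 * (scores - q)^2) / sum(h)^2
  c(estimate = q, std.error = sqrt(v))
}
```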
Calculate balancing weight scores.
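In the simplest case a balancing weight is the inverse of the realized assignment probability, 1/e_t(W_t); a one-line sketch using the same illustrative probs matrix as above:

```r
# Illustrative inverse-probability (balancing) weights
balancing_weights <- function(w, probs) 1 / probs[cbind(seq_along(w), w)]
```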
Estimate the policy value and its variance via contextual weighting.
Check Number of Observations for Inference
Check First Batch Validity
Check Shape Compatibility of Probability Objects
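The validity checks above guard the estimation functions against malformed inputs. As an illustration of the kind of shape check involved (not the package's actual code), a probability object should match the data dimensions and have rows summing to one:

```r
check_probs <- function(probs, n_obs, n_arms) {
  stopifnot(is.matrix(probs),
            nrow(probs) == n_obs,
            ncol(probs) == n_arms,
            all(probs >= 0),
            all(abs(rowSums(probs) - 1) < 1e-8))
  invisible(TRUE)
}
```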
Thompson Sampling draws.
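Thompson sampling assigns arms by sampling from each arm's posterior, and Monte Carlo draws also yield the assignment probabilities needed later for the AIPW scores. A Beta-Bernoulli sketch (priors and names are illustrative):

```r
# Assignment probabilities from posterior draws under Beta(1, 1) priors
ts_probs <- function(successes, failures, ndraws = 1000) {
  K <- length(successes)
  draws <- matrix(rbeta(ndraws * K, successes + 1, failures + 1),
                  ndraws, K, byrow = TRUE)  # column k uses arm k's posterior
  tabulate(max.col(draws), K) / ndraws      # share of draws won by each arm
}
```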
Estimate the policy value and its variance via non-contextual weighting.
Generate classification data.
Clip the values of lamb between a minimum x and a maximum y.
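This is a one-liner in base R; clip is an illustrative name:

```r
clip <- function(lamb, x, y) pmin(pmax(lamb, x), y)  # bound each element to [x, y]
```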
Impose probability floor.
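A common construction raises every assignment probability to at least the floor and removes the excess mass proportionally from the arms with slack, so the result still sums to one. A sketch assuming the floor amin is at most 1/K, not necessarily the package's exact rule:

```r
impose_floor <- function(p, amin) {
  floored <- pmax(p, amin)
  excess  <- sum(floored) - 1        # mass added by flooring
  if (excess <= 0) return(floored)   # nothing was floored
  slack   <- floored - amin          # room above the floor, per arm
  floored - excess * slack / sum(slack)
}
impose_floor(c(0.9, 0.07, 0.03), amin = 0.05)  # ~ 0.880 0.070 0.050
```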
Linear Thompson Sampling model.
Policy evaluation with adaptively generated data.
Plot cumulative assignment for bandit experiment.
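A base-graphics sketch of such a plot, tracking cumulative assignment counts per arm across observations (w is a vector of assigned arm indices):

```r
plot_cumulative <- function(w, K = max(w)) {
  counts <- sapply(seq_len(K), function(k) cumsum(w == k))
  matplot(counts, type = "l", lty = 1,
          xlab = "observation", ylab = "cumulative assignments")
  legend("topleft", paste("arm", seq_len(K)), lty = 1, col = seq_len(K))
}
```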
Ridge Regression Initialization for Arm Expected Rewards
Leave-future-out ridge-based estimates for arm expected rewards.
Update ridge regression matrices.
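The bookkeeping behind these ridge helpers can be sketched as maintaining, per arm, the penalized Gram matrix and the response cross-product; init_ridge and update_ridge are illustrative names, not the package's signatures:

```r
# theta_hat = (X'X + lambda I)^{-1} X'y, updated one observation at a time
init_ridge <- function(p, lambda = 1) list(A = lambda * diag(p), b = rep(0, p))
update_ridge <- function(state, x, y) {
  state$A <- state$A + tcrossprod(x)   # accumulate xx' into X'X + lambda I
  state$b <- state$b + y * x           # accumulate y * x into X'y
  state$theta <- solve(state$A, state$b)
  state
}
```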
Run an experiment using Thompson Sampling.
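End to end, a toy Beta-Bernoulli Thompson sampling experiment looks like the loop below (purely illustrative; the package's simulators are more general):

```r
set.seed(1)
K <- 3; n <- 500
true_means <- c(0.3, 0.5, 0.6)
succ <- fail <- rep(0, K)
w <- y <- integer(n)
for (t in seq_len(n)) {
  draw <- rbeta(K, succ + 1, fail + 1)   # one posterior draw per arm
  w[t] <- which.max(draw)                # Thompson assignment
  y[t] <- rbinom(1, 1, true_means[w[t]])
  if (y[t] == 1) succ[w[t]] <- succ[w[t]] + 1 else fail[w[t]] <- fail[w[t]] + 1
}
table(w)  # assignments concentrate on the best arm over time
```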
Generate simple tree data.
Stick breaking function.
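Stick breaking turns a sequence of ratios lambda_t in [0, 1] into weights: each ratio claims its share of whatever stick remains, so the weights sum to at most one. A sketch:

```r
# h_t = lambda_t * prod_{s < t} (1 - lambda_s)
stick_break <- function(lambda) {
  remaining <- c(1, cumprod(1 - lambda))[seq_along(lambda)]
  lambda * remaining
}
stick_break(c(0.5, 0.5, 1))  # 0.50 0.25 0.25, sums to one when the last ratio is 1
```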
Calculate allocation ratio for a two-point stable-variance bandit.
Update linear Thompson Sampling model.
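A sketch of the linear Thompson sampling cycle: the update accumulates a per-arm ridge posterior, and a decision draws coefficients from an approximate Gaussian posterior N(theta_hat, v^2 A^{-1}). Names and the noise scale v are illustrative assumptions:

```r
lints_update <- function(m, x, y) {   # m starts as list(A = diag(p), b = rep(0, p))
  m$A <- m$A + tcrossprod(x)
  m$b <- m$b + y * x
  m
}
lints_draw <- function(m, v = 1) {
  A_inv <- solve(m$A)
  theta_hat <- drop(A_inv %*% m$b)
  R <- chol(v^2 * A_inv)              # covariance = t(R) %*% R
  theta_hat + drop(t(R) %*% rnorm(length(theta_hat)))
}
```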
Frequentist inference on adaptively generated data. The methods implemented are based on Zhan et al. (2021) <doi:10.48550/arXiv.2106.02029> and Hadad et al. (2021) <doi:10.48550/arXiv.1911.02768>. For illustration, several functions for simulating non-contextual and contextual adaptive experiments using Thompson sampling are also supplied.