Cram Method for Efficient Simultaneous Learning and Evaluation
Batch Contextual Epsilon-Greedy Policy
Batch Contextual Thompson Sampling Policy
Batch Disjoint LinUCB Policy with Epsilon-Greedy
Contextual Linear Bandit Environment
Cram Bandit Policy Value Estimate
Cram Bandit Simulation
Cram Bandit Variance of the Policy Value Estimate
Cram Bandit: On-policy Statistical Evaluation in Contextual Bandits
Cram Policy Estimator for Policy Value Difference (Delta)
Cram ML Expected Loss Estimate
Cram Policy Learning
Cram ML: Simultaneous Machine Learning and Evaluation
Cram Policy: Estimator for Policy Value (Psi)
Cram Policy: Efficient Simultaneous Policy Learning and Evaluation
Cram Policy Simulation
Cram ML: Variance Estimate of the Crammed Expected Loss Estimate
Cram Policy: Variance Estimate of the Crammed Policy Value Estimate (Psi)
Cram Policy: Variance Estimate of the Crammed Policy Value Difference Estimate (Delta)
Cram ML: Fit ML Model
Cram Policy: Fit Model
Generate Reward Parameters for Simulated Linear Bandits
LinUCB Disjoint Policy with Epsilon-Greedy Exploration
Cram ML: Generalized ML Learning
Cram ML: Predict with the Specified Model
Cram Policy: Predict with the Specified Model
Cram Policy: Set Model
Validate or Set the Baseline Policy
Validate or Generate Batch Assignments
Cram Policy: Validate Parameters for Feedforward Neural Networks (FNNs)
Cram Policy: Validate User-Provided Parameters for a Model
Performs the Cram method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning algorithm. In a single pass of batched data, the proposed method repeatedly trains a machine learning algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. Unlike cross-validation, Cram evaluates the final learned model directly, providing sharper inference aligned with real-world deployment. The method naturally applies to both policy learning and contextual bandits, where decisions are based on individual features to maximize outcomes. The package includes cram_policy() for learning and evaluating individualized binary treatment rules, cram_ml() to train and assess the population-level performance of machine learning models, and cram_bandit() for on-policy evaluation of contextual bandit algorithms. For all three functions, the package provides estimates of the average outcome that would result if the model were deployed, along with standard errors and confidence intervals for these estimates. Details of the method are described in Jia, Imai, and Li (2024) <https://www.hbs.edu/ris/Publication%20Files/2403.07031v1_a83462e0-145b-4675-99d5-9754aa65d786.pdf> and Jia et al. (2025) <doi:10.48550/arXiv.2403.07031>.
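The description above outlines the cramming idea: in a single pass over batched data, repeatedly train on the batches seen so far and evaluate each incremental update only on the batches not yet used for training. The following base-R sketch illustrates one variant of that telescoping estimate for a simple linear model; it is an illustration of the principle only, not the cramR implementation, and the batch layout, loss, and model are all stand-in assumptions.

```r
# Minimal sketch of the cramming idea (NOT the cramR implementation):
# train on the first t batches, estimate the loss change from each model
# update on the batches not yet used for training, and accumulate.
set.seed(1)
n <- 200; n_batches <- 5
x <- rnorm(n); y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)
batch <- rep(seq_len(n_batches), each = n / n_batches)

# Squared-error loss of a fitted model on held-out rows.
loss <- function(fit, idx) {
  mean((dat$y[idx] - predict(fit, newdata = dat[idx, , drop = FALSE]))^2)
}

# Start from the loss of the batch-1 model, evaluated on batches 2..T.
est <- loss(lm(y ~ x, data = dat[batch == 1, ]), which(batch > 1))

# Add the estimated loss difference of each successive model update,
# each evaluated only on future (still-unused) batches.
for (t in 2:(n_batches - 1)) {
  fit_t   <- lm(y ~ x, data = dat[batch <= t, ])
  fit_tm1 <- lm(y ~ x, data = dat[batch <= t - 1, ])
  held    <- which(batch > t)
  est <- est + loss(fit_t, held) - loss(fit_tm1, held)
}

est  # crammed estimate of the expected loss of the final trained model
```

Because every batch serves both for training (in later updates) and for evaluation (of earlier updates), the whole sample contributes to both tasks, which is the source of the data-efficiency advantage over sample-splitting noted above.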
Useful links