Return counterfactual rewards estimated by a categorical reward estimator for each observation in the supplied data