Return the counterfactual rewards that a numeric reward estimator predicts for each observation in the supplied data.
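
A minimal sketch of what such a call might look like, under stated assumptions. It reads "counterfactual rewards" as the estimated reward for every candidate action on each observation, not only the action that was actually logged. That reading, and the names `counterfactual_rewards`, `linear_estimator`, and `WEIGHTS`, are hypothetical and are not taken from the documented API.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a context is a feature vector, an action is an integer id.
Context = Sequence[float]
RewardEstimator = Callable[[Context, int], float]


def counterfactual_rewards(
    estimator: RewardEstimator,
    contexts: Sequence[Context],
    actions: Sequence[int],
) -> List[List[float]]:
    """Return one row per observation and one estimated reward per action."""
    return [[estimator(x, a) for a in actions] for x in contexts]


# Toy numeric estimator (an assumption for illustration): one linear model per action.
WEIGHTS = {0: [1.0, 0.0], 1: [0.0, 2.0]}


def linear_estimator(context: Context, action: int) -> float:
    # Dot product of the action's weight vector with the context features.
    return sum(w * x for w, x in zip(WEIGHTS[action], context))


data = [[1.0, 1.0], [2.0, 0.5]]
rewards = counterfactual_rewards(linear_estimator, data, actions=[0, 1])
print(rewards)
```

Each row of `rewards` corresponds to one observation in `data`, and each column to one action, so the result can be indexed as `rewards[i][a]`.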