Fit a numeric reward estimator on features, treatments and outcomes and return predicted counterfactual rewards for each observation, under each treatment candidate, as well as the scores of the internal estimators.