Fit a categorical reward estimator on features, treatments and outcomes and return predicted counterfactual rewards for each observation, under each treatment observed in the data, as well as the scores of the internal estimators.