Statistical hypothesis testing on the observed paired differences in estimated performance.
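
As a minimal sketch of what such a test could look like, the example below assumes two models were scored on the same resampling folds, so their scores can be paired fold by fold. The function names and the scores are hypothetical and chosen only for illustration. It computes the paired t statistic on the per-fold differences and an exact sign-flip (permutation) p-value, using only the Python standard library.

```python
import itertools
import math
import statistics


def paired_differences(scores_a, scores_b):
    """Per-fold differences; both lists must come from the same folds."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    return [a - b for a, b in zip(scores_a, scores_b)]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = paired_differences(scores_a, scores_b)
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n))


def sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip test on the paired differences.

    Under the null hypothesis of no difference, each paired difference is
    equally likely to have either sign, so all 2**n sign assignments are
    enumerated. Only practical for a small number of pairs.
    """
    diffs = paired_differences(scores_a, scores_b)
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so floating-point noise does not break ties.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-fold accuracies for two models (illustrative values only).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]

t_stat = paired_t_statistic(model_a, model_b)
p_value = sign_flip_p_value(model_a, model_b)
print(f"t = {t_stat:.3f}, sign-flip p = {p_value:.4f}")
```

The t statistic would be compared against a t distribution with n - 1 degrees of freedom. The sign-flip test avoids the normality assumption but, with only five pairs, cannot produce a two-sided p-value below 2/32. In both cases, scores from overlapping training sets are not independent, so the resulting p-values should be read with caution.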
Useful links