Performs a batch matrix-matrix product of matrices stored in batch1 and batch2, with a reduced add step (all matrix multiplications get accumulated along the first dimension). input is added to the final result.
batch1 and batch2 must be 3-D tensors each containing the same number of matrices.
If batch1 is a (b×n×m) tensor, batch2 is a (b×m×p) tensor, input must be broadcastable with a (n×p) tensor and out will be a (n×p) tensor.