This function optimizes theta, or more precisely theta_weights. Because the thetas are constrained (they must be valid parameters of a multinomial/discrete distribution, i.e. non-negative and summing to one), we do not optimize the likelihood function directly w.r.t. theta. Instead, we apply a change of variables so that the optimization is unconstrained. The unconstrained variables are stored in the field "theta_weights", and it is these variables that this function updates.
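
A minimal sketch of the idea, for illustration only. The text above does not say which change of variables is used, so this sketch assumes a softmax mapping from theta_weights to theta, a multinomial log-likelihood over observed counts, and plain gradient ascent. Apart from theta_weights, every name here (softmax, log_likelihood, gradient, fit, counts, lr, steps) is hypothetical.

```python
import math


def softmax(weights):
    """Map unconstrained real weights to a probability vector (assumed mapping)."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(weights, counts):
    """Multinomial log-likelihood of the counts, up to an additive constant."""
    theta = softmax(weights)
    return sum(c * math.log(t) for c, t in zip(counts, theta))


def gradient(weights, counts):
    """Gradient of log_likelihood w.r.t. the unconstrained weights.

    Under the softmax mapping, d/dw_k sum_i c_i * log(theta_i) = c_k - n * theta_k,
    where n is the total count.
    """
    theta = softmax(weights)
    n = sum(counts)
    return [c - n * t for c, t in zip(counts, theta)]


def fit(counts, steps=2000, lr=0.01):
    """Gradient ascent on theta_weights; no constraint handling is needed."""
    theta_weights = [0.0] * len(counts)  # corresponds to a uniform theta
    for _ in range(steps):
        g = gradient(theta_weights, counts)
        theta_weights = [w + lr * gi for w, gi in zip(theta_weights, g)]
    return theta_weights


counts = [2, 3, 5]                # hypothetical observed category counts
theta_weights = fit(counts)       # the stored, unconstrained variables
theta = softmax(theta_weights)    # recovered constrained parameters
```

Whatever values theta_weights takes, the recovered theta is non-negative and sums to one, so the optimizer never has to project back onto the constraint set. In this toy setting theta should approach the empirical frequencies counts / sum(counts).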