So I am still confused about how to apply parameter scaling to a general optimization problem. Let's say I am doing maximum likelihood estimation (MLE). I understand how to find the scaling matrix (assuming we restrict it to diagonal form), and that the Hessian of the scaled problem should be close to the identity matrix near the optimal parameter values. What I don't understand is how to actually use the scaling matrix in the optimization once I have it. In the MLE case, I feel that just scaling the parameters wouldn't give the right results, because the data I am fitting depends on the actual parameters, not the scaled ones, unless the data itself were scaled as well, and I am not sure how to do that in this formulation. Any help would be much appreciated. Thanks in advance.
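A minimal sketch of one common reading of "using the scaling matrix": a change of variables, under assumptions that are not in the question. The model (a noise-free linear Gaussian fit), the names `t`, `obs`, `nll`, `g`, `D`, the convention x = D y, and plain gradient descent as the optimizer are all hypothetical choices for illustration. The optimizer only ever updates the scaled variables y, but the likelihood is always evaluated at the actual parameters x = D y, so the data is compared against the real parameters and never needs to be scaled itself.

```python
import numpy as np

# Hypothetical toy problem: Gaussian linear model obs = a + b*t with unit noise variance.
t = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
obs = 2.0 + 0.03 * t  # noise-free data, so the MLE is exactly a = 2, b = 0.03

def nll(x):
    """Negative log-likelihood (up to a constant) in the ORIGINAL parameters x = (a, b)."""
    a, b = x
    r = obs - a - b * t
    return 0.5 * np.sum(r ** 2)

def grad_nll(x):
    """Gradient of nll with respect to the original parameters (a, b)."""
    a, b = x
    r = obs - a - b * t
    return np.array([-np.sum(r), -np.sum(r * t)])

# Hessian of nll (constant for this model); its diagonal gives the scaling.
H = np.array([[t.size, t.sum()], [t.sum(), np.sum(t ** 2)]])
D = np.diag(1.0 / np.sqrt(np.diag(H)))  # with x = D @ y, D @ H @ D has unit diagonal

def g(y):
    # The optimizer only sees y, but the data is always compared against x = D @ y.
    return nll(D @ y)

def grad_g(y):
    # Chain rule: grad of f(D y) is D^T grad f(D y); D is diagonal, so D^T = D.
    return D @ grad_nll(D @ y)

y = np.zeros(2)  # start at x = (0, 0)
for _ in range(500):
    y = y - 0.5 * grad_g(y)  # plain gradient descent in the scaled variables

x_hat = D @ y  # map the result back to the original parameters
print(x_hat)
```

Here `x_hat` should come out very close to (2, 0.03). The scaling only changes the coordinates the optimizer moves in (and hence how well conditioned its steps are); the minimizer, once mapped back with x = D y, is the same MLE as in the unscaled problem.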