Parameter Scaling for Optimization

Summary
Parameter scaling in optimization, particularly for maximum likelihood estimation (MLE), involves constructing a scaling matrix so that no single parameter's range dominates the calculation. Near the optimum, the Hessian of the scaled problem should be close to the identity matrix, which indicates the scaling is appropriate. Simply scaling the parameters, however, does not change the fit unless the objective is evaluated consistently with those scaled parameters. Scaling prevents parameters with small ranges from being swamped numerically by parameters with large ranges, so that all of them contribute to the calculation on equal footing. At the end, the result is un-scaled by applying the inverse of the scaling matrix, returning the estimates to their original ranges.
captain:
So I am still confused about how to apply scaling of parameters to a general optimization problem. Say I am doing maximum likelihood estimation. I understand how to find the scaling matrix (assuming we restrict it to diagonal form), and that the Hessian should be close to the identity matrix near the optimal parameter values. What I don't understand is how, once you have the scaling matrix, you actually use it in the optimization. In the MLE case, I feel that just scaling the parameters would not give the right answer: the data you are fitting corresponds to the actual parameters, not the scaled ones, unless the data itself were also scaled, and I don't see how to do that in this formulation. Any help would be much appreciated. Thanks in advance.
 
I think the scaling process serves the purpose that no single range of values is preferred. E.g. if we have some quantities near zero and others around a million, then the small ones will be lost in any calculation. So we scale, such that no range is preferred. At the end of the process, we un-scale the result again, i.e. we multiply by the inverse of the scaling matrix to get the result back into the ranges it belongs to.
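To make the mechanics concrete, here is a minimal sketch (all names and numbers are illustrative, not from the thread). The point that resolves the original question: you do not have to touch the data at all. You reparameterize the objective as ##g(y) = f(Sy)##, optimize over the scaled parameters ##y##, and un-scale at the end with ##x = Sy##. The objective still sees the original parameters ##Sy##, so the fit is to the actual parameters; only the optimizer works in the well-conditioned coordinates. With a diagonal ##S = \operatorname{diag}(1/\sqrt{H_{ii}})##, the Hessian of ##g## is ##S H S \approx I##, which is the "Hessian close to identity" criterion mentioned above.

```python
import numpy as np

# Toy badly-scaled objective: a quadratic whose curvatures differ by a
# factor of 1e6. This stands in for a negative log-likelihood; the true
# optimum x_true plays the role of the MLE.
H = np.diag([1e6, 1.0])           # Hessian in the original parameters
x_true = np.array([2.0, 3.0])

def f(x):
    d = x - x_true
    return 0.5 * d @ H @ d

def grad_f(x):
    return H @ (x - x_true)

# Diagonal scaling matrix S = diag(1/sqrt(H_ii)), chosen so that the
# Hessian of the scaled objective g(y) = f(S y) is S H S = I.
S = np.diag(1.0 / np.sqrt(np.diag(H)))

def grad_g(y):
    # Chain rule: grad g(y) = S^T grad f(S y). The data/objective f is
    # untouched; only the optimizer's coordinates change.
    return S.T @ grad_f(S @ y)

# Plain gradient descent on the *scaled* problem. Because the scaled
# Hessian is the identity, a unit step size works.
y = np.zeros(2)
for _ in range(100):
    y = y - 1.0 * grad_g(y)

x = S @ y                         # un-scale: back to original parameters
print(x)                          # ≈ [2. 3.]
```

Note the contrast: running the same gradient descent with step size 1.0 directly on ##f## would diverge, because the curvature 1e6 demands a step below 2e-6, which would then barely move the second coordinate. That is exactly the imbalance the scaling matrix removes.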
 
