Parameter Scaling for Optimization

Summary
Parameter scaling in optimization, particularly for maximum likelihood estimation (MLE), involves constructing a scaling matrix so that no single parameter's range dominates the calculation. Near the optimum, the Hessian of the scaled problem should be close to the identity matrix, which indicates the scaling is appropriate. Simply scaling the parameters, however, does not change the fit unless the objective is evaluated consistently with those scaled parameters. Scaling prevents parameters with small ranges from being swamped numerically by parameters with large ranges, so that all of them contribute to the calculation on equal footing. At the end, the result is un-scaled by applying the inverse of the scaling matrix, returning the estimates to their original ranges.
captain:
So I am still confused about how to apply scaling of parameters to a general optimization problem. Say I am doing maximum likelihood estimation. I understand how to find the scaling matrix (assuming we restrict it to diagonal form), and that the Hessian should be close to the identity matrix near the optimal parameter values. What I don't understand is how, once you have the scaling matrix, you actually use it in the optimization. In the MLE case, I feel that just scaling the parameters would not give the right answer: the data you are fitting corresponds to the actual parameters, not the scaled ones, unless the data itself were also scaled, and I don't see how to do that in this formulation. Any help would be much appreciated. Thanks in advance.
 
I think the scaling process serves the purpose that no single range of values is preferred. E.g. if we have some quantities near zero and others around a million, then the small ones will be lost in any calculation. So we scale, such that no range is preferred. At the end of the process, we un-scale the result again, i.e. we multiply by the inverse of the scaling matrix to get the result back into the ranges it belongs to.
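To make the mechanics concrete, here is a minimal sketch (all names and numbers are illustrative, not from the thread). The point that resolves the original question: you do not have to touch the data at all. You reparameterize the objective as ##g(y) = f(Sy)##, optimize over the scaled parameters ##y##, and un-scale at the end with ##x = Sy##. The objective still sees the original parameters ##Sy##, so the fit is to the actual parameters; only the optimizer works in the well-conditioned coordinates. With a diagonal ##S = \operatorname{diag}(1/\sqrt{H_{ii}})##, the Hessian of ##g## is ##S H S \approx I##, which is the "Hessian close to identity" criterion mentioned above.

```python
import numpy as np

# Toy badly-scaled objective: a quadratic whose curvatures differ by a
# factor of 1e6. This stands in for a negative log-likelihood; the true
# optimum x_true plays the role of the MLE.
H = np.diag([1e6, 1.0])           # Hessian in the original parameters
x_true = np.array([2.0, 3.0])

def f(x):
    d = x - x_true
    return 0.5 * d @ H @ d

def grad_f(x):
    return H @ (x - x_true)

# Diagonal scaling matrix S = diag(1/sqrt(H_ii)), chosen so that the
# Hessian of the scaled objective g(y) = f(S y) is S H S = I.
S = np.diag(1.0 / np.sqrt(np.diag(H)))

def grad_g(y):
    # Chain rule: grad g(y) = S^T grad f(S y). The data/objective f is
    # untouched; only the optimizer's coordinates change.
    return S.T @ grad_f(S @ y)

# Plain gradient descent on the *scaled* problem. Because the scaled
# Hessian is the identity, a unit step size works.
y = np.zeros(2)
for _ in range(100):
    y = y - 1.0 * grad_g(y)

x = S @ y                         # un-scale: back to original parameters
print(x)                          # ≈ [2. 3.]
```

Note the contrast: running the same gradient descent with step size 1.0 directly on ##f## would diverge, because the curvature 1e6 demands a step below 2e-6, which would then barely move the second coordinate. That is exactly the imbalance the scaling matrix removes.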
 
