Parameter Scaling for Optimization

SUMMARY

This discussion focuses on the application of parameter scaling in maximum likelihood estimation (MLE) for optimization problems. The key takeaway is the importance of using a scaling matrix, particularly in diagonal form, to ensure that no single range of data dominates the calculations. The Hessian matrix should approximate the identity matrix near optimal parameter values. Additionally, the discussion emphasizes the necessity of scaling both parameters and data to achieve accurate results, followed by un-scaling to revert to original data ranges.

PREREQUISITES
  • Understanding of maximum likelihood estimation (MLE)
  • Familiarity with Hessian matrices in optimization
  • Knowledge of scaling matrices and their applications
  • Concept of data normalization techniques
NEXT STEPS
  • Research the construction and application of diagonal scaling matrices in optimization
  • Learn about data normalization techniques for MLE
  • Explore the implications of Hessian matrix properties in optimization algorithms
  • Study the process of un-scaling results after parameter optimization
USEFUL FOR

Data scientists, statisticians, and machine learning practitioners involved in optimization tasks, particularly those working with maximum likelihood estimation and parameter scaling techniques.

captain
So I am still confused about how to apply scaling of parameters to a general optimization problem. Let's say I am trying to do maximum likelihood estimation. I understand how to find the scaling matrix (assuming we restrict it to diagonal form), and that the Hessian should be close to the identity matrix near the optimal parameter values. What I don't understand is how to use the scaling matrix directly in the optimization once you have it. In the case of MLE, I feel that just scaling your parameters wouldn't yield the right results, because the data you are trying to fit would want to fit the actual parameters and not the scaled ones — unless the data itself was scaled, which I am not sure how to do in this formulation. Any help would be much appreciated. Thanks in advance.
I think the scaling process serves the purpose that no single range of data dominates. E.g. if we have some data around zero and other data around a million, then the values near zero will be lost in any calculation. So we scale the data such that no range is preferred. At the end of the process, we un-scale the result again, i.e. we multiply by the inverse of the scaling matrix to get the data back into the ranges where it belongs.
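To make the scale/optimize/un-scale loop concrete, here is a minimal sketch in Python. It is a hypothetical example, not from the thread: a Gaussian MLE with two mean parameters on wildly different scales (around 2 and around a million), where the standard deviations are assumed known. With a diagonal scaling matrix D chosen so that the Hessian of the scaled objective g(z) = f(Dz) is close to the identity, plain gradient descent with a unit step size works, and the final un-scaling step D z recovers the parameters in their original ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(2.0, 1.0, size=n)      # data around 2
y = rng.normal(1.0e6, 1.0e3, size=n)  # data around a million

def neg_log_lik(theta):
    """Gaussian negative log-likelihood (up to constants); the std devs
    (1 and 1e3) are treated as known, only the two means are estimated."""
    return (0.5 * np.sum((x - theta[0]) ** 2)
            + 0.5 * np.sum(((y - theta[1]) / 1.0e3) ** 2))

def grad_nll(theta):
    # Analytic gradient of the NLL above with respect to theta.
    return np.array([n * theta[0] - x.sum(),
                     (n * theta[1] - y.sum()) / 1.0e6])

# The Hessian of the NLL is diag(n, n / 1e6): badly conditioned.
# Choose the diagonal scaling matrix D = H^{-1/2}, so that the Hessian
# of the scaled objective g(z) = f(D z) is the identity.
D = np.diag([1.0 / np.sqrt(n), 1.0e3 / np.sqrt(n)])

def grad_g(z):
    # Chain rule: grad g(z) = D^T grad f(D z); D is diagonal here.
    return D @ grad_nll(D @ z)

# Gradient descent on the SCALED parameters z; a unit step size is fine
# precisely because the scaled problem is well conditioned.
z = np.zeros(2)
for _ in range(5):
    z = z - 1.0 * grad_g(z)

# Un-scale to get the estimates back in the original parameter ranges.
theta_hat = D @ z
print(theta_hat)  # close to [x.mean(), y.mean()]
```

The point of the sketch is that the data are never touched: the objective always sees the original parameters via theta = D z, so the likelihood is still fitting the actual parameters. Only the optimizer's search variable z lives in the scaled space.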