Optimization problem using exact Hessian

  • Context: Graduate 
  • Thread starter Thread starter brydustin
  • Start date Start date
  • Tags Tags
    Hessian Optimization
Click For Summary

Discussion Overview

The discussion revolves around optimization techniques for maximizing a multi-dimensional problem using an exact Hessian and gradient. Participants explore various methods, including Newton's method and quadratic programming, while addressing practical implementation concerns and algorithmic choices.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks techniques for optimization using an exact Hessian, expressing a preference against approximation methods like BFGS.
  • Another participant suggests Newton's method but raises concerns about selecting an appropriate step size (gamma) that satisfies Wolfe conditions.
  • There is a discussion about the relevance of quadratic programming, with one participant noting that it may not be beneficial without inequality constraints.
  • Several participants mention the use of software tools like MATLAB and Mathematica for optimization, discussing specific functions like fmincon and NMinimize.
  • A later post introduces a new question regarding the iterative calculation of a matrix product and the challenges of solving for a matrix C without performing extensive computations.
  • One participant suggests that solving for C based on a single vector is insufficient and that a full basis set is necessary for the calculation.

Areas of Agreement / Disagreement

Participants express various views on optimization methods, with no consensus on the best approach. Some support Newton's method while others question its effectiveness compared to other techniques. The discussion on quadratic programming highlights differing opinions on its applicability based on constraints.

Contextual Notes

Participants mention limitations regarding the choice of algorithms and the computational complexity involved in matrix operations. There is also uncertainty about the optimal selection of parameters in the discussed methods.

Who May Find This Useful

This discussion may be useful for individuals interested in optimization theory, particularly those dealing with multi-dimensional problems and seeking to understand the implications of using exact Hessians in their calculations.

brydustin
Messages
201
Reaction score
0
hi, I'm kind of new to optimization theory, and I have to maximize a multi-dimensional problem where I know the exact gradient and hessian. In other words, techniques such as BFGS are not sufficient because I don't want to approximate the Hessian (with an initial guess for example of H=I), I have the exact (analytical) Hessian and want to optimize my problem up to second order
i.e.
f(x + δ) ≈ f(x) + gT δ +.5*δTHδ (2nd order Taylor expansiona around δ)

Could someone please suggest a technique? Thanks in advance!
 
Physics news on Phys.org
Newton's Method?
 
Well yes, I was thinking about this but was unsure how to best pick gamma
for
x_(n+1) = x_n - γ[Hf(x_n)]^-1 Del(f(x_n)
such that gamma satisfies the Wolfe conditions and the optimization is the quickest possible.

Also I wasn't sure if Newton's method was the best analytical method or if its just the first one everyone learns (like bisection method in single variable numerical methods is first day technique).
 
Optimizing a second order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?
 
Office_Shredder said:
Optimizing a second order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?


I believe that I'm going to be writting this from scratch, ... , I may not though, if I am satisfied with what Mathematica or Fortran has to offer.
 
Matlab's interior point algorithm, called by fmincon:
http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html

allows you to supply the hessian of your objective function/constraints. Here's an example code:
http://www.mathworks.com/help/toolbox/optim/ug/brn4nh7.html#bri8026

If the dimension of your problem is small (less than 100 maybe, depending on the complexity of your function/constraints) you can also try the active set/sequential quadratic programming methods. You can't give it the hessian, but that doesn't change the fact that they can still be effective optimizers. I think active-set requires fewer function evaluations on average, so if your function takes a while to evaluate it might be worth trying
 
Last edited by a moderator:
brydustin said:
I believe that I'm going to be writting this from scratch, ... , I may not though, if I am satisfied with what Mathematica or Fortran has to offer.
So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.
 
pmsrw3 said:
So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.

Thanks for the ideas. I do not have any inequality constraints... so yeah, I guess I'll just give this Newton method a go. Thanks for the ideas, I'll try the NSolve/NMinimize functions as well.
 
Hi again...
I have another question related to this post, after applying the method discussed above.
I iteratively calculate a matrix C by some method such that an input vector V, is used to generate it, and it is iteratively done... additionally, V is iteratively changed... let the final value be W.
then this means that (C_final)*...(C_2nd_iteration)*(C_first_iteration)*V = W.
However, I would like to calculate the product above, and I DONT want to add the additional obvious solution to my code that would be intensive (i.e. C_(k+1) = C_(k)*C(k-1) or some similar equivalent).

what I would like to do is something along the lines of
Solve for C in CV = W (where C is a matrix and V&W are column vectors).
via C*(v#q) = w#q = C
where q is a row vector and # is some outter product, such that v#q = I = Identity.
I know very little about the outter product (uniqueness, how to calculate, etc...)
So yeah... how can I find q?
thanks a lot for the help, its very appreciated!
 
  • #10
In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.
 
  • #11
pmsrw3 said:
In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.

yeah, I was just thinking that the calculation would be impossible ... it would mean that
the "outter-product inverse" of V in the first slot would just be the scalar inverse of the first element of V (so {1,1} would equal one, when "solving" for I), but then the next element in the "inverse" would have to be zero, so that the the rest of the row in I is zero, and so on. But immediately, we see the problem... thanks for the advice. Absurd ideas are the laundry work of scientists/mathematicians.
 

Similar threads

Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 0 ·
Replies
0
Views
2K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 1 ·
Replies
1
Views
6K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
8K
Replies
17
Views
2K
  • · Replies 18 ·
Replies
18
Views
4K