Optimization problem using exact Hessian

brydustin · Jul 19, 2011

hi, I'm kind of new to optimization theory, and I have to maximize a multi-dimensional problem where I know the exact gradient and hessian. In other words, techniques such as BFGS are not sufficient because I don't want to approximate the Hessian (with an initial guess for example of H=I), I have the exact (analytical) Hessian and want to optimize my problem up to second order
i.e.
f(x + δ) ≈ f(x) + gT δ +.5*δTHδ (2nd order Taylor expansiona around δ)

Could someone please suggest a technique? Thanks in advance!

pmsrw3 · Jul 19, 2011

Newton's Method?

brydustin · Jul 19, 2011

Well yes, I was thinking about this but was unsure how to best pick gamma
for
x_(n+1) = x_n - γ[Hf(x_n)]^-1 Del(f(x_n)
such that gamma satisfies the Wolfe conditions and the optimization is the quickest possible.

Also I wasn't sure if Newton's method was the best analytical method or if its just the first one everyone learns (like bisection method in single variable numerical methods is first day technique).

Office_Shredder · Jul 19, 2011

Optimizing a second order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?

brydustin · Jul 19, 2011

Office_Shredder said:

Optimizing a second order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a google search. Are you solving this inside of MATLAB or something, or writing your own optimizer from scratch?

I believe that I'm going to be writting this from scratch, ... , I may not though, if I am satisfied with what Mathematica or Fortran has to offer.

Office_Shredder · Jul 19, 2011

Matlab's interior point algorithm, called by fmincon:
http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html

allows you to supply the hessian of your objective function/constraints. Here's an example code:
http://www.mathworks.com/help/toolbox/optim/ug/brn4nh7.html#bri8026

If the dimension of your problem is small (less than 100 maybe, depending on the complexity of your function/constraints) you can also try the active set/sequential quadratic programming methods. You can't give it the hessian, but that doesn't change the fact that they can still be effective optimizers. I think active-set requires fewer function evaluations on average, so if your function takes a while to evaluate it might be worth trying

pmsrw3 · Jul 19, 2011

brydustin said:

I believe that I'm going to be writting this from scratch, ... , I may not though, if I am satisfied with what Mathematica or Fortran has to offer.

So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.

brydustin · Jul 19, 2011

pmsrw3 said:

So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.

Thanks for the ideas. I do not have any inequality constraints... so yeah, I guess I'll just give this Newton method a go. Thanks for the ideas, I'll try the NSolve/NMinimize functions as well.

brydustin · Jul 27, 2011

Hi again...
I have another question related to this post, after applying the method discussed above.
I iteratively calculate a matrix C by some method such that an input vector V, is used to generate it, and it is iteratively done... additionally, V is iteratively changed... let the final value be W.
then this means that (C_final)*...(C_2nd_iteration)*(C_first_iteration)*V = W.
However, I would like to calculate the product above, and I DONT want to add the additional obvious solution to my code that would be intensive (i.e. C_(k+1) = C_(k)*C(k-1) or some similar equivalent).

what I would like to do is something along the lines of
Solve for C in CV = W (where C is a matrix and V&W are column vectors).
via C*(v#q) = w#q = C
where q is a row vector and # is some outter product, such that v#q = I = Identity.
I know very little about the outter product (uniqueness, how to calculate, etc...)
So yeah... how can I find q?
thanks a lot for the help, its very appreciated!

pmsrw3 · Jul 27, 2011

In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.

brydustin · Jul 27, 2011

pmsrw3 said:

In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors [itex]\delta_{ij}[/itex], then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.

yeah, I was just thinking that the calculation would be impossible ... it would mean that
the "outter-product inverse" of V in the first slot would just be the scalar inverse of the first element of V (so {1,1} would equal one, when "solving" for I), but then the next element in the "inverse" would have to be zero, so that the the rest of the row in I is zero, and so on. But immediately, we see the problem... thanks for the advice. Absurd ideas are the laundry work of scientists/mathematicians.

Optimization problem using exact Hessian

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Who May Find This Useful

Similar threads

Undergrad Mixed approximation vs. full approximation for a power series expansion

Undergrad Correct Upper/Lower limits for continuation of solutions for ODE

Undergrad 2nd order ODE's, variation of parameters and the notorious constraint

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect