Optimization problem using exact Hessian

1. Jul 19, 2011

brydustin

hi, I'm kind of new to optimization theory, and I have to maximize a multi-dimensional problem where I know the exact gradient and hessian. In other words, techniques such as BFGS are not sufficient because I don't want to approximate the Hessian (with an initial guess for example of H=I), I have the exact (analytical) Hessian and want to optimize my problem up to second order
i.e.
f(x + δ) ≈ f(x) + gT δ +.5*δTHδ (2nd order Taylor expansiona around δ)

2. Jul 19, 2011

pmsrw3

Newton's Method?

3. Jul 19, 2011

brydustin

for
x_(n+1) = x_n - γ[Hf(x_n)]^-1 Del(f(x_n)
such that gamma satisfies the Wolfe conditions and the optimization is the quickest possible.

Also I wasn't sure if Newton's method was the best analytical method or if its just the first one everyone learns (like bisection method in single variable numerical methods is first day technique).

4. Jul 19, 2011

Office_Shredder

Staff Emeritus
Optimizing a second order polynomial is called quadratic programming, and you can find a lot of literature on it by doing a google search. Are you solving this inside of matlab or something, or writing your own optimizer from scratch?

5. Jul 19, 2011

brydustin

I believe that I'm going to be writting this from scratch, ... , I may not though, if I am satisfied with what Mathematica or Fortran has to offer.

6. Jul 19, 2011

Office_Shredder

Staff Emeritus
Matlab's interior point algorithm, called by fmincon:
http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html

allows you to supply the hessian of your objective function/constraints. Here's an example code:
http://www.mathworks.com/help/toolbox/optim/ug/brn4nh7.html#bri8026 [Broken]

If the dimension of your problem is small (less than 100 maybe, depending on the complexity of your function/constraints) you can also try the active set/sequential quadratic programming methods. You can't give it the hessian, but that doesn't change the fact that they can still be effective optimizers. I think active-set requires fewer function evaluations on average, so if your function takes a while to evaluate it might be worth trying

Last edited by a moderator: May 5, 2017
7. Jul 19, 2011

pmsrw3

So have you tried Mathematica's NMinimize function? Or NSolve to find the root of the gradient? (I would be inclined to try NSolve first with an analytical Hessian.)

On Newton's method, I wouldn't worry that much about picking an "optimal" gamma. Just pick something, halve it if the step doesn't take you to an improved place, double it if it works well.

On quadratic programming: Do you have inequality constraints that you haven't mentioned? If not, QP will not buy you anything. Without inequalities, QP is trivial, and will just reduce to Newton in your case.

8. Jul 19, 2011

brydustin

Thanks for the ideas. I do not have any inequality constraints.... so yeah, I guess I'll just give this Newton method a go. Thanks for the ideas, I'll try the NSolve/NMinimize functions as well.

9. Jul 27, 2011

brydustin

Hi again.....
I have another question related to this post, after applying the method discussed above.
I iteratively calculate a matrix C by some method such that an input vector V, is used to generate it, and it is iteratively done....... additionally, V is iteratively changed... let the final value be W.
then this means that (C_final)*....(C_2nd_iteration)*(C_first_iteration)*V = W.
However, I would like to calculate the product above, and I DONT want to add the additional obvious solution to my code that would be intensive (i.e. C_(k+1) = C_(k)*C(k-1) or some similar equivalent).

what I would like to do is something along the lines of
Solve for C in CV = W (where C is a matrix and V&W are column vectors).
via C*(v#q) = w#q = C
where q is a row vector and # is some outter product, such that v#q = I = Identity.
I know very little about the outter product (uniqueness, how to calculate, etc...)
So yeah.... how can I find q?
thanks a lot for the help, its very appreciated!

10. Jul 27, 2011

pmsrw3

In principle what you want to do is very simple. You do your iteration on one vector v1, then you do it on a second v2, ..., until you've done a full basis set. Then you can easily solve for C. In fact, if you choose the basis vectors $\delta_{ij}$, then you get C directly. But: this actually turns out to be exactly as much work as just multiplying the matrices, which you were trying to avoid.

I don't think there's any way to get that product matrix any more efficiently than by multiplying all the component matrices. You can't solve for C on the basis of a single column vector. You need, in fact, exactly as many independent vectors as the dimension of C.

11. Jul 27, 2011

brydustin

yeah, I was just thinking that the calculation would be impossible .... it would mean that
the "outter-product inverse" of V in the first slot would just be the scalar inverse of the first element of V (so {1,1} would equal one, when "solving" for I), but then the next element in the "inverse" would have to be zero, so that the the rest of the row in I is zero, and so on. But immediately, we see the problem..... thanks for the advice. Absurd ideas are the laundry work of scientists/mathematicians.