- #1

- 52

- 0

## Main Question or Discussion Point

I'm working on a problem where I need to find minimum of a 2D surface. I initially coded up a gradient descent algorithm, and though it works, I had to carefully select a step size (which could be problematic), plus I want it to converge quickly. So, I went through immense pain to derive the Hessian and implemented Newton's method. What I'm finding is that while Newton's method can converge impressively fast (a handful of iterations), it has a much smaller range of convergence. The initial guess must be closer to the correct answer.

Is this a known thing, or might I be doing something wrong? I can make sense of what's happening by looking at the slope of my error surface (the thing I'm trying to minimize). What happens is that at a certain distance from the minimum, the slope rate "goes negative", thus the inclusion of the second derivative drives me further from the correct answer.

As an example of what I mean, consider a 1D case where I'm trying to find the minimum of f=cos(x) between 0 and 2pi (looks like a bowl, kind of). If I just look at the slope I can see that a gradient based approach would converge anywhere within [0,2*pi] - the slope is negative below pi (the answer) and positive above.

If I look at f'' though, I see that it goes negative below pi/2 and above 1.5*pi. Thus if I start outside of that region, f'/f'' drives me the wrong way.

I guess I'm just disappointed since I thought Newton's method was hands down better than gradient. I want to make sure I'm not missing something. Maybe this is just the price you pay for the faster convergence?

Off the top of my head, I can't think of a case where it would be beneficial to have the 1/f'' reverse the direction of movement. Would it make sense to watch for this case and revert to gradient descent when it happens? Why would you want to go uphill?

Is this a known thing, or might I be doing something wrong? I can make sense of what's happening by looking at the slope of my error surface (the thing I'm trying to minimize). What happens is that at a certain distance from the minimum, the slope rate "goes negative", thus the inclusion of the second derivative drives me further from the correct answer.

As an example of what I mean, consider a 1D case where I'm trying to find the minimum of f=cos(x) between 0 and 2pi (looks like a bowl, kind of). If I just look at the slope I can see that a gradient based approach would converge anywhere within [0,2*pi] - the slope is negative below pi (the answer) and positive above.

If I look at f'' though, I see that it goes negative below pi/2 and above 1.5*pi. Thus if I start outside of that region, f'/f'' drives me the wrong way.

I guess I'm just disappointed since I thought Newton's method was hands down better than gradient. I want to make sure I'm not missing something. Maybe this is just the price you pay for the faster convergence?

Off the top of my head, I can't think of a case where it would be beneficial to have the 1/f'' reverse the direction of movement. Would it make sense to watch for this case and revert to gradient descent when it happens? Why would you want to go uphill?