Is this a known thing, or might I be doing something wrong? I can make sense of what's happening by looking at the slope of my error surface (the thing I'm trying to minimize). At a certain distance from the minimum, the second derivative (the rate of change of the slope) goes negative, so the Newton step, which divides by the second derivative, drives me further from the correct answer.

As an example of what I mean, consider a 1D case where I'm trying to find the minimum of f(x) = cos(x) between 0 and 2*pi (looks like a bowl, kind of). If I just look at the slope, f'(x) = -sin(x), I can see that a gradient-based approach with a small enough step would converge from anywhere within (0, 2*pi): the slope is negative below pi (the answer) and positive above.

If I look at f''(x) = -cos(x) though, I see that it goes negative below pi/2 and above 1.5*pi. So if I start outside the interval (pi/2, 1.5*pi), the Newton step -f'/f'' drives me the wrong way.
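To make the 1D example concrete, here is a minimal sketch in plain Python. The function names (`newton_step`, `gradient_step`, `iterate`), the step size 0.5, and the starting points are assumptions chosen for illustration, not part of any particular library.

```python
import math

# f(x) = cos(x), so f'(x) = -sin(x) and f''(x) = -cos(x).

def newton_step(x):
    # Newton update for minimization: x - f'(x) / f''(x), which here is x - tan(x)
    return x - (-math.sin(x)) / (-math.cos(x))

def gradient_step(x, lr=0.5):
    # Plain gradient descent update: x - lr * f'(x); lr is an assumed step size
    return x - lr * (-math.sin(x))

def iterate(step, x0, n=50):
    # Apply the given update rule n times starting from x0
    x = x0
    for _ in range(n):
        x = step(x)
    return x

inside = iterate(newton_step, 2.5)     # start inside (pi/2, 1.5*pi): approaches pi
outside = iterate(newton_step, 1.0)    # start below pi/2: approaches 0, a maximum
descent = iterate(gradient_step, 1.0)  # same start, gradient descent: approaches pi
print(inside, outside, descent)
```

Starting where f'' is negative, the Newton iteration settles on the stationary point at 0, which is a maximum of cos(x), while gradient descent from the same start still reaches the minimum at pi.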

I guess I'm just disappointed since I thought Newton's method was hands down better than gradient descent. I want to make sure I'm not missing something. Maybe this is just the price you pay for the faster convergence?

Off the top of my head, I can't think of a case where it would be beneficial to have the 1/f'' reverse the direction of movement. Would it make sense to watch for this case and revert to gradient descent when it happens? Why would you want to go uphill?
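The "watch for this case and revert" idea could be sketched roughly as follows. This is only an illustration of the fallback described in the question: the function name `safeguarded_minimize`, the step size `lr`, the tolerance, and the iteration cap are all hypothetical choices.

```python
import math

def safeguarded_minimize(df, d2f, x0, lr=0.5, tol=1e-10, max_iter=200):
    # df and d2f are callables returning f'(x) and f''(x)
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break  # slope is essentially zero: stop
        h = d2f(x)
        if h > 0:
            x -= g / h   # curvature is positive: take the Newton step
        else:
            x -= lr * g  # curvature is not positive: fall back to gradient descent
    return x

# The cos(x) example: f'(x) = -sin(x), f''(x) = -cos(x)
df = lambda x: -math.sin(x)
d2f = lambda x: -math.cos(x)

print(safeguarded_minimize(df, d2f, 1.5))  # starts where f'' < 0, approaches pi
print(safeguarded_minimize(df, d2f, 2.5))  # starts where f'' > 0, approaches pi
```

This check only fixes the direction of the step. A Newton step taken where f'' is positive but small can still overshoot badly, so practical implementations usually add a line search or a limit on the step length as well.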