Convergence criterion for Newton-Raphson

  • Context: Undergrad
  • Thread starter: Kyouran
  • Tags: Convergence
SUMMARY

The discussion compares two convergence criteria for the Newton-Raphson algorithm: one based on the function value at the last estimate, ##f(x_n)##, and the other based on the change in the root estimate, ##|x_{n+1} - x_n|##. The first method only admits an absolute error tolerance, since a relative error with respect to zero is meaningless, while the second allows absolute or relative tolerances and is generally more robust against false roots. The trade-offs between computational cost and convergence behaviour are highlighted, with the second method preferred when function evaluations are computationally expensive.

PREREQUISITES
  • Understanding of the Newton-Raphson algorithm
  • Familiarity with numerical methods and convergence criteria
  • Knowledge of function evaluation and floating-point operations
  • Basic concepts of error tolerance in numerical analysis
NEXT STEPS
  • Research the implementation of the Newton-Raphson algorithm in Python using libraries like NumPy
  • Explore advanced convergence criteria in numerical methods, such as hybrid methods combining both function value and root change
  • Study the impact of machine epsilon on convergence guarantees in numerical algorithms
  • Investigate performance optimization techniques for computationally expensive function evaluations
USEFUL FOR

Mathematicians, software developers, and engineers involved in numerical analysis, optimization, and algorithm development will benefit from this discussion.

Kyouran
TL;DR: Convergence criterion for Newton-Raphson
The Newton-Raphson algorithm is well-known:

##x_{n+1} = x_n - \frac{f(x_n)}{f'(x_{n})}##

Looking at a few implementations online, I have encountered two convergence criteria:

1) The first method uses the function value at the last estimate, ##f(x_n)## or ##f(x_{n+1})##. Since the function value is zero at the root, this limits us to specifying a maximum absolute error on ##f(x)##, as a relative error with respect to zero doesn't make much sense.

2) The second method uses the change in the root estimate over the last iteration. If this change falls below a certain threshold, the algorithm is assumed to have converged. Here one has a bit more freedom: one can look at the relative change ##(\frac{x_{n+1}}{x_n}-1)## in the root, or at the step size ##x_{n+1}-x_n##.

Obviously, the two criteria (or three, if you count the last two as distinct) are different, so the question here is what are the consequences of choosing one criterion over the other.
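For reference, here is a minimal Python sketch (not taken from any of the implementations mentioned above) showing how the two stopping tests might be wired into the same loop; the function name `newton` and the `criterion` argument are just illustrative choices.

```python
import math

def newton(f, fprime, x0, tol=1e-10, max_iter=50, criterion="step"):
    """Newton-Raphson with a selectable stopping test.

    criterion="residual": stop when |f(x_n)| < tol          (method 1)
    criterion="step":     stop when |x_{n+1} - x_n| < tol   (method 2)
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if criterion == "residual" and abs(fx) < tol:
            return x
        step = fx / fprime(x)
        x_new = x - step
        if criterion == "step" and abs(step) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge within max_iter iterations")

# Example: the root of cos(x) - x near x0 = 1 (about 0.7390851)
print(newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1, 1.0))
```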
 
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
 
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
I suppose either would do when you just want the root. Then again, I can imagine that for a function that comes very close to zero but doesn't actually cross it, the first criterion may report a false root: something like ##f(x) = x^2 + 10^{-6}## with a tolerance of ##10^{-4}## would still yield a "root" in the first case but not in the second.
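To make that concrete, here is a quick sketch using the illustrative `newton` function from the earlier post (the numbers in the comments are simply what this particular example produces):

```python
# Using the illustrative newton() sketch from above.
f = lambda x: x**2 + 1e-6      # always positive: there is no real root
fp = lambda x: 2 * x

# Method 1 (residual test) reports a "root" near x = 0.008, where f is about 6e-5 < 1e-4:
print(newton(f, fp, 1.0, tol=1e-4, criterion="residual"))

# Method 2 (step test) never sees a step below 1e-4 and gives up instead:
try:
    newton(f, fp, 1.0, tol=1e-4, criterion="step")
except RuntimeError as err:
    print(err)
```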
 
For these functions the method won't converge nicely anyway. Sure, if your cutoff is large then you get a result from the method (in both cases), but if you try to improve it, the estimate will get worse.
 
What I'm wondering is whether the second method is more robust, since it doesn't seem to have the particular problem I mentioned. Perhaps there are downsides to the second method as well, where the first method may perform better; I just haven't figured it out. My overall goal is to get an idea of the trade-off being made here.
 
The second method is basically doing half of the next step, so typically you are "a bit closer" (you checked how large the next step would be) - but you also did more calculation.
 
Kyouran said:
so the question here is what are the consequences of choosing one criterion over the other.
They're both going to converge the same way (quadratically, or linearly in some cases), but there's a big difference in speed if ##f(z)## is computationally expensive, say millions of floating-point operations: the criterion ##|f(z_{n+1})-f(z_n)|<p## requires an extra function evaluation, while ##|z_{n+1}-z_n|<p## requires just one subtraction. That matters when, say, searching for a million roots of a million-degree polynomial.
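A rough way to see the cost difference is to count calls to ##f##. The sketch below is purely illustrative (a cheap ##f## stands in for an expensive one, and the names are my own), not code from any library:

```python
import math

calls = 0
def f(x):
    global calls
    calls += 1
    return math.cos(x) - x        # stand-in for an expensive function

fp = lambda x: -math.sin(x) - 1

def solve(check, x0=1.0, tol=1e-12, max_iter=50):
    global calls
    calls = 0
    x = x0
    for _ in range(max_iter):
        step = f(x) / fp(x)
        x -= step
        if check == "residual" and abs(f(x)) < tol:   # one extra evaluation of f per iteration
            break
        if check == "step" and abs(step) < tol:       # reuses the step already computed
            break
    return x, calls

print(solve("residual"))   # same root, but two evaluations of f per iteration instead of one
print(solve("step"))
```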
 
Well, I suppose for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value. This creates a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier. Combining both could make it more robust in that sense, but yes it would be more costly computationally.
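A hedged sketch of what that combined test might look like (the name `newton_combined` and the two separate tolerances are my own choices for illustration, not something from a standard library):

```python
def newton_combined(f, fprime, x0, xtol=1e-10, ftol=1e-10, max_iter=50):
    """Declare convergence only when BOTH the step and the residual are small."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x_new = x - step
        # at most one extra evaluation of f per iteration (only when the step test already passes)
        if abs(step) < xtol and abs(f(x_new)) < ftol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")
```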
 
Kyouran said:
Well, I suppose for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value. This creates a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier.
This was addressed in the first reply to your original question:
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?

and I am not sure any more needs to be said, except perhaps to deal with

Kyouran said:
Combining both could make it more robust in that sense, but yes it would be more costly computationally.

it could be amended to "Do you need the function value to be small (take 1), do you need the deviation in x to be small (take 2) or do you need both (take both 1 and 2)?"

Note that if your convergence test is ##|f(x_i)| < \varepsilon##, you no longer have a guarantee that the test will ever be satisfied, even when ##|x_i - x| \le \varepsilon_0 x## (where ##\varepsilon_0## is machine epsilon and ##x## is the exact solution). Partly because of this, we usually use test 2.
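As a closing illustration, a minimal sketch of test 2 with a relative tolerance anchored to machine epsilon; the factor of 4 and the function name are arbitrary choices for the example, not a recommendation:

```python
import numpy as np

eps = np.finfo(float).eps          # machine epsilon for double precision, about 2.2e-16

def newton_reltol(f, fprime, x0, rtol=4 * eps, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x_new = x - step
        # stop when the step is small *relative to* the current iterate (test 2)
        if abs(step) <= rtol * abs(x_new):
            return x_new
        x = x_new
    return x  # best estimate if the tolerance was never met
```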
 
