Convergence criterion for Newton-Raphson

AI Thread Summary
The discussion focuses on the convergence criteria for the Newton-Raphson algorithm, highlighting two primary methods: one based on the function value at the current estimate and the other on the change in the root estimate. The first method requires the function value to be small, which can lead to false roots in certain cases, while the second method assesses the change in the root, potentially offering more robustness against such issues. However, the second method may risk premature convergence if the function's slope is steep. Both methods converge similarly but differ significantly in computational efficiency, especially when the function is expensive to evaluate. A combined approach could enhance robustness but would increase computational costs.
Kyouran
The Newton-Raphson algorithm is well-known:

##x_{n+1} = x_n - \frac{f(x_n)}{f'(x_{n})}##

Looking at a few implementations online, I have encountered two methods for convergence:

1) The first method uses the function value at the latest estimate, ##f(x_n)## or ##f(x_{n+1})##. Since the function value is zero at the root, this limits us to specifying a maximum absolute error on ##f(x)##, as a relative error w.r.t. zero doesn't make much sense.

2) The second method uses the change in the root estimate over the last iteration. If this change falls below a certain threshold value, the algorithm is assumed to have converged. Here one has a bit more freedom: one can look at the relative change ##(\frac{x_{n+1}}{x_n}-1)## in the root, or at the step size ##x_{n+1}-x_n##.

Obviously, the two criteria (or three, if you count the last two as distinct) are different, so the question here is what the consequences are of choosing one criterion over the other.
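For concreteness, here is a minimal sketch of the iteration with both stopping tests side by side. This is my own illustrative code, not taken from any particular implementation; the names `newton`, `tol`, `max_iter` and `criterion` are arbitrary.

```python
# Minimal sketch of Newton-Raphson with the two stopping criteria described
# above. All names (newton, tol, max_iter, criterion) are illustrative.

def newton(f, df, x0, tol=1e-10, max_iter=50, criterion="step"):
    """Return an approximate root of f, starting from x0.

    criterion="residual": stop when |f(x_{n+1})| < tol       (method 1)
    criterion="step":     stop when |x_{n+1} - x_n| < tol    (method 2)
    """
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)          # x_{n+1} = x_n - f(x_n)/f'(x_n)
        if criterion == "residual":
            if abs(f(x_new)) < tol:       # absolute error on f(x)
                return x_new
        else:
            if abs(x_new - x) < tol:      # step size; use abs(x_new/x - 1) for the relative change
                return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")
```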
 
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
 
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
I suppose either would do when you just want the root, but then again I can imagine that if you take a function that comes very close to zero yet doesn't actually cross it, the first criterion may report a false root. For example, something like ##x^2 + 0.000001## with a tolerance of ##0.0001## would still lead to a root in the first case but not in the second case.
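A quick numerical check of this scenario, reusing the `newton` sketch above with the numbers from the post:

```python
# f has no real root, but its minimum value 1e-6 is below the tolerance 1e-4.
f  = lambda x: x**2 + 1e-6
df = lambda x: 2 * x

# Criterion 1 stops as soon as |f(x)| < 1e-4, i.e. once |x| < ~0.00995,
# and reports a spurious "root" near x ~ 0.008.
print(newton(f, df, x0=1.0, tol=1e-4, criterion="residual"))

# Criterion 2 never triggers: the Newton step has magnitude |x|/2 + 5e-7/|x|,
# which is never smaller than sqrt(1e-6) = 1e-3, so it gives up instead.
try:
    print(newton(f, df, x0=1.0, tol=1e-4, criterion="step"))
except RuntimeError as err:
    print(err)
```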
 
For these functions the method won't converge nicely anyway. Sure, if your cutoff is large then you get a result from the method (in both cases), but if you try to improve it, the estimate will get worse.
 
What I'm wondering is whether the second method is more robust, as it doesn't seem to have that particular problem that I mentioned. Perhaps there are downsides to the second method as well where the first method may perform better; I just haven't figured it out. My overall goal here is to get an idea of the tradeoff that is being made here.
 
The second method is basically doing half of the next step, so typically you are "a bit closer" (you checked how large the next step would be), but you also did more calculation.
 
Kyouran said:
so the question here is what are the consequences of choosing one criterion over the other.
They're both going to converge the same way (quadratically, or sometimes linearly), but there is a big difference in speed if ##f(z)## is computationally expensive to evaluate: say millions of floating-point operations for the criterion ##|f(z_{n+1})-f(z_n)|<p##, compared to a single subtraction for ##|z_{n+1}-z_n|<p##, for example when searching for a million roots of a million-degree polynomial.
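When ##f## is that expensive, the value computed for a residual-style check can at least be reused in the next update, so the check does not double the number of evaluations per iteration. A minimal sketch of that caching (my own, illustrative names):

```python
# Sketch of how a residual test can avoid doubling the number of f
# evaluations: f at the new iterate, computed for the convergence check,
# is reused in the next Newton update. Names are illustrative.
def newton_cached(f, df, x0, tol=1e-10, max_iter=50):
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        x_new = x - fx / df(x)
        fx_new = f(x_new)            # one evaluation per iteration, reused below
        if abs(fx_new) < tol:
            return x_new
        x, fx = x_new, fx_new
    raise RuntimeError("no convergence within max_iter iterations")
```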
 
Well, I suppose that for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is still a large change in the function value. This gives a risk of premature convergence. The first method does not have that problem, but it has the problem mentioned earlier. Combining both could make it more robust in that sense, but yes it would be more costly computationally.
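A sketch of such a combined test (illustrative only): it declares convergence only when both the step and the residual are small, at the price of one evaluation of ##f## at the new iterate (which, as in the caching sketch above, could be reused).

```python
# Combined stopping test: require both a small step and a small residual.
# Tolerances and names are illustrative, not from the thread.
def newton_combined(f, df, x0, x_tol=1e-10, f_tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        # the step condition protects against the false-root case discussed
        # earlier (small |f| with no actual root); the residual condition
        # protects against a steep slope making the step small prematurely
        if abs(x_new - x) < x_tol and abs(f(x_new)) < f_tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")
```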
 
Kyouran said:
Well, I suppose that for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is still a large change in the function value. This gives a risk of premature convergence. The first method does not have that problem, but it has the problem mentioned earlier.
This was addressed in the first reply to your original question:
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?

and I am not sure any more needs to be said, except perhaps to deal with

Kyouran said:
Combining both could make it more robust in that sense, but yes it would be more costly computationally.

it could be amended to "Do you need the function value to be small (take 1), do you need the deviation in x to be small (take 2) or do you need both (take both 1 and 2)?"

Note that if your convergence test is ## |f(x_i)| < \varepsilon ##, you no longer have a guarantee of convergence when ## |x_i - x| \le \varepsilon_0 x ## (here ## \varepsilon_0 ## is machine epsilon and ## x ## is the exact solution). Partly because of this, we usually use test 2.
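A small numerical illustration of that limit (the function, scaling, and tolerances here are my own choices, not from the thread): take ##f(x) = 10^8 \cos(x)##, which has a root at ##\pi/2##. Even when ##x_i## is the closest double to ##\pi/2##, ##|f(x_i)|## is about ##6\times 10^{-9}##, so a residual test with ##\varepsilon = 10^{-10}## can never succeed, while the step shrinks to zero.

```python
import math

# Steep slope at the root (|f'| = 1e8): rounding in x limits how small |f(x)|
# can get, so a residual tolerance of 1e-10 is unreachable even though x is
# already correct to the last bit.
f  = lambda x: 1e8 * math.cos(x)     # root at pi/2
df = lambda x: -1e8 * math.sin(x)

x = 1.5
for _ in range(50):
    x_new = x - f(x) / df(x)
    if x_new == x:                   # step has shrunk to zero in double precision
        break
    x = x_new

print("x - pi/2 =", x - math.pi / 2)  # essentially zero: x matches pi/2 to double precision
print("|f(x)|   =", abs(f(x)))        # about 6e-9, far above a 1e-10 residual tolerance
```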
 
