Convergence criterion for Newton-Raphson

  • Context: Undergrad
  • Thread starter: Kyouran
  • Tags: Convergence
SUMMARY

The discussion compares two convergence criteria for the Newton-Raphson algorithm: one based on the function value at the last estimate, ##f(x_n)##, and the other based on the change in the root estimate, ##|x_{n+1} - x_n|##. The first method only admits an absolute error tolerance, since a relative error with respect to zero is meaningless, while the second allows absolute or relative tolerances and is generally more robust against false roots. The trade-offs between computational cost and convergence behaviour are highlighted, with the second method preferred when function evaluations are computationally expensive.

PREREQUISITES
  • Understanding of the Newton-Raphson algorithm
  • Familiarity with numerical methods and convergence criteria
  • Knowledge of function evaluation and floating-point operations
  • Basic concepts of error tolerance in numerical analysis
NEXT STEPS
  • Research the implementation of the Newton-Raphson algorithm in Python using libraries like NumPy
  • Explore advanced convergence criteria in numerical methods, such as hybrid methods combining both function value and root change
  • Study the impact of machine epsilon on convergence guarantees in numerical algorithms
  • Investigate performance optimization techniques for computationally expensive function evaluations
USEFUL FOR

Mathematicians, software developers, and engineers involved in numerical analysis, optimization, and algorithm development will benefit from this discussion.

Kyouran
TL;DR: Convergence criterion for Newton-Raphson
The Newton-Raphson algorithm is well-known:

##x_{n+1} = x_n - \frac{f(x_n)}{f'(x_{n})}##

Looking at a few implementations online, I have encountered two convergence criteria:

1) The first method uses the function value at the last estimate, ##f(x_n)## or ##f(x_{n+1})##. Since the function value is zero at the root, this limits us to specifying a maximum absolute error on ##f(x)##, as a relative error with respect to zero doesn't make much sense.

2) The second method uses the change in the root estimate over the last iteration. If this change falls below a certain threshold, the algorithm is assumed to have converged. Here one has a bit more freedom: one can look at the relative change ##(\frac{x_{n+1}}{x_n}-1)## in the root, or at the step size ##x_{n+1}-x_n##.

Obviously, the two criteria (or three, if you count the last two as distinct) are different, so the question here is what are the consequences of choosing one criterion over the other.
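For reference, here is a minimal Python sketch (not taken from any of the implementations mentioned above) showing how the two stopping tests might be wired into the same loop; the function name `newton` and the `criterion` argument are just illustrative choices.

```python
import math

def newton(f, fprime, x0, tol=1e-10, max_iter=50, criterion="step"):
    """Newton-Raphson with a selectable stopping test.

    criterion="residual": stop when |f(x_n)| < tol          (method 1)
    criterion="step":     stop when |x_{n+1} - x_n| < tol   (method 2)
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if criterion == "residual" and abs(fx) < tol:
            return x
        step = fx / fprime(x)
        x_new = x - step
        if criterion == "step" and abs(step) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton-Raphson did not converge within max_iter iterations")

# Example: the root of cos(x) - x near x0 = 1 (about 0.7390851)
print(newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1, 1.0))
```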
 
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
 
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
I suppose either would do when you just want the root. Then again, I can imagine that for a function that comes very close to zero but doesn't actually cross it, the first criterion may report a false root: something like ##f(x) = x^2 + 10^{-6}## with a tolerance of ##10^{-4}## would still yield a "root" in the first case but not in the second.
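To make that concrete, here is a quick sketch using the illustrative `newton` function from the earlier post (the numbers in the comments are simply what this particular example produces):

```python
# Using the illustrative newton() sketch from above.
f = lambda x: x**2 + 1e-6      # always positive: there is no real root
fp = lambda x: 2 * x

# Method 1 (residual test) reports a "root" near x = 0.008, where f is about 6e-5 < 1e-4:
print(newton(f, fp, 1.0, tol=1e-4, criterion="residual"))

# Method 2 (step test) never sees a step below 1e-4 and gives up instead:
try:
    newton(f, fp, 1.0, tol=1e-4, criterion="step")
except RuntimeError as err:
    print(err)
```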
 
For these functions the method won't converge nicely anyway. Sure, if your cutoff is large then you get a result from the method (in both cases), but if you try to improve it, the estimate will get worse.
 
What I'm wondering is whether the second method is more robust, since it doesn't seem to have the particular problem I mentioned. Perhaps there are downsides to the second method as well, where the first method may perform better; I just haven't figured it out. My overall goal is to get an idea of the trade-off being made here.
 
The second method is basically doing half of the next step, so typically you are "a bit closer" (you checked how large the next step would be) - but you also did more calculation.
 
Kyouran said:
so the question here is what are the consequences of choosing one criterion over the other.
They're both going to converge the same way (quadratically, or linearly in some cases), but there's a big difference in speed if ##f(z)## is computationally expensive, say millions of floating-point operations: the criterion ##|f(z_{n+1})-f(z_n)|<p## requires an extra function evaluation, while ##|z_{n+1}-z_n|<p## requires just one subtraction. That matters when, say, searching for a million roots of a million-degree polynomial.
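A rough way to see the cost difference is to count calls to ##f##. The sketch below is purely illustrative (a cheap ##f## stands in for an expensive one, and the names are my own), not code from any library:

```python
import math

calls = 0
def f(x):
    global calls
    calls += 1
    return math.cos(x) - x        # stand-in for an expensive function

fp = lambda x: -math.sin(x) - 1

def solve(check, x0=1.0, tol=1e-12, max_iter=50):
    global calls
    calls = 0
    x = x0
    for _ in range(max_iter):
        step = f(x) / fp(x)
        x -= step
        if check == "residual" and abs(f(x)) < tol:   # one extra evaluation of f per iteration
            break
        if check == "step" and abs(step) < tol:       # reuses the step already computed
            break
    return x, calls

print(solve("residual"))   # same root, but two evaluations of f per iteration instead of one
print(solve("step"))
```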
 
Well, I suppose for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value. This creates a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier. Combining both could make it more robust in that sense, but yes it would be more costly computationally.
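A hedged sketch of what that combined test might look like (the name `newton_combined` and the two separate tolerances are my own choices for illustration, not something from a standard library):

```python
def newton_combined(f, fprime, x0, xtol=1e-10, ftol=1e-10, max_iter=50):
    """Declare convergence only when BOTH the step and the residual are small."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x_new = x - step
        # at most one extra evaluation of f per iteration (only when the step test already passes)
        if abs(step) < xtol and abs(f(x_new)) < ftol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")
```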
 
Kyouran said:
Well, I suppose for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value. This creates a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier.
This was addressed in the first reply to your original question:
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?

and I am not sure any more needs to be said, except perhaps to deal with

Kyouran said:
Combining both could make it more robust in that sense, but yes it would be more costly computationally.

it could be amended to "Do you need the function value to be small (take 1), do you need the deviation in x to be small (take 2) or do you need both (take both 1 and 2)?"

Note that if your convergence test is ##|f(x_i)| < \varepsilon##, you no longer have a guarantee that the test will ever be satisfied, even when ##|x_i - x| \le \varepsilon_0 x## (where ##\varepsilon_0## is machine epsilon and ##x## is the exact solution). Partly because of this, we usually use test 2.
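As a closing illustration, a minimal sketch of test 2 with a relative tolerance anchored to machine epsilon; the factor of 4 and the function name are arbitrary choices for the example, not a recommendation:

```python
import numpy as np

eps = np.finfo(float).eps          # machine epsilon for double precision, about 2.2e-16

def newton_reltol(f, fprime, x0, rtol=4 * eps, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x_new = x - step
        # stop when the step is small *relative to* the current iterate (test 2)
        if abs(step) <= rtol * abs(x_new):
            return x_new
        x = x_new
    return x  # best estimate if the tolerance was never met
```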
 
