Convergence criterion for Newton-Raphson

  • Context: Undergrad
  • Thread starter: Kyouran
  • Tags: Convergence

Discussion Overview

The discussion revolves around the convergence criteria for the Newton-Raphson algorithm, focusing on two primary methods: one based on the function value at the current estimate and the other based on the change in the root estimate. Participants explore the implications of choosing one criterion over the other, considering aspects such as robustness and computational efficiency.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Exploratory

Main Points Raised

  • Some participants describe the first method as relying on the function value, suggesting that it limits the specification to a maximum absolute error on the function itself.
  • Others propose the second method, which focuses on the change in the root estimate, allowing for more flexibility in defining convergence criteria.
  • One participant raises a concern that the first method may lead to false roots in cases where the function value approaches zero without crossing it.
  • Another participant notes that the second method may be more robust as it avoids the issue of false roots but questions whether it has its own downsides.
  • Some argue that both methods typically converge at similar rates but highlight a significant difference in computational cost, especially when the function is expensive to evaluate.
  • A later reply discusses the risk of premature convergence with the second method when the slope of the function is steep, while the first method does not face this issue.
  • One participant suggests that combining both criteria could enhance robustness but acknowledges the increased computational cost.
  • Another participant emphasizes that using the first method may not guarantee convergence under certain conditions, leading to a preference for the second method in practice.

Areas of Agreement / Disagreement

Participants express differing views on the effectiveness and robustness of the two convergence criteria, indicating that no consensus has been reached regarding which method is superior or under what conditions each should be preferred.

Contextual Notes

There are limitations regarding the assumptions made about the functions being analyzed, particularly in relation to their behavior near zero and the implications of machine epsilon on convergence guarantees.

Kyouran
TL;DR: Convergence criterion for Newton-Raphson
The Newton-Raphson algorithm is well-known:

##x_{n+1} = x_n - \frac{f(x_n)}{f'(x_{n})}##

Looking at a few implementations online, I have encountered two methods for testing convergence:

1) The first method uses the function value at the last estimate, ##f(x_n)## or ##f(x_{n+1})##. Since the function value is zero at the root, this limits us to specifying a maximum absolute error on ##f(x)##, as a relative error w.r.t. zero doesn't make much sense.

2) The second method uses the change in the root estimate over the last iteration. If this change falls below a certain threshold value, the algorithm is assumed to have converged. Here one has a bit more freedom: one can look at the relative change ##(\frac{x_{n+1}}{x_n}-1)## in the root, or at the step size ##x_{n+1}-x_n##.

Obviously, the two criteria (or three, if you count the last two as distinct) are different, so the question here is what are the consequences of choosing one criterion over the other.
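
For concreteness, a minimal Python sketch of the iteration with both stopping rules side by side (illustrative only; the names `newton`, `f_tol` and `step_tol` are not from the thread):

```python
# Minimal sketch of Newton-Raphson with the two stopping rules discussed above.
def newton(f, df, x0, f_tol=None, step_tol=None, max_iter=50):
    """Return (estimate, converged).

    Criterion 1: stop when |f(x)| < f_tol          (residual test).
    Criterion 2: stop when |x_new - x| < step_tol  (step-size test).
    Pass None to disable either test.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if f_tol is not None and abs(fx) < f_tol:                # criterion 1
            return x, True
        dfx = df(x)
        if dfx == 0.0:                                           # flat spot: cannot step
            return x, False
        x_new = x - fx / dfx                                     # Newton update
        if step_tol is not None and abs(x_new - x) < step_tol:   # criterion 2
            return x_new, True
        x = x_new
    return x, False

# Example: the root of x^2 - 2 with either criterion.
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, f_tol=1e-12))
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, step_tol=1e-12))
```

Passing only one of the tolerances selects criterion 1 or criterion 2; passing both gives the combined test that comes up later in the thread.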
 
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
 
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?
I suppose either would do when you just want the root, but I can imagine that if the function comes very close to zero yet doesn't actually cross it, the first criterion may report a false root. For example, ##x^2 + 0.000001## with a tolerance of 0.0001 would still yield a "root" under the first criterion, but not under the second.
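
As a quick sketch of that scenario (illustrative, using the numbers quoted above):

```python
# f(x) = x^2 + 1e-6 has no real root, yet |f(x)| < 1e-4 holds near its minimum,
# so the residual test (criterion 1) reports a "root" there.
def f(x):  return x * x + 1e-6
def df(x): return 2.0 * x

x = 1.0
for _ in range(100):
    if abs(f(x)) < 1e-4:                 # criterion 1 fires near the minimum
        print(f"residual test met at x = {x:.4g}, f(x) = {f(x):.3g}")
        break
    if df(x) == 0.0:                     # guard against the flat spot at x = 0
        break
    x -= f(x) / df(x)                    # Newton update

# A step test like |x_new - x| < 1e-6 is never satisfied here: the Newton step
# (x^2 + 1e-6)/(2x) is never smaller than 1e-3 in magnitude, and once x drops
# below about 1e-3 the iterates oscillate instead of settling down.
```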
 
For such functions the method won't converge nicely anyway. Sure, if your cutoff is large then you get a result from the method (in both cases), but if you try to improve it by tightening the cutoff, the estimate will get worse.
 
What I'm wondering is whether the second method is more robust, as it doesn't seem to have that particular problem that I mentioned. Perhaps there are downsides to the second method as well where the first method may perform better; I just haven't figured it out. My overall goal here is to get an idea of the tradeoff that is being made here.
 
The second method is basically doing half of the next step, so typically you are "a bit closer" (you checked how large the next step would be) - but you also did more calculation.
 
Kyouran said:
so the question here is what are the consequences of choosing one criterion over the other.
They're both going to converge at the same rate (usually quadratically, sometimes linearly), but there is a big difference in speed if ##f(z)## is computationally expensive to evaluate: say millions of floating-point operations for a criterion like ##|f(z_{n+1})-f(z_n)|<p##, compared to a single subtraction for ##|z_{n+1}-z_n|<p##, for example when searching for a million roots of a million-degree polynomial.
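
An illustrative sketch of that cost difference (the `horner` helper and the polynomial are hypothetical, not from the thread): the residual test needs an evaluation of ##f## at the new iterate, which for a dense degree-n polynomial costs about 2n flops via Horner's rule, while the step test is a single subtraction of numbers already produced by the Newton update.

```python
def horner(coeffs, z):
    """Evaluate a polynomial (highest-degree coefficient first) with n mults and n adds."""
    acc = 0.0
    for c in coeffs:
        acc = acc * z + c
    return acc

coeffs = [1.0] * 1_000_001            # a hypothetical dense degree-1,000,000 polynomial
z_prev, z_next = 0.50, 0.5001         # hypothetical successive Newton iterates

residual_ok = abs(horner(coeffs, z_next)) < 1e-8   # ~2 million flops per check
step_ok     = abs(z_next - z_prev) < 1e-8          # one subtraction per check
```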
 
Well, I suppose that for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value, which may still be far from zero. This gives a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier. Combining both could make it more robust in that sense, but yes it would be more costly computationally.
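
A hypothetical example of that steep-slope risk (not from the thread): for ##f(x) = 10^{12} x^3## the Newton update is always ##x/3##, so the step shrinks with ##x## while ##f(x)## can still be far from zero when the step test fires.

```python
# Premature convergence of the step test on a steeply scaled function.
def f(x):  return 1e12 * x**3
def df(x): return 3e12 * x**2

x = 1.0
for _ in range(200):
    x_new = x - f(x) / df(x)             # equals (2/3) * x for this f
    if abs(x_new - x) < 1e-4:            # criterion 2: step test
        print(f"step test met at x = {x_new:.3e}, but f(x) = {f(x_new):.3g}")
        break
    x = x_new
```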
 
Kyouran said:
Well, I suppose that for the second method, if the slope of the function is quite steep locally, then ##x_{n+1}## may be close to ##x_n## even though there is a large change in the function value, which may still be far from zero. This gives a risk of premature convergence. The first method does not have that problem, but has the problem mentioned earlier.
This was addressed in the first reply to your original question:
mfb said:
The question is what do you need. Do you need the function value to be small (take 1) or do you need the deviation in x to be small (take 2)?

and I am not sure any more needs to be said, except perhaps to deal with

Kyouran said:
Combining both could make it more robust in that sense, but yes it would be more costly computationally.

it could be amended to "Do you need the function value to be small (take 1), do you need the deviation in x to be small (take 2), or do you need both (take both 1 and 2)?"

Note that if your convergence test is ## |f(x_i)| < \varepsilon ##, you no longer have a guarantee that the test can be satisfied once ## |x_i - x| \le \varepsilon_0 x ## (where ## \varepsilon_0 ## is machine epsilon and ## x ## is the exact solution): rounding error in evaluating ## f ## can keep ## |f(x_i)| ## above ## \varepsilon ## even at the closest representable ## x_i ##. Partly because of this, we usually use test 2.
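
A small illustration of that point, assuming double precision (the scaled function ##f(x) = 10^{16}(x^2 - 2)## is hypothetical, not from the thread): even the closest representable value to ##\sqrt{2}## leaves ##|f(x)|## of order 1, so a residual tolerance of, say, ##10^{-6}## can never be met, while the step test succeeds after a handful of iterations.

```python
import math

# Residual test |f(x)| < 1e-6 is unattainable for this scaled f in double
# precision, but the step test |x_new - x| < 1e-12 is met quickly.
def f(x):  return 1e16 * (x * x - 2.0)
def df(x): return 1e16 * 2.0 * x

x = 1.5
for _ in range(100):
    x_new = x - f(x) / df(x)             # Newton update (the 1e16 factor cancels)
    if abs(x_new - x) < 1e-12:           # criterion 2: satisfied after a few steps
        break
    x = x_new

print(f"x - sqrt(2) = {x_new - math.sqrt(2):.2e}")   # about one ulp away
print(f"f(x)        = {f(x_new):.3g}")               # of order 1, never below 1e-6
```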
 
