Is this problem ill-conditioned or not?

In summary, the conversation discusses using the Newton-Raphson method to find the zero of a function f, and whether it is better to use another function g that reaches smaller values. It also covers rescaling to avoid overflow and catastrophic cancellation, and the role of the condition number of a matrix when solving for eigenvalues. The original poster reports that rescaling in their own code gives much better results.
  • #1
nonequilibrium
Say we want to know the zero of a function f, then of course we can use the iterative Newton-Raphson method, given by [itex]x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}[/itex].

But say f typically has really big values, and say you can actually, without much trouble, rewrite your problem as solving for the zero of a function g which reaches much smaller values. Is it then "better" to use [itex]x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}[/itex] instead of [itex]x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}[/itex]? Or does it make no difference? After all, the "bigness" of f doesn't seem to influence the relative error on x: if [.] denotes the computed value of something, then [itex][f(x_n)] = (1+\epsilon_r^{(1)})f(x_n)[/itex] and [itex][f'(x_n)] = (1+\epsilon_r^{(2)})f'(x_n)[/itex], such that [itex]\frac{[f(x_n)]}{[f'(x_n)]} = \frac{f(x_n)}{f'(x_n)} \cdot \frac{1+\epsilon_r^{(1)}}{1+\epsilon_r^{(2)}} \approx \frac{f(x_n)}{f'(x_n)}\left(1+\epsilon_r^{(1)}-\epsilon_r^{(2)}\right)[/itex], so the relative error of the Newton step stays of order [itex]\epsilon_r[/itex], and there's no problem there.

On the other hand, although the method with the big f might be well conditioned(?), it might still be better to use g, since you'll avoid an overflow?
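To make this point concrete, here is a minimal sketch (plain Python, with a made-up example function) showing that dividing f by a constant leaves the Newton iterates essentially unchanged, since the constant cancels in f/f':

[code=python]
# Minimal sketch: Newton-Raphson on f and on g = f/C gives the same iterates,
# because the constant C cancels in g(x)/g'(x) = f(x)/f'(x).
def newton(func, dfunc, x0, steps=6):
    x = x0
    for _ in range(steps):
        x = x - func(x) / dfunc(x)
    return x

C = 1.0e5                            # arbitrary large scale factor
f  = lambda x: 1e8 * (x**2 - 2.0)    # "big" function, root at sqrt(2)
df = lambda x: 1e8 * 2.0 * x
g  = lambda x: f(x) / C              # rescaled version
dg = lambda x: df(x) / C

print(newton(f, df, 1.0))  # 1.41421356...
print(newton(g, dg, 1.0))  # identical, up to the last rounding of f/f'
[/code]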
 
  • #2
Simply scaling your function (for example using g(x) = f(x)/100000) does nothing to help your problem: the constant cancels in the Newton step f(x)/f'(x), so the iterates are identical. From an analytic standpoint there's no reason to switch functions unless g(x) has a shape better suited to solving the problem, and that in itself has nothing to do with the relative magnitudes of f(x) and g(x). In practice, given a function f(x), it's almost never possible to construct a wholly different function g(x) that you know has the same zeros as f(x) does.

Picking a function with smaller values might help if f(x) takes on values which are obscenely large, but if g(x) ends up taking values that are obscenely small you'll have the same problem in the other direction.
 
  • #3
Thanks for replying. Maybe I'll make my situation a bit more concrete:

I need to find the eigenvalues of a certain matrix, i.e. solve [itex]|A - \lambda I| = 0[/itex], but it turns out that the elements of the matrix A are multiplied by a large constant, i.e. [itex]A = \alpha B[/itex] with B having 1's and 2's as elements and [itex]\alpha[/itex] being pretty big. So I was wondering if it would be beneficial to solve instead [itex]|B - \tilde \lambda I| = 0[/itex] and then, at the end, set [itex]\lambda = \alpha \tilde \lambda[/itex].

Intuitively I expected it to make a difference, but thinking about it more closely I can't really seem to justify why it would. (NB: to evaluate the determinant I'm using a recursive formula, using that A is tridiagonal.) Perhaps it does make a difference, not in the Newton-Raphson step itself but in the evaluation of the determinant, by avoiding catastrophic cancellations(?)
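For reference, the scaling identity behind this is easy to check numerically; a minimal sketch with NumPy, using a made-up tridiagonal B of 1's and 2's:

[code=python]
import numpy as np

# Sketch: eigenvalues scale linearly with the matrix, eig(alpha*B) = alpha*eig(B),
# so solving the rescaled problem and multiplying back is mathematically exact.
alpha = 1.0e12
B = np.diag([2.0] * 5) + np.diag([1.0] * 4, 1) + np.diag([1.0] * 4, -1)

lam_direct  = np.sort(np.linalg.eigvalsh(alpha * B))
lam_rescale = alpha * np.sort(np.linalg.eigvalsh(B))
print(np.allclose(lam_direct, lam_rescale))  # True
[/code]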
 
Last edited:
  • #4
The amount of numerical ill conditioning when doing operations on a matrix usually depends on the "condition number", which is the ratio of the largest to the smallest singular value (for a symmetric matrix, the ratio of the largest to the smallest eigenvalue magnitude). A bigger condition number means the matrix is more nearly "numerically singular". But unless ##\log_{10}## of the condition number is comparable to the number of significant digits used in the calculation (16, for most serious computer calculations), this is not usually an issue. There are various ways to estimate the condition number without actually finding the eigenvalues. One way is to start by doing an LU factorization of the matrix, which (with pivoting) is a well-behaved process, even if the L and U matrices it produces are very badly conditioned.

Scaling all the terms of a matrix doesn't change its condition number. For some numerical methods of finding eigenvalues, there is an advantage in solving ##|(A + sI) - (\lambda+s)I| = 0##, in other words adding or subtracting a constant ##s## to the original eigenvalues, and making the matrix more or less diagonally dominant.
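Both claims are easy to verify numerically; a minimal sketch with NumPy, using an arbitrary symmetric matrix:

[code=python]
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = A + A.T                      # symmetric, so cond = |lam_max| / |lam_min|

# Scaling leaves the condition number unchanged...
print(np.isclose(np.linalg.cond(A), np.linalg.cond(1e9 * A)))   # True

# ...while a shift moves every eigenvalue by s: eig(A + s*I) = eig(A) + s.
s = 5.0
print(np.allclose(np.sort(np.linalg.eigvalsh(A + s * np.eye(6))),
                  np.sort(np.linalg.eigvalsh(A)) + s))          # True
[/code]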

There was a significant breakthrough in the general subject of error analysis in the 1960s, when Wilkinson realized that instead of asking "how wrong is the numerical answer compared with the exact answer", which is usually mathematically intractable, there is an easier question that is just as useful in practice, which is "if the answers I got are the exact answers to the wrong problem, how wrong is the wrong problem".

For example, if you are solving the matrix equations ##Ax = b## and you get an approximate answer ##x'##, trying to estimate ##x-x'## is hard. But in a practical physics or engineering situation where ##A## and/or ##b## are not known exactly, it can be more useful to investigate whether there is a "small" vector ##b'## such that ##Ax'= b+b'##, or a "small" matrix ##A'## such that ##(A+A')x' = b##.
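In code, this backward-error question is just a residual computation; a minimal sketch with NumPy (x_approx here stands in for whatever answer a solver returned):

[code=python]
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x_approx = np.linalg.solve(A, b)     # stand-in for "the answer you got"

# Backward error in b: the perturbation b' such that A @ x_approx = b + b'.
b_pert = A @ x_approx - b
print(np.linalg.norm(b_pert) / np.linalg.norm(b))  # tiny => x_approx exactly
                                                   # solves a nearby problem
[/code]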
 
  • #5
But then how do you explain that it does have a major effect on my numerical computations? Without the procedure described in post #3 I get ridiculous results, whereas with it, it works smoothly and effectively...
 
  • #6
mr. vodka said:
But then how do you explain that it does have a major effect on my numerical computations? Without the procedure described in post #3 I get ridiculous results, whereas with it, it works smoothly and effectively...

Are you using an eigensolution routine from a "professional" library, or your own code? Are you using the most appropriate numerical method?

Seeing the problem is probably a necessary (but not sufficient!) condition for getting an explanation.
 
  • #7
My own code. I don't want to bother you with code, but the essence of what is happening is pretty simple.

If you like, you can take a look at the page I attached. I'm basically using that iterative formula to calculate the determinant of a tridiagonal matrix. In my case, the elements a,b,c are pretty big (and if it matters: the eventual eigenvalues, at least the smallest ones, are "normal-sized", i.e. order 1). If I simply evaluate that determinant, as it is, I get wrong/NaN results, but if I first replace a,b,c by [itex]\alpha a, \alpha b, \alpha c[/itex] (with alpha small), and then solve for lambda, in the end dividing lambda by alpha, I get good results.

Maybe the specific formula (i.e. the one on the attached page) induces catastrophic cancellations? However, if that is the case, then I'm not sure why I have no trouble at all after rescaling, since then my lambda is typically really small, so relative to lambda, a and b are still huge, such that operations like the one in (5.122) should still behave funnily(?)
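For readers without the attachment: the standard three-term recurrence for a tridiagonal determinant (presumably the formula on the attached page) overflows exactly as described, and rescaling fixes it. A minimal sketch with made-up numbers:

[code=python]
import numpy as np

def tridiag_det(a, b, c, lam):
    """Three-term recurrence D_i = (a_i - lam)*D_{i-1} - b_{i-1}*c_{i-1}*D_{i-2}
    for the determinant of a tridiagonal matrix. Overflows once the running
    minors exceed ~1e308."""
    d_prev, d = 1.0, a[0] - lam
    for i in range(1, len(a)):
        d_prev, d = d, (a[i] - lam) * d - b[i-1] * c[i-1] * d_prev
    return d

n = 300
a = np.full(n, 2.0e8)          # "big" entries, as in the problem described
b = c = np.full(n - 1, 1.0e8)

print(tridiag_det(a, b, c, 1.0))   # overflows to inf, then NaN
alpha = 1.0e-8                     # rescale: same zeros, at lam' = alpha*lam
print(tridiag_det(alpha*a, alpha*b, alpha*c, alpha*1.0))  # finite
[/code]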

EDIT: By the way, thanks for following up
 

Attachments

  • devries.gif (28.6 KB)
  • #8
OK. If you are trying to find the eigenvalues by determinant search, you don't really need the value of the determinant, only its sign. But the determinant is likely to be of the order of the product of all the diagonal terms of the matrix, so you need to keep track of a scale factor so that it stays within the range of your floating point arithmetic. You could either scale the matrix, or represent the D's by an integer power of 2 (or 10 if you prefer) times a floating point number fairly close to 1. That would not overflow until the determinant was of the order of about 10^8 decimal digits, which should be big enough for most situations!
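A minimal sketch of that power-of-2 bookkeeping, using Python's math.frexp to renormalize the running minors at every step (the recurrence is the standard tridiagonal one; variable names are made up):

[code=python]
import math

def tridiag_det_scaled(a, b, c, lam):
    """Determinant recurrence with a separate power-of-2 exponent, so the
    arithmetic itself never overflows. Returns (mantissa, exponent) with
    det = mantissa * 2**exponent."""
    d_prev, d, exp = 1.0, a[0] - lam, 0
    for i in range(1, len(a)):
        d_prev, d = d, (a[i] - lam) * d - b[i-1] * c[i-1] * d_prev
        m, e = math.frexp(d)                 # d = m * 2**e, 0.5 <= |m| < 1
        d, d_prev, exp = m, d_prev / 2.0**e, exp + e
    return d, exp
[/code]

Only the sign of the mantissa matters for a determinant search, and the exponent carries the magnitude, so nothing overflows no matter how big the determinant gets.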

Alternatively, you could do an LU factorization of the matrix in (5.121), which is about the same amount of work as finding the determinant but will not overflow, count the number of negative diagonal entries of U, and use that as a Sturm sequence count to tell you the number of negative eigenvalues of ##A - \lambda I##, i.e. the number of eigenvalues below a given ##\lambda##.
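A minimal sketch of the Sturm-sequence count, assuming the matrix is symmetric (b = c) so the pivots can be computed by the simple LDL^T recurrence:

[code=python]
import numpy as np

def count_eigs_below(a, b, lam, tiny=1e-300):
    """Sturm-sequence count for a symmetric tridiagonal matrix (diagonal a,
    off-diagonal b): by Sylvester's law of inertia, the number of negative
    pivots d_i of A - lam*I equals the number of eigenvalues below lam."""
    count, d = 0, 1.0
    for i in range(len(a)):
        off = b[i-1]**2 / d if i > 0 else 0.0
        d = (a[i] - lam) - off
        if d == 0.0:
            d = -tiny            # standard safeguard against division by zero
        if d < 0.0:
            count += 1
    return count

# Sanity check against numpy on a small example:
a = np.array([2.0, 2.0, 2.0, 2.0]); b = np.array([1.0, 1.0, 1.0])
A = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
print(count_eigs_below(a, b, 2.5), np.sum(np.linalg.eigvalsh(A) < 2.5))  # 2 2
[/code]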
 
Last edited:
  • #9
Just to add: this has nothing to do with numerical conditioning; it is just the fact that the determinant is too big to represent as a floating point number. You would get the same problem running the code even if the matrix were diagonal and there were no cancellation at all.

For a large matrix, the matrix elements don't even have to be "big". For example, if the elements of a matrix of order 500 were about 10, the determinant would usually be about ##10^{500}##, but IEEE 64-bit floating point can only represent numbers up to about ##10^{308}##. Yet calculating all the eigenpairs of an arbitrary matrix of order 500 is not usually a big deal numerically in 64-bit arithmetic.

FWIW the old LINPACK numerical library included determinant calculation routines and returned the value as a power of 10 times a number between 1 and 10. The newer LAPACK libraries don't contain any determinant calculation routines at all, though you can get the determinant easily after calling other routines in the library if you really want it.
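The modern NumPy analogue of that convention is numpy.linalg.slogdet, which returns the sign and the logarithm of the absolute determinant so that nothing overflows; a small illustration:

[code=python]
import numpy as np

# slogdet returns (sign, log|det|), so a determinant of order 10^500
# is representable even though 10^500 itself overflows a float64.
A = 10.0 * np.eye(500)                  # det = 10^500
sign, logabsdet = np.linalg.slogdet(A)
print(sign, logabsdet / np.log(10.0))   # 1.0 500.0, i.e. det = +10^500
print(np.linalg.det(A))                 # inf: the plain determinant overflows
[/code]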
 
  • #10
Thank you, your posts were enlightening.
 

1. What does it mean for a problem to be "ill-conditioned"?

When a problem is considered to be "ill-conditioned," it means that small changes in the input data or parameters can result in disproportionately large changes in the output or solution. This can lead to inaccurate or unstable results.

2. How can I determine if a problem is ill-conditioned?

There are a few ways to determine if a problem is ill-conditioned. One approach is to analyze the sensitivity of the problem, looking at how small changes in input data affect the output. Another method is to use numerical techniques, such as calculating the condition number, which measures the sensitivity of the solution to perturbations in the data.
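As a concrete illustration, the Hilbert matrix is a standard ill-conditioned example; a minimal sketch with NumPy:

[code=python]
import numpy as np

# The n x n Hilbert matrix H[i, j] = 1 / (i + j + 1) is a classic
# ill-conditioned example: its condition number grows exponentially with n.
n = 10
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1.0)
print(np.linalg.cond(H))   # ~1.6e13: ~13 of 16 double-precision digits at risk
[/code]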

3. What are some common examples of ill-conditioned problems?

Ill-conditioned problems can arise in various fields, such as physics, engineering, and mathematics. Some common examples include numerical integration, solving systems of linear equations, and finding the roots of polynomials.

4. What are the consequences of working with an ill-conditioned problem?

Working with an ill-conditioned problem can lead to inaccurate or unstable results. This can be especially problematic in scientific research and engineering, where precise and reliable solutions are necessary. It can also significantly impact the efficiency and speed of the problem-solving process.

5. How can I improve the conditioning of a problem?

There are a few strategies that can be used to improve the conditioning of a problem. These include rescaling the input data or parameters, using more accurate computational methods, and applying regularization techniques to minimize the effects of noise or errors in the data. It is also important to choose appropriate algorithms and numerical techniques for the specific problem at hand.
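For example, diagonal rescaling (row equilibration) can remove the part of a condition number that is purely an artifact of mismatched scales; a minimal sketch with a made-up matrix:

[code=python]
import numpy as np

# A matrix whose rows live on wildly different scales is artificially
# ill-conditioned; equilibrating the rows removes that part of the problem.
A = np.array([[1.0e9, 2.0e9],
              [3.0,   4.0  ]])
D = np.diag(1.0 / np.abs(A).max(axis=1))   # scale each row by its largest entry
print(np.linalg.cond(A))        # ~2.5e9: dominated by the scale mismatch
print(np.linalg.cond(D @ A))    # ~11: the "real" conditioning of the problem
[/code]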
