# Numerical computation of the derivative

pbuk
Science Advisor
Gold Member
Right, I understood that.
That does not appear to be the case.

> But also in that case you can just pick h to be 2^-24 (or maybe 2^-23?) for all x, for 32-bit floats.
1. We are in Python; there aren't any 32-bit floats.
2. A 32-bit float has ## \epsilon = 2^{-24} ##; what is going to happen to your accuracy when, say, ## x = 0.9 ##?
3. If you use a step of ## h = 2^{-24} ## at ## x = 2^{-25} ## the rocket explodes.
4. I could go on but this is not a good use of anyone's time: if you have something to contribute then show us the math.

sysprog
> though perhaps not applicable in this specific case
Yes, I did get it, but I don't think this is a good implementation, for the reasons mentioned.

You are correct that I wasn't thinking about very small values initially, but...

In Python, you can use `math.frexp(x)` to obtain the exponent of x, and choose h from that value (e.g., `h = 2.0 ** (math.frexp(x)[1] - 53)` for double precision, since `frexp` returns a `(mantissa, exponent)` pair) without all the ugliness involved in using h = (1 + e)x. It requires some knowledge of the platform's floating-point implementation, but IEEE 754 doubles are pretty standard, and IIRC Python has some methods and constants (e.g. `sys.float_info`) that will help out.
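A minimal sketch of that idea (the function name is mine, and the offset of 53 assumes IEEE 754 double precision):

```python
import math

def step_size(x):
    """Pick a power-of-two step h scaled to x's binary exponent.

    math.frexp(x) returns (m, e) with x = m * 2**e and 0.5 <= |m| < 1,
    so 2**(e - 53) is about one ulp of x for IEEE 754 doubles.
    """
    if x == 0.0:
        return 2.0 ** -53  # x = 0 carries no exponent information; fall back
    _, e = math.frexp(x)
    return 2.0 ** (e - 53)
```

One consequence of this choice is that every x in the same binade gets the same h: for example, all of [0.25, 0.5) shares a single step size.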

As an aside, you can abuse C and just cast abs(x) to a uint with the proper number of bits, add 1, and then reapply the sign (IEEE 754 is lexicographically ordered, excepting the sign). Of course, in Python or C you will need to deal with overflow/underflow. It might look messier, but I think it is better.
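That bit-casting trick can be imitated in Python with `struct` (a sketch; `next_up` is my name for it, and overflow to infinity at the top of the double range is left unhandled):

```python
import math
import struct

def next_up(x):
    """Return the next representable double of larger magnitude, keeping x's sign.

    Works because IEEE 754 doubles of one sign, reinterpreted as 64-bit
    unsigned integers, are ordered the same way as the floats themselves.
    """
    sign = math.copysign(1.0, x)
    bits = struct.unpack('<Q', struct.pack('<d', abs(x)))[0]
    return sign * struct.unpack('<d', struct.pack('<Q', bits + 1))[0]
```

(Python 3.9+ also offers `math.nextafter`, which does this without any casting.)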

I would personally avoid the OP's implementation, because it is very possible to implement a better algorithm that is probably even more efficient and is consistent over any interval.

But regarding your example: choosing h with my approach would yield smaller values for most numbers, rather than growing with x, so h on the interval [0.25, 0.5), for example, would actually be the same everywhere, whereas that is not the case for the implementation in question.

Thinking about this more, I like it even less. Again, there are no details on the function being differentiated, but take, for example, f(x) = 1 / (x - 0.5) and evaluate at x = 0.5. The derivative is undefined there, but the scheme would still return an answer.
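A quick check bears this out: a symmetric scheme never evaluates f at x itself, so the singularity goes unnoticed (a sketch; `central_diff` and the step h = 1e-6 are my own choices):

```python
def central_diff(f, x, h=1e-6):
    # Symmetric (central) difference; never samples f at x itself.
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: 1.0 / (x - 0.5)

# f'(0.5) does not exist, yet the scheme happily returns a number:
# f(0.5 + h) = 1/h and f(0.5 - h) = -1/h, so the estimate is about 1/h**2.
est = central_diff(f, 0.5)
```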

I like this formula because in x + Eps, Eps has a dimension, usually ignored, but scaling is often important, something this formula makes explicit: Eps is dimensionless in (1 + Eps)x and we measure scale relative to 1, whereas now x is the only quantity with a dimension. Numerical analysts must take this sort of thing into account all the time to get "adaptive" algorithms. The one point to be aware of in allowing x to set the dimension (or absolute scale) in the problem is that if x = 0 falls within the map, this loses scale information, and you will see phenomena akin to the Gibbs phenomenon or Runge's phenomenon. Of course, there are stratagems to avoid the trivial setting x = 0 exactly, so strictly speaking, once you are aware, you can avoid program crashes, but the Gibbs phenomenon looks pretty unavoidable here. I guess I would use "weak" methods like test functions in Sobolev theory to avoid this. Other than a few minor issues like that, it seems superficially to me to warrant some investigation. In classical applied math and physics people did this sort of thing routinely, as in the dielectric constant that appears in classical electrodynamics. Lots of mathematicians are sharp about this sort of thing (e.g. Sobolev himself), but often there is a gap between the mathematicians and the numerical analyst practitioners. I am only a math teacher, so that's probably worse!

> I like this formula because in x + Eps, Eps has a dimension, usually ignored, but scaling is often important, something this formula makes explicit: Eps is dimensionless in (1 + Eps)x and we measure scale relative to 1, whereas now x is the only quantity with a dimension. Numerical analysts must take this sort of thing into account all the time to get "adaptive" algorithms. The one point to be aware of in allowing x to set the dimension (or absolute scale) in the problem is that if x = 0 falls within the map, this loses scale information, and you will see phenomena akin to the Gibbs phenomenon or Runge's phenomenon. Of course, there are stratagems to avoid the trivial setting x = 0 exactly, so strictly speaking, once you are aware, you can avoid program crashes, but the Gibbs phenomenon looks pretty unavoidable here. I guess I would use "weak" methods like test functions in Sobolev theory to avoid this. Other than a few minor issues like that, it seems superficially to me to warrant some investigation. In classical applied math and physics people did this sort of thing routinely, as in the dielectric constant that appears in classical electrodynamics. Lots of mathematicians are sharp about this sort of thing (e.g. Sobolev himself), but often there is a gap between the mathematicians and the numerical analyst practitioners. I am only a math teacher, so that's probably worse!
To me, it's not impractical at all to implement a method that provides the most accurate possible results within the capabilities of machine representation using the actual limit definition of a derivative.

I keep thinking of examples where this approach fails, and perhaps in specialized circumstances it works well, but I am not really seeing any utility other than expedient implementation.

> To me, it's not impractical at all to implement a method that provides the most accurate possible results within the capabilities of machine representation using the actual limit definition of a derivative.

> I keep thinking of examples where this approach fails, and perhaps in specialized circumstances it works well, but I am not really seeing any utility other than expedient implementation.
I think it is important to see the proposed formula more in light of classical "macroscopics", i.e. not trying to get at the highly accurate microscopic detail. I personally do not know whether the formula is of much use or not. We see the same "classical" problem of "loss of scale" that can haunt Newton's method for finding zeros. I only meant to point out that the proposal has some appeal in a "big picture" way. It is worth bearing in mind that classical Maxwell theory had to give way to quantum mechanics as ways of addressing microscopic behaviors became more relevant and accessible. But I don't think that implies "lack of utility" as there are instances where we want this sort of picture, and as we well know, Newton's method works pretty nicely in lots of cases despite grievous deficiencies.

The OP ( @Gaussian97 ) presented a function for a symmetric derivative calculation (central difference). This may be useful in some instances.

As this is a computational activity, a key issue in implementing it is the error introduced by the steps necessary to carry it out. Compare the original presentation with a simple
[f(x + e) - f(x)] / e
This involves two function evaluations, two sums/differences, and one divide.
The original formula involves two function evaluations, two sums/differences, one divide, and 7 multiplications. The multiplications include terms which are small, and the products are then summed; digital accuracy is compromised.
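For concreteness, here are the two shapes side by side (the OP's exact code is not reproduced in this excerpt, so the relative-step version below is my assumption of its general form):

```python
def forward_diff(f, x, e=1e-8):
    # Plain one-sided difference:
    # two evaluations, two sums/differences, one divide.
    return (f(x + e) - f(x)) / e

def relative_step_diff(f, x, eps=1e-8):
    # Assumed relative-step form: the sample points and the denominator
    # all scale with x, which adds several multiplications per call.
    return (f((1 + eps) * x) - f((1 - eps) * x)) / (2 * eps * x)
```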

Another consideration is the origin of the data and how the derivatives will be used. If this is experimental data (as almost all data is), there will be noise. It is typical for noise to be high-frequency, such as may be found with radio-frequency interference on electrical transducers and on digitally computed results. When such noisy data goes into either of the derivative functions, the high frequency is amplified and may overwhelm the underlying signal.

See R. W. Hamming's book, Digital Filters, section 6.4, on the design of a filter that requires only products and sums, shaped so it does not amplify the high-frequency noise. Digital division is very 'expensive'; converting the derivative to only sums and products reduces this cost, and provides a solution that is not only easy to implement on a general-purpose computer but also provides the ladder structure for direct implementation on digital signal processors for rapid, dedicated applications.
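As a toy illustration of that idea (this is not Hamming's section-6.4 design, just a short smoothed-difference filter of my own choosing), the single divide can be hoisted out of the loop so the per-sample work is sums and products only:

```python
def fir_derivative(samples, h):
    """Estimate dy/dx for uniformly spaced samples with a short FIR filter.

    The taps below average a +/-1 and a +/-2 central difference, which damps
    high-frequency noise a little compared with a bare central difference.
    """
    taps = (1.0, 2.0, 0.0, -2.0, -1.0)   # applied to y[n+2] .. y[n-2]
    scale = 1.0 / (8.0 * h)              # the only divide, done once
    out = []
    for n in range(2, len(samples) - 2):
        acc = 0.0
        for k, c in enumerate(taps):
            acc += c * samples[n + 2 - k]   # products and sums only
        out.append(acc * scale)
    return out
```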

This physics forum is a great place to have this discussion as the problem is fundamental to analyses.

FactChecker
Science Advisor
Gold Member
> Another consideration is the origin of the data and how the derivatives will be used. If this is experimental data (as almost all data is), there will be noise. It is typical for noise to be high-frequency, such as may be found with radio-frequency interference on electrical transducers and on digitally computed results. When such noisy data goes into either of the derivative functions, the high frequency is amplified and may overwhelm the underlying signal.
This is a case where the two-sided estimate would tend to (slightly) mitigate the noise compared to the one-sided estimate (for equal ##\epsilon## values): it averages the function change over a larger change in x.
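A small numerical experiment along those lines (the function, step, and noise level are all my own choices):

```python
import random

random.seed(0)

H = 1e-4        # step size
NOISE = 1e-6    # additive measurement noise, tiny compared with f itself

def noisy_f(x):
    # Smooth signal (true slope at x = 1 is 2.0) plus high-frequency noise.
    return x * x + random.uniform(-NOISE, NOISE)

x = 1.0
# Worst-case noise contribution: 2*NOISE/H for one-sided, NOISE/H for
# two-sided, because the two-sided estimate spreads the same noise over 2*H.
one_sided = (noisy_f(x + H) - noisy_f(x)) / H
two_sided = (noisy_f(x + H) - noisy_f(x - H)) / (2 * H)
```

The amplification factor NOISE/H is the key point: shrinking h to chase truncation error makes the noise term blow up.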

I thought I'd share this timestamped link into a video that YouTube just randomly recommended. It's a nice numerical differentiation trick that might interest those who have been following this thread.

> I think it is important to see the proposed formula more in light of classical "macroscopics", i.e. not trying to get at the highly accurate microscopic detail. I personally do not know whether the formula is of much use or not. We see the same "classical" problem of "loss of scale" that can haunt Newton's method for finding zeros. I only meant to point out that the proposal has some appeal in a "big picture" way. It is worth bearing in mind that classical Maxwell theory had to give way to quantum mechanics as ways of addressing microscopic behaviors became more relevant and accessible. But I don't think that implies "lack of utility" as there are instances where we want this sort of picture, and as we well know, Newton's method works pretty nicely in lots of cases despite grievous deficiencies.
Well, for sure you can have functions that just don't work well with basic numerical approaches. If it is something high-frequency, for example, and you try to compute a derivative that should be at a maximum or minimum, it won't work well with a standard approach, but might be better with a +/- dh. But I think that really is a more fundamental problem of scale and numerical accuracy in the simulation. I think it's reasonable (I haven't gotten deep into the math) that the extra error term (say h²) in a quadratic gets erased doing it this way, but in a lot of cases this term will be much smaller than f(x), to the point that it is outside of machine precision. Obviously not always. But if it does become a problem, I don't think the method of taking the derivative is the issue; it would be better to scale the function appropriately, if at all possible, to get more accurate results.

pbuk
Science Advisor
Gold Member
> I haven't gotten deep on the math too much
<sigh> then what is the point in commenting? Numerical analysis without the analysis is just guessing.

> the extra error term (say h²) in a quadratic gets erased doing it this way
No, I repeat: the point of a two-sided calculation is that the ## f''(x) ## term is eliminated. Where ## f''(x) \gg f'(x) ##, which is of course true at an extremum, this leads to better accuracy. However, where this is not true, the two-sided calculation can be less accurate, because you are doubling the effect of ## \epsilon ##.
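The cancellation is easy to see numerically (a sketch; exp and the step 1e-5 are my choices):

```python
import math

def one_sided(f, x, h):
    return (f(x + h) - f(x)) / h

def two_sided(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 1e-5
true = math.exp(x)  # d/dx exp(x) = exp(x)

err_one = abs(one_sided(math.exp, x, h) - true)  # O(h): the f''/2 term survives
err_two = abs(two_sided(math.exp, x, h) - true)  # O(h^2): the f'' term cancels
```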

> Well, for sure you can have functions that just don't work well with basic numerical approaches. If it is something high-frequency, for example, and you try to compute a derivative that should be at a maximum or minimum, it won't work well with a standard approach, but might be better with a +/- dh. But I think that really is a more fundamental problem of scale and numerical accuracy in the simulation. I think it's reasonable (I haven't gotten deep into the math) that the extra error term (say h²) in a quadratic gets erased doing it this way, but in a lot of cases this term will be much smaller than f(x), to the point that it is outside of machine precision. Obviously not always. But if it does become a problem, I don't think the method of taking the derivative is the issue; it would be better to scale the function appropriately, if at all possible, to get more accurate results.
I think your point about scale is significant, thanks. You have a mature perspective on this; that's obvious. A lot of this kind of work boils down to assumptions about small/small: that's a tricky 0/0 issue, but with a lot of work, like being able to reliably neglect, say, an ## h^2 ## term, I think numerical experts can address the difficulties meaningfully. The 0/0 problem vis-à-vis logs is equivalent to the infinity − infinity problem. Both enter the picture sometimes.

> <sigh> then what is the point in commenting? Numerical analysis without the analysis is just guessing.

> No, I repeat: the point of a two-sided calculation is that the ## f''(x) ## term is eliminated. Where ## f''(x) \gg f'(x) ##, which is of course true at an extremum, this leads to better accuracy. However, where this is not true, the two-sided calculation can be less accurate, because you are doubling the effect of ## \epsilon ##.
Perhaps I misread or misinterpreted that as saying the term is eliminated when ## f''(x) ## is zero, which over an interval is just a straight line and isn't that interesting. So I meant to say, and didn't express it well, that I'm pretty sure that if ## f''(x) ## is some constant, then the answer should be exact (within machine rounding precision).

For example, you can think of the symmetric derivative function in terms of the mean value theorem. It essentially computes a secant line between two points on f. The MVT says there is an x' between those two points whose tangent line is parallel to the secant. It doesn't necessarily mean that x' == x, but in the case of a parabola you can show that for any choice of h, x' will always equal x; in other words, the two endpoints of the secant line will always be equidistant from the value x whose tangent line is parallel.

So when I say I didn't go into the math more deeply, I just meant that I didn't make an attempt to prove the statement "if ## f''(x) ## is constant over the evaluation interval, then the answer is exact" for all functions, but I think it is probably true.
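For what it's worth, the statement checks out algebraically for a quadratic: the even-power terms cancel in f(x+h) − f(x−h), so the symmetric estimate is exact for any h, up to rounding. A quick check (the coefficients are my own choice):

```python
def central(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

# f(x) = a*x**2 + b*x + c  =>  f(x+h) - f(x-h) = 4*a*x*h + 2*b*h exactly:
# the a*h**2 and c terms cancel, so the truncation error vanishes for any h.
a, b, c = 3.0, -2.0, 7.0
f = lambda t: a * t * t + b * t + c
x = 1.5
exact = 2 * a * x + b   # = 7.0
estimates = [central(f, x, h) for h in (1.0, 0.1, 1e-3, 1e-6)]
```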