# Automatic vs Symbolic Differentiation Deathmatch

## Main Question or Discussion Point

I thought I would give you guys some justification for why I keep pestering you with this odd numerical technique. I've mentioned its advantages over finite differences quite a few times, so now I am putting it up against symbolic differentiation.

Here is a plot of the function ##x^2 / \ln(\cosh(x) + 1)##, together with its first six derivatives, computed by automatic differentiation (1001 x data points in total). The computations use 236-bit floating point precision, courtesy of gmpy2 (MPFR) arbitrary precision arithmetic.

Here is an illustration of the CPU time involved in generating the data:
Code:
$ time -p ./models.py 0 -8 8 1001 7 1e-12 1e-12 > /tmp/data 2>/dev/null
real 0.30
user 0.29
sys 0.00

Here is an interactive session evaluating the same function at a single point, ##x = 2##, together with its first twelve derivatives:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *

In [2]: x = Series.get(13, 2).var

In [3]: print(~(x * x / (x.cosh + 1).ln))
+2.562938002e+00 +1.312276410e+00 -3.440924889e-01 +2.366685634e-01 +2.260914151e-01 -1.617592557e+00 +5.097273067e+00 -1.157769456e+01 +1.290979888e+01 +5.759642561e+01 -5.363560778e+02 +2.667773442e+03 -8.844486444e+03
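For anyone wondering what the `ad` module is doing under the hood (its source is not shown here), the idea can be sketched in a few lines: propagate a truncated Taylor series (a "jet") through ordinary arithmetic, with multiplication given by the Cauchy product. The `Jet` class below is an illustrative name of my own, not the actual library API:

```python
from math import factorial

class Jet:
    """Truncated Taylor series: coefficient c[k] = f^(k)(a) / k!."""
    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, value, order):
        # The independent variable x expanded about a: a + 1*(x - a)
        return cls([value, 1.0] + [0.0] * (order - 1))

    def __mul__(self, other):
        # Cauchy product of the two truncated series
        n = len(self.c)
        return Jet([sum(self.c[j] * other.c[k - j] for j in range(k + 1))
                    for k in range(n)])

    def derivatives(self):
        # Recover f^(k)(a) from the Taylor coefficients
        return [factorial(k) * ck for k, ck in enumerate(self.c)]

x = Jet.variable(3.0, 4)   # expand about a = 3, to order 4
f = x * x * x              # f(x) = x^3
print(f.derivatives())     # [27.0, 27.0, 18.0, 6.0, 0.0]
```

Every elementary function (`ln`, `cosh`, division, and so on) gets an analogous coefficient recurrence, which is how the session above produces all thirteen numbers in one pass.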
Finally, for comparison, here is something to cut & paste into Wolfram Alpha (symbolic computation):
Code:
d^6/dx^6 x^2 / ln(cosh(x) + 1) where x = 2
I couldn't get it to do the twelfth derivative, and even for the sixth it will not evaluate the result numerically (not unless I register, anyway!).

Enjoy!


phyzguy
Is your point that numerical differentiation is faster than symbolic differentiation? This is probably true if you only want the answer at one point or a small number of points. But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.

Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).

But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
So does AD; all the derivatives are derived from the function itself, at any point where the function is defined.
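To make that concrete, here is a minimal sketch (not the actual `ad` module) of first-order forward-mode AD with dual numbers: the derivative comes straight from the function definition itself, at whatever point you choose to evaluate it.

```python
import math

class Dual:
    """First-order forward-mode AD: value and derivative carried together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    def sin(self):
        # chain rule: (sin f)' = cos(f) * f'
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def d_dx(f, x):
    # seed the derivative slot with 1 and read it back out
    return f(Dual(x, 1.0)).dot

# x * sin(x): derivative is sin(x) + x cos(x), at any point we ask for
for x in (0.5, 2.0):
    assert abs(d_dx(lambda t: t * t.sin(), x)
               - (math.sin(x) + x * math.cos(x))) < 1e-12
```

The same function object yields the derivative everywhere it is defined; only the evaluation point changes.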

FactChecker
Gold Member
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.

I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.

A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself.
Not a major point for me: in all the practical cases where I've used symbolic differentiation (like calculating the Einstein tensor from the metric in GR), the symbolic representations are a complete mess, even when the package is good at simplification. But yes, I suppose it is a benefit. Basically, AD can effortlessly evaluate derivatives to orders that can easily choke any of the major computer algebra systems. Way beyond anyone's ability to see structure ;)
It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.
Yes, reverse mode AD is the numerical technique behind TensorFlow. I am talking about the benefits of forward mode AD, which of course is also a numerical technique, just not a very well known one (as I have learned from past responses to my posts).
I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
I am not putting it forward as a replacement for symbolic differentiation; it is an alternative to finite differences (in many circumstances) and even to RK4 (in most circumstances). But it essentially performs the same calculations as symbolic differentiation (only combinatorially much simpler) and is fundamentally of the same accuracy. Hence the comparison.
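On the finite-difference comparison: a central difference has a step-size tradeoff (truncation error shrinks with ##h##, roundoff grows), so its accuracy has a floor. A quick illustration:

```python
import math

x, exact = 1.0, math.cos(1.0)   # d/dx sin(x) at x = 1
errors = {}
for h in (1e-2, 1e-5, 1e-8, 1e-11):
    fd = (math.sin(x + h) - math.sin(x - h)) / (2 * h)   # central difference
    errors[h] = abs(fd - exact)
    print(h, errors[h])
# the error falls with h, then rises again as roundoff dominates;
# AD has no step size, hence no such floor
```

AD evaluates the derivative through exact rules (product rule, chain rule), so it is limited only by the working precision, not by any choice of ##h##.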

The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy.
Thought I would address this one separately. Here are the value and first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
\$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *

In [2]: x = Series.get(20, 2).var

In [3]: print(~(x.exp))
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
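For what it's worth, the library does not get those numbers point-by-point in the finite-difference sense; each Taylor coefficient of ##e^{u(x)}## comes from the exact recurrence ##k f_k = \sum_{j=1}^{k} j\, u_j f_{k-j}##, which follows from ##f' = u' f##. A sketch (my own function name, not the `ad` API):

```python
from math import exp, factorial

def exp_jet(a, n):
    """Taylor coefficients of exp(x) about x = a via the AD recurrence
    k * f[k] = sum_{j=1..k} j * u[j] * f[k-j], where u is the series
    of the argument (here just the variable itself: a + (x - a))."""
    u = [a, 1.0] + [0.0] * (n - 1)   # the independent variable
    f = [exp(a)] + [0.0] * n
    for k in range(1, n + 1):
        f[k] = sum(j * u[j] * f[k - j] for j in range(1, k + 1)) / k
    return f

coeffs = exp_jet(2.0, 5)
derivs = [factorial(k) * c for k, c in enumerate(coeffs)]
# every entry equals exp(2) = 7.389056..., to working precision
```

With ##u = x## the recurrence collapses to ##f_k = f_{k-1}/k##, which is exactly why every derivative in the printout above is identical.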

FactChecker
Gold Member
Thought I would address this one separately. Here are the value and first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n+x^{n-1}## or ##x*\sin(x)##.

Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n+x^{n-1}## or ##x*\sin(x)##.
Agreed, but as I mentioned above, the "window" of intelligible output is finite.

There is a subset of use cases for a CAS where the symbolic output is translated into a more "familiar" language for execution. These are the use cases where my comparison is valid: not because of the extra time spent generating the derivatives (that is already paid for), but because of the time needed merely to evaluate the more complex expressions. Aside from this, it is not trivial to implement complicated expressions like these without making errors (testing is essential!).

Unless I am much mistaken, the complexity of a CAS doing symbolic differentiation and evaluation is governed by Faà di Bruno's formula, whereas the Taylor Series Method is based on the Cauchy product. It certainly feels that way in Wolfram Alpha as the order of differentiation increases ;)
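For concreteness, here is the kind of recurrence I mean (a sketch; `series_div` is my own name). Dividing two truncated Taylor series is a single Cauchy-product-style loop, ##O(n^2)## for ##n## terms, with none of the combinatorial blow-up of Faà di Bruno's formula:

```python
def series_div(a, b):
    """Taylor coefficients of a/b from q * b = a, solved term by term:
    q[k] = (a[k] - sum_{j<k} q[j] * b[k-j]) / b[0]."""
    q = []
    for k in range(len(a)):
        q.append((a[k] - sum(q[j] * b[k - j] for j in range(k))) / b[0])
    return q

# sanity check: 1 / (1 - x) = 1 + x + x^2 + x^3 + ...
one = [1.0, 0.0, 0.0, 0.0, 0.0]
geom = [1.0, -1.0, 0.0, 0.0, 0.0]
print(series_div(one, geom))   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Each elementary function has a recurrence of the same flavour, which is why the cost grows only polynomially with the order of differentiation.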

FactChecker
Gold Member
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).

EDIT: I should add that I could never use very sophisticated numerical techniques because there were always messy complications (random components, needing to find global minimums rather than local, etc.) that prevented it.

I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).
You bring up an interesting point, so make yourself comfortable ;)

I am assuming that you use, or have used, RK4 for solving ODEs. RK4 is in effect a finite-difference approximation to a fourth-order Taylor series solver, used because finding higher order derivatives is supposedly "difficult" or "expensive".

In fact, as I have demonstrated here, finding higher order derivatives is neither difficult nor expensive. The Taylor Series Method (TSM), which is built on the iterative AD functions and operators, is trivial at fourth order (and much higher), and is immune to the compromises and inaccuracies of finite differences.

That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
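As a sketch of what I mean by trivial (hypothetical names, with the ODE ##\dot y = y## hard-coded, whose coefficient recurrence is ##(k+1)\,c_{k+1} = c_k##):

```python
from math import exp

def tsm_step(y, h, order):
    """One Taylor Series Method step for y' = y: the ODE itself
    generates the Taylor coefficients, which are then summed."""
    c = [y]
    for k in range(order):
        c.append(c[k] / (k + 1))     # (k+1) c[k+1] = c[k] from y' = y
    return sum(ck * h ** k for k, ck in enumerate(c))

y, h = 1.0, 0.1
for _ in range(10):
    y = tsm_step(y, h, 8)            # 8th order is as easy as 4th
print(abs(y - exp(1.0)))             # error near machine precision
```

Raising the order is one integer parameter rather than a new Butcher tableau, which is the point: the derivatives come almost for free from the AD recurrences.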

FactChecker
Gold Member
Good point. ODEs are a big use case that I did not encounter. There may have been people where I worked who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.

m4r35n357
Good point. ODEs are a big use case that I did not encounter. There may have been people who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
Thanks for the feedback!

ODEs are what got me involved with this method in the first place, and they are the main reason I reverse-engineered the procedure for my own use. It turns out that the best way to verify the low level functions I needed was to wrap them in Series objects and produce function/derivative plots like the one in the OP. Once that was done, making it interactive was an obvious thing to tidy up. But neither of those is as important to me as the ODE solver!

I wonder if there is any sensible application of this to PDEs, but I don't have experience solving them. I haven't seen anything in the literature.

anorlunda
Staff Emeritus
That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.

There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
I'm sure there are degenerate/edge cases, but here is a plot of ##|x + 1|## together with its first twelve derivatives:

Yes, there really are twelve derivatives here! Piecewise functions are fine as long as ##f## is defined at the jump (with the derivatives there set to zero). In any case, this has nothing whatsoever to do with the order of integration, since the trouble begins with the first derivative; Euler's method and RK4 would suffer the same fate.

Really?

pbuk