# Automatic vs Symbolic differentiation

• Python
• m4r35n357
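In summary, AD evaluates derivatives numerically, quickly and to working precision even at high order, while symbolic differentiation produces a closed-form expression that shows the structure of the derivative but becomes slow and unwieldy as the order increases.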
m4r35n357
TL;DR Summary
Deathmatch
I thought I would give you guys some justification for why I keep pestering you with this odd numerical technique. I've mentioned its advantages over finite differences quite a few times, so now I am putting it up against symbolic differentiation.

Here is a plot of the function ##x^2 / {\ln(\cosh(x) + 1)}##, together with its first six derivatives, using automatic differentiation (1001 x data points in total). The computations use 236-bit floating point precision courtesy of gmpy2 (MPFR) arbitrary-precision arithmetic.

Here is an illustration of the CPU time involved in generating the data:
Code:
$ time -p ./models.py 0 -8 8 1001 7 1e-12 1e-12 > /tmp/data 2>/dev/null
real 0.30
user 0.29
sys 0.00

Here is an interactive session evaluating the same function at a single point, ##x = 2##, together with its first twelve derivatives:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *

In [2]: x = Series.get(13, 2).var

In [3]: print(~(x * x / (x.cosh + 1).ln))
+2.562938002e+00 +1.312276410e+00 -3.440924889e-01 +2.366685634e-01 +2.260914151e-01 -1.617592557e+00 +5.097273067e+00 -1.157769456e+01 +1.290979888e+01 +5.759642561e+01 -5.363560778e+02 +2.667773442e+03 -8.844486444e+03
Finally, for comparison, here is something to cut & paste into Wolfram Alpha (symbolic computation):
Code:
d^6/dx^6 x^2 / ln(cosh(x) + 1) where x = 2
I couldn't get it to do the twelfth derivative, and even for the sixth it will not attempt to evaluate the value (not unless I register anyway!).

Enjoy!
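
For readers without the `ad` module, here is a minimal sketch of the same single-point calculation in plain Python. It assumes ordinary double precision rather than the 236-bit MPFR arithmetic used above, and the helper names (`variable`, `t_mul`, `t_div`, `t_cosh`, `t_ln`, `derivatives`) are invented for this illustration, not taken from the `ad` module. Each helper propagates truncated Taylor coefficients; multiplying the k-th coefficient by k! recovers the k-th derivative.

```python
from math import cosh, sinh, log, factorial

def variable(x0, n):
    """Taylor coefficients of the independent variable x about x0."""
    return [float(x0), 1.0] + [0.0] * (n - 2)

def t_mul(a, b):
    """Cauchy product of two truncated series."""
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]

def t_div(a, b):
    """Quotient a / b, obtained by inverting the Cauchy product term by term."""
    q = []
    for k in range(len(a)):
        q.append((a[k] - sum(q[j] * b[k - j] for j in range(k))) / b[0])
    return q

def t_cosh(u):
    """cosh(u); the sinh series is built alongside because each needs the other."""
    s, c = [sinh(u[0])], [cosh(u[0])]
    for k in range(1, len(u)):
        s.append(sum(j * u[j] * c[k - j] for j in range(1, k + 1)) / k)
        c.append(sum(j * u[j] * s[k - j] for j in range(1, k + 1)) / k)
    return c

def t_ln(u):
    """ln(u), from the identity u' = u * (ln u)'."""
    l = [log(u[0])]
    for k in range(1, len(u)):
        l.append((u[k] - sum(j * l[j] * u[k - j] for j in range(1, k)) / k) / u[0])
    return l

def derivatives(coeffs):
    """Convert Taylor coefficients to derivatives: f^(k) = k! * c_k."""
    return [factorial(k) * c for k, c in enumerate(coeffs)]

n = 13                      # function value plus twelve derivatives
x = variable(2.0, n)
c = t_cosh(x)
c[0] += 1.0                 # cosh(x) + 1 only changes the zeroth coefficient
f = t_div(t_mul(x, x), t_ln(c))
values = derivatives(f)
print(" ".join(f"{d:+.9e}" for d in values))
```

The printed values can be compared against the IPython output above; in double precision the higher derivatives will carry fewer correct digits than the MPFR results.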

Is your point that numerical differentiation is faster than symbolic differentiation? This is probably true if you only want the answer at one point or a small number of points. But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.

Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).

phyzguy said:
But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
So does AD; all the derivatives are derived from the function itself, at any point where the function is defined.

Dr.D said:
Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).
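
In mathematical terms, a common formulation works with truncated Taylor coefficients instead of derivatives (this is a sketch of the textbook recurrences, not necessarily the exact ones used in the `ad` module). For every intermediate quantity ##a##, write ##a_k = a^{(k)}(x_0) / k!##. The independent variable has ##x_0## as its zeroth coefficient, 1 as its first, and zeros after that. Each operation then has a recurrence giving the coefficients of its result from the coefficients of its arguments:

```latex
\begin{aligned}
(a b)_k &= \sum_{j=0}^{k} a_j \, b_{k-j}, \\
(a / b)_k &= \frac{1}{b_0} \left( a_k - \sum_{j=0}^{k-1} (a / b)_j \, b_{k-j} \right), \\
(\exp u)_k &= \frac{1}{k} \sum_{j=1}^{k} j \, u_j \, (\exp u)_{k-j}, \qquad k \ge 1, \\
(\ln u)_k &= \frac{1}{u_0} \left( u_k - \frac{1}{k} \sum_{j=1}^{k-1} j \, (\ln u)_j \, u_{k-j} \right), \qquad k \ge 1.
\end{aligned}
```

The zeroth coefficients are just the ordinary function values, and the ##k##-th derivative is recovered as ##k!## times the ##k##-th coefficient. No difference quotients appear anywhere, so there is no truncation error from a step size.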

A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.

I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.

FactChecker said:
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself.
Not a major point to me, as in all practical cases where I've used symbolic differentiation (like calculating the Einstein tensor from the metric in GR), the symbolic representations are a complete mess, even if the package is any good at simplification. But yes, I suppose it is a benefit. Basically, AD can effortlessly evaluate differentials to orders that can easily choke any of the major computer algebra systems. Way beyond anyone's ability to see structure ;)
FactChecker said:
It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.
Yes, reverse mode AD is the numerical technique behind TensorFlow. I am talking about the benefits of forward mode AD. Of course it is a numerical technique, just not a very well known one (as I have learned from past responses to my posts).
FactChecker said:
I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
I am not putting it forward as a replacement for symbolic differentiation. It is an alternative to finite differences (in many circumstances) and even to RK4 (in most circumstances), but it essentially performs the same calculations as symbolic differentiation (though combinatorially much simpler) and is fundamentally of the same accuracy. Hence the comparison.

FactChecker said:
The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy.
Thought I would address this one separately. Here are the value and the first nineteen differentials of ##\exp (2.0)##:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *

In [2]: x = Series.get(20, 2).var

In [3]: print(~(x.exp))
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
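
The structure is also visible in the recurrence itself. As a minimal sketch in plain double-precision Python (the helper name `t_exp` is made up for this illustration and is not the `ad` module's API), the Taylor coefficients of ##\exp(x)## about ##x_0## collapse to ##e_k = e_{k-1} / k##, so multiplying by ##k!## returns the same number at every order:

```python
from math import exp, factorial

def t_exp(u):
    """Taylor coefficients of exp(u), from the identity (exp u)' = u' * exp u."""
    e = [exp(u[0])]
    for k in range(1, len(u)):
        e.append(sum(j * u[j] * e[k - j] for j in range(1, k + 1)) / k)
    return e

n = 20                                  # value plus nineteen derivatives
x = [2.0, 1.0] + [0.0] * (n - 2)        # the variable x about x0 = 2
derivs = [factorial(k) * c for k, c in enumerate(t_exp(x))]
print(" ".join(f"{d:+.9e}" for d in derivs))
```

For the plain variable only ##u_1 = 1## is nonzero beyond the constant term, which is why the sum reduces to a single term.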

m4r35n357 said:
Thought I would address this one separately. Here are the value and the first nineteen differentials of ##\exp (2.0)##:
Code:
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n+x^{n-1}## or ##x*\sin(x)##.

FactChecker said:
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n+x^{n-1}## or ##x*\sin(x)##.
Agreed, but as I mentioned above the "window" of intelligible output is finite.

There is a subset of use cases for a CAS, where the symbolic output is translated to a more "familiar" language for execution. These are the use cases where my comparison is valid. Not because of the extra time spent generating the derivatives (that is already paid for), but because of the time needed merely to evaluate the more complex expressions. Aside from this, it is not trivial to implement complicated expressions such as these without making errors (testing is essential!).

Unless I am much mistaken, the complexity of a CAS doing symbolic differentiation and evaluation is subject to Francesco Faà di Bruno's formula, whereas the Taylor Series Method is based on the Cauchy product. It certainly feels that way in Wolfram Alpha as the order of differentiation increases ;)
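
To make the comparison concrete, here are the two formulas as usually stated (a sketch only, not a claim about how any particular CAS is implemented). Faà di Bruno's formula for the ##n##-th derivative of a composition sums over every tuple of non-negative integers ##(m_1, \dots, m_n)## satisfying ##1 m_1 + 2 m_2 + \dots + n m_n = n##, while the ##k##-th coefficient of a Cauchy product is a single sum of ##k + 1## terms:

```latex
\frac{d^n}{dx^n} f(g(x))
  = \sum \frac{n!}{m_1! \, 1!^{m_1} \, m_2! \, 2!^{m_2} \cdots m_n! \, n!^{m_n}}
    \, f^{(m_1 + \cdots + m_n)}(g(x))
    \prod_{j=1}^{n} \left( g^{(j)}(x) \right)^{m_j}
\qquad \text{versus} \qquad
(a b)_k = \sum_{j=0}^{k} a_j \, b_{k-j}
```

The number of terms in the first sum is the number of integer partitions of ##n## (77 for ##n = 12##), which grows faster than any polynomial in ##n##. Computing all coefficients up to order ##n## with the Cauchy product costs on the order of ##n^2## multiplications.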

I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).

EDIT: I should add that I could never use very sophisticated numerical techniques because there were always messy complications (random components, needing to find global minimums rather than local, etc.) that prevented it.

FactChecker said:
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).
You bring up an interesting point, so make yourself comfortable ;)

I am assuming that you use or have used RK4 for solving ODEs, which is by definition a finite difference approximation to a fourth order Taylor Series solver (because finding higher order derivatives is supposedly "difficult" or "expensive").

In fact, as I have demonstrated here, finding higher order derivatives is not difficult or expensive at all. The Taylor Series Method (TSM), which is built on the iterative AD functions and operators, is trivial at fourth order (and much higher), and is immune to the compromises and inaccuracies of finite differences.

That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
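
As an illustration of how little machinery the TSM needs, here is a minimal sketch for one simple ODE. The test problem (the logistic equation ##\dot{x} = x - x^2##), the function names and the step parameters are all choices made for this example, not taken from the solver mentioned in this thread. The ODE itself supplies the recurrence for the next Taylor coefficient, with the ##x^2## term handled by a Cauchy product, and the step is a Horner evaluation of the resulting polynomial.

```python
from math import exp

def taylor_step(x0, h, order):
    """Advance x' = x - x^2 by one step of size h using a Taylor series of the given order."""
    c = [x0]
    for k in range(order):
        square_k = sum(c[j] * c[k - j] for j in range(k + 1))   # k-th coefficient of x^2
        c.append((c[k] - square_k) / (k + 1))                   # from x' = x - x^2
    result = 0.0
    for coeff in reversed(c):                                   # Horner evaluation at h
        result = result * h + coeff
    return result

def solve(x0, h, steps, order):
    """Apply taylor_step repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = taylor_step(x, h, order)
    return x

def exact(x0, t):
    """Closed-form solution of the logistic equation, used only as a check."""
    return x0 * exp(t) / (1.0 - x0 + x0 * exp(t))

x_end = solve(0.1, 0.1, 50, 10)         # integrate to t = 5 at tenth order
print(x_end, exact(0.1, 5.0))
```

Changing the order is just a change of one argument, which is the practical contrast with a fixed-order scheme such as RK4.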

Good point. ODEs are a big use case that I did not encounter. There may have been people where I worked who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.

FactChecker said:
Good point. ODEs are a big use case that I did not encounter. There may have been people who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
Thanks for the feedback!

ODEs are what got me involved with this method in the first place, and are the main reason I reverse-engineered the procedure for my own use. It turns out that the best way to verify the low level functions that I needed was to wrap them in Series objects and do those function/derivative plots like in the OP. Once this was done, the interactivity was almost an obvious thing to tidy up. But neither of those is as important to me as the ODE solver!

I wonder if there is any sensible application of this to PDEs, but I don't have experience solving them. I haven't seen anything in the literature.

m4r35n357 said:
That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.

anorlunda said:
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
I'm sure there are degenerate/edge cases, but here is ##|x + 1|##:

Yes, there really are 12 derivatives here! Piecewise functions are fine as long as f() is defined at the jump (and the derivatives there are set to zero; see the next sentence!). However, this has nothing whatsoever to do with the order of integration, as the problem begins with the first derivative, as I mentioned (Euler's method and RK4 would suffer the same fate).
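
A minimal sketch of how an absolute value might be handled on Taylor coefficients under that convention. The helpers `t_abs` and `shifted_variable` are hypothetical names for this illustration, and setting every derivative to zero at the kink is the convention described in this post, not a mathematical necessity.

```python
def t_abs(u):
    """Taylor coefficients of |u| about the expansion point (hypothetical helper)."""
    if u[0] > 0.0:
        return list(u)                  # |u| = u on this branch
    if u[0] < 0.0:
        return [-c for c in u]          # |u| = -u on this branch
    # At the kink: value 0 and, by the convention above, all derivatives set to zero.
    return [0.0] * len(u)

def shifted_variable(x0, n):
    """Taylor coefficients of x + 1 about x0."""
    return [x0 + 1.0, 1.0] + [0.0] * (n - 2)

for x0 in (-3.0, -1.0, 2.0):
    print(x0, t_abs(shifted_variable(x0, 4)))
```

Away from ##x = -1## the result is simply one linear branch or the other, so every derivative above the first is zero there.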

Really?

Thanks, that's clearer.


## 1. What is the difference between automatic and symbolic differentiation?

Automatic differentiation refers to a set of techniques used to numerically evaluate derivatives of mathematical functions. It follows the chain rule and is typically used in machine learning and optimization algorithms. Symbolic differentiation, on the other hand, involves manipulating mathematical expressions to obtain an exact symbolic formula for the derivative. It is often used in fields such as physics, engineering, and mathematics.

## 2. Which is more accurate, automatic or symbolic differentiation?

Both methods apply the same exact differentiation rules, so neither suffers the truncation error of finite differences. Symbolic differentiation produces an exact formula, but evaluating that formula in floating point still incurs rounding error. Automatic differentiation applies the rules directly to numerical values and is accurate to the working precision of the arithmetic. For most practical applications the two agree to within rounding error.
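
As a small check of how closely AD tracks a closed-form derivative, here is a minimal first-order sketch using dual numbers. The `Dual` class and `dsin` helper are written for this illustration and are not from any particular library. It differentiates ##x \sin(x)## at ##x = 1## and compares the result with the hand-derived formula ##\sin(x) + x \cos(x)##.

```python
from math import sin, cos

class Dual:
    """A value together with its derivative with respect to the input variable."""
    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule applied to the numerical values
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def dsin(u):
    """sin of a dual number, using the chain rule."""
    return Dual(sin(u.val), cos(u.val) * u.dot)

x = Dual(1.0, 1.0)                      # seed: dx/dx = 1
y = x * dsin(x)
closed_form = sin(1.0) + 1.0 * cos(1.0)
print(y.dot, closed_form)
```

The two numbers should agree to within floating point rounding, because both come from the same differentiation rules evaluated in the same arithmetic.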

## 3. Which method is faster, automatic or symbolic differentiation?

Automatic differentiation is typically faster than symbolic differentiation. This is because it only requires simple numerical evaluations and does not involve complex mathematical manipulations. Symbolic differentiation, on the other hand, can be computationally expensive for complex functions as it involves manipulating symbolic expressions.

## 4. Can automatic differentiation handle complex functions?

Yes, automatic differentiation can handle complex functions. It follows the chain rule and can handle functions with multiple inputs and outputs. However, for highly complex functions, the accuracy and speed of automatic differentiation may be affected.

## 5. In which applications is symbolic differentiation more commonly used?

Symbolic differentiation is more commonly used in fields such as physics, engineering, and mathematics where exact solutions are important. It is also used in computer algebra systems and software for symbolic computations. Automatic differentiation is more commonly used in machine learning and optimization algorithms, as well as in scientific computing and numerical analysis.
