Automatic vs. symbolic differentiation


Discussion Overview

The discussion centers on the comparison between automatic differentiation (AD) and symbolic differentiation, exploring their respective advantages and limitations in computational contexts. Participants examine the efficiency of numerical techniques versus the completeness of symbolic methods, particularly in relation to derivative calculations.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • Some participants argue that automatic differentiation is faster than symbolic differentiation when evaluating derivatives at specific points, especially for a limited number of evaluations.
  • Others contend that symbolic differentiation provides a complete function representation, containing information not available through numerical values.
  • A participant inquires about the algorithm behind automatic differentiation, seeking a mathematical description rather than a programming explanation.
  • Some participants suggest that while AD can derive all derivatives from the function itself, it may not reveal structural insights about the derivatives as effectively as symbolic differentiation.
  • There are claims that AD can evaluate higher-order derivatives more efficiently than traditional computer algebra systems, which may struggle with complex expressions.
  • One participant emphasizes that the structure of derivatives, such as the derivative of the exponential function, is more apparent in symbolic differentiation, while AD may only provide pointwise evaluations.
  • Another participant acknowledges that while AD is a numerical technique, it serves as a viable alternative to finite differences rather than a complete replacement for symbolic differentiation.
  • Some participants express that the complexity of symbolic differentiation can lead to cumbersome outputs, which may not always be practical in applications.
  • There is a discussion about the limitations of both methods, particularly regarding the intelligibility of outputs and the potential for errors in implementing complex expressions.

Areas of Agreement / Disagreement

Participants express differing views on the advantages of automatic versus symbolic differentiation, with no consensus reached on which method is superior. The discussion remains unresolved regarding the contexts in which each method is preferable.

Contextual Notes

Participants note that the effectiveness of each differentiation method may depend on specific use cases and the complexity of the functions involved. There are references to unresolved mathematical steps and the potential for errors in complex symbolic expressions.

m4r35n357
TL;DR: Deathmatch
I thought I would give you guys some justification for why I keep pestering you with this odd numerical technique. I've mentioned its advantages over finite differences quite a few times, so now I am putting it up against symbolic differentiation.

Here is a plot of the function ##x^2 / {\ln(\cosh(x) + 1)}##, together with its first six derivatives, computed using automatic differentiation (1001 x values in total). The computations use 236-bit floating point precision, courtesy of gmpy2 (MPFR) arbitrary-precision arithmetic.
[Plot: ##x^2 / {\ln(\cosh(x) + 1)}## and its first six derivatives]
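For anyone who wants to reproduce the precision setup, the working precision is a one-liner in gmpy2. This is just a minimal sketch evaluating the function at ##x = 2##, not the models.py script used for the plot:
Code:
import gmpy2

# set the working precision in bits, as used for the plot above
gmpy2.get_context().precision = 236

x = gmpy2.mpfr(2)
# the function value at x = 2; compare the first entry of the
# interactive session output below (~2.562938002)
print(x * x / gmpy2.log(gmpy2.cosh(x) + 1))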

Here is an illustration of the CPU time involved in generating the data:
Code:
$ time -p ./models.py 0 -8 8 1001 7 1e-12 1e-12 > /tmp/data 2>/dev/null
real 0.30
user 0.29
sys 0.00
Here is an interactive session evaluating the same function at a single point, ##x = 2##, together with its first twelve derivatives:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *                                                                                                    
ad module loaded

In [2]: x = Series.get(13, 2).var                                                                                            

In [3]: print(~(x * x / (x.cosh + 1).ln))                                                                                    
+2.562938002e+00 +1.312276410e+00 -3.440924889e-01 +2.366685634e-01 +2.260914151e-01 -1.617592557e+00 +5.097273067e+00 -1.157769456e+01 +1.290979888e+01 +5.759642561e+01 -5.363560778e+02 +2.667773442e+03 -8.844486444e+03
Finally, for comparison, here is something to cut & paste into Wolfram Alpha (symbolic computation):
Code:
d^6/dx^6 x^2 / ln(cosh(x) + 1) where x = 2
I couldn't get it to do the twelfth derivative, and even for the sixth it will not attempt to evaluate the value (not unless I register anyway!).

Enjoy!
 
Is your point that numerical differentiation is faster than symbolic differentiation? This is probably true if you only want the answer at one point or a small number of points. But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
 
Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).
 
phyzguy said:
But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
So does AD; all the derivatives are derived from the function itself, at any point where the function is defined.
 
Dr.D said:
Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).
Well, the Wikipedia page is not a bad place to start . . .
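In mathematical terms, forward mode AD of this kind represents each quantity by the vector of its truncated Taylor coefficients ##c_k = f^{(k)}(a) / k!## about the evaluation point ##a##, and makes every arithmetic operation propagate those vectors exactly; in particular, multiplication becomes the Cauchy product of coefficient sequences. Here is a minimal illustrative sketch (the Jet class and its names are invented for this post; it is not the ad module used above):
Code:
from math import factorial

class Jet:
    """Truncated Taylor series of a function f about a point a,
    stored as c[k] = f^(k)(a) / k!.  Illustrative only."""
    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, a, n):
        # the function x itself at x = a: coefficients [a, 1, 0, ..., 0]
        return cls([float(a), 1.0] + [0.0] * (n - 1))

    def __add__(self, other):
        return Jet(s + t for s, t in zip(self.c, other.c))

    def __mul__(self, other):
        # Cauchy product of the two coefficient sequences
        return Jet(sum(self.c[j] * other.c[k - j] for j in range(k + 1))
                   for k in range(len(self.c)))

    def derivatives(self):
        # recover f(a), f'(a), f''(a), ... by undoing the k! scaling
        return [factorial(k) * ck for k, ck in enumerate(self.c)]

x = Jet.variable(2.0, 4)           # x = 2, carrying four derivatives
print((x * x * x).derivatives())   # x^3 at 2: [8.0, 12.0, 12.0, 6.0, 0.0]
Elementary functions (exp, ln, cosh and friends) are handled by similar ##O(n^2)## coefficient recurrences rather than by symbolic rules.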
 
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.

I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
 
FactChecker said:
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself.
Not a major point to me: in all practical cases where I've used symbolic differentiation (like calculating the Einstein tensor from the metric in GR), the symbolic representations are a complete mess, even if the package is any good at simplification. But yes, I suppose it is a benefit. Basically, AD can effortlessly evaluate derivatives to orders that can easily choke any of the major computer algebra systems. Way beyond anyone's ability to see structure ;)
FactChecker said:
It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.
Yes, reverse mode AD is the numerical technique behind TensorFlow. I am talking here about the benefits of forward mode AD, which of course is also a numerical technique, just not a very well known one (as I have learned from past responses to my posts).
FactChecker said:
I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
I am not putting it forward as a replacement for symbolic differentiation; it is an alternative to finite differences (in many circumstances) and even to RK4 (in most circumstances). But it essentially performs the same calculations as symbolic differentiation, with much simpler combinatorics, and is fundamentally of the same accuracy. Hence the comparison.
 
FactChecker said:
The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy.
Thought I would address this one separately. Here are the value and the first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *                                                                                                           
ad module loaded

In [2]: x = Series.get(20, 2).var                                                                                                  

In [3]: print(~(x.exp))                                                                                                            
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
 
m4r35n357 said:
Thought I would address this one separately. Here are the value and the first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n + x^{n-1}## or ##x \sin(x)##.
 
FactChecker said:
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n + x^{n-1}## or ##x \sin(x)##.
Agreed, but as I mentioned above the "window" of intelligible output is finite.

There is a subset of use cases for a CAS where the symbolic output is translated to a more "familiar" language for execution. These are the use cases where my comparison is valid: not because of the extra time spent generating the derivatives (that is already paid for), but because of the time merely to evaluate the more complex expressions. Aside from this, it is not trivial to implement complicated expressions such as these without making errors (testing is essential!).

Unless I am much mistaken, the complexity of a CAS doing symbolic differentiation and evaluation is subject to Francesco Faà di Bruno's formula, whereas the Taylor Series Method is based on the Cauchy product. It certainly feels that way in Wolfram Alpha as the order of differentiation increases ;)
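To make that contrast concrete, here is the kind of ##O(n^2)## recurrence the TSM uses for an elementary function: for ##g = \exp(f)##, differentiating gives ##g' = f' g##, which in Taylor coefficients becomes ##g_k = \frac{1}{k} \sum_{j=1}^{k} j f_j g_{k-j}## - a Cauchy-product-style sum with none of the Faà di Bruno combinatorics. A sketch with invented names, not the ad module itself:
Code:
from math import exp, factorial

def exp_jet(f):
    # g = exp(f):  g' = f' g  =>  g[k] = (1/k) * sum_{j=1..k} j*f[j]*g[k-j]
    g = [exp(f[0])] + [0.0] * (len(f) - 1)
    for k in range(1, len(f)):
        g[k] = sum(j * f[j] * g[k - j] for j in range(1, k + 1)) / k
    return g

f = [2.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # the identity x at x = 2, order 5
print([factorial(k) * c for k, c in enumerate(exp_jet(f))])
# every entry is ~7.389056099 (i.e. e^2), matching the exp output above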
 
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).

EDIT: I should add that I could never use very sophisticated numerical techniques, because there were always messy complications (random components, needing to find global minima rather than local ones, etc.) that prevented it.
 
FactChecker said:
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).
You bring up an interesting point, so make yourself comfortable ;)

I am assuming that you use or have used RK4 for solving ODEs, which is in effect a finite difference approximation to a fourth order Taylor series solver (used because finding higher order derivatives is supposedly "difficult" or "expensive").

In fact, as I have demonstrated here, finding higher order derivatives is not difficult or expensive at all. The Taylor Series Method (TSM), which is built on the iterative AD functions and operators, is trivial at fourth order (and much higher), and is immune from the compromises and inaccuracies of finite differences.

That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
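To illustrate, here is roughly what a TSM step looks like for a toy ODE (a sketch of the idea with invented names, not my solver): the ODE itself generates the Taylor coefficients of the solution recursively, and a step is just a Horner evaluation of the truncated series:
Code:
def tsm_step(y0, h, order):
    # Taylor coefficients of the solution of y' = -y^2 at the current
    # point, generated by the ODE:  (k+1) y[k+1] = -(y*y)[k], where
    # (y*y)[k] is the Cauchy product coefficient
    y = [y0]
    for k in range(order):
        yy = sum(y[j] * y[k - j] for j in range(k + 1))
        y.append(-yy / (k + 1))
    # evaluate the truncated series at t + h by Horner's rule
    s = 0.0
    for c in reversed(y):
        s = s * h + c
    return s

# y' = -y^2 with y(0) = 1 has the exact solution y(t) = 1 / (1 + t)
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y, t = tsm_step(y, h, order=8), t + h
print(y, 1.0 / (1.0 + t))   # both ~0.5
Raising the order costs one more term in each Cauchy product, not a cascade of extra function evaluations as with higher order Runge-Kutta schemes.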
 
Good point. ODEs are a big use case that I did not encounter. There may have been people where I worked who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
 
FactChecker said:
Good point. ODEs are a big use case that I did not encounter. There may have been people who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
Thanks for the feedback!

ODEs are what got me involved with this method in the first place, and they are the main reason I reverse-engineered the procedure for my own use. It turns out that the best way to verify the low level functions I needed was to wrap them in Series objects and produce function/derivative plots like the one in the OP. Once that was done, the interactive usage was an obvious thing to tidy up. But neither of those is as important to me as the ODE solver!

I wonder if there is any sensible application of this to PDEs, but I don't have experience solving them. I haven't seen anything in the literature.
 
m4r35n357 said:
That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
 
anorlunda said:
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
I'm sure there are degenerate/edge cases, but here is ##|x + 1|##:
[Plot: ##|x + 1|## and its first twelve derivatives]

Yes, there really are twelve derivatives here! Piecewise functions are fine as long as f() is defined at the jump (with the derivatives there set to zero - see the sketch below). However, this has nothing whatsoever to do with the order of integration, since the trouble begins with the first derivative: Euler's method and RK4 would suffer the same fate.
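In the jet representation sketched earlier in the thread, that convention might look like this (again an invented illustration, not the ad module):
Code:
def abs_jet(f):
    # |f| for a list of Taylor coefficients: flip every coefficient's
    # sign when f is negative; at the jump itself (f[0] == 0) the value
    # is defined and the derivatives are set to zero, as described above
    if f[0] > 0.0:
        return list(f)
    if f[0] < 0.0:
        return [-c for c in f]
    return [0.0] * len(f)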
 
Really?
[Attached plot]
 
Thanks, that's clearer.
 
