Automatic vs. symbolic differentiation


Discussion Overview

The discussion centers on the comparison between automatic differentiation (AD) and symbolic differentiation, exploring their respective advantages and limitations in computational contexts. Participants examine the efficiency of numerical techniques versus the completeness of symbolic methods, particularly in relation to derivative calculations.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • Some participants argue that automatic differentiation is faster than symbolic differentiation when evaluating derivatives at specific points, especially for a limited number of evaluations.
  • Others contend that symbolic differentiation provides a complete function representation, containing information not available through numerical values.
  • A participant inquires about the algorithm behind automatic differentiation, seeking a mathematical description rather than a programming explanation.
  • Some participants suggest that while AD can derive all derivatives from the function itself, it may not reveal structural insights about the derivatives as effectively as symbolic differentiation.
  • There are claims that AD can evaluate higher-order derivatives more efficiently than traditional computer algebra systems, which may struggle with complex expressions.
  • One participant emphasizes that the structure of derivatives, such as the derivative of the exponential function, is more apparent in symbolic differentiation, while AD may only provide pointwise evaluations.
  • Another participant acknowledges that while AD is a numerical technique, it serves as a viable alternative to finite differences rather than a complete replacement for symbolic differentiation.
  • Some participants express that the complexity of symbolic differentiation can lead to cumbersome outputs, which may not always be practical in applications.
  • There is a discussion about the limitations of both methods, particularly regarding the intelligibility of outputs and the potential for errors in implementing complex expressions.

Areas of Agreement / Disagreement

Participants express differing views on the advantages of automatic versus symbolic differentiation, with no consensus reached on which method is superior. The discussion remains unresolved regarding the contexts in which each method is preferable.

Contextual Notes

Participants note that the effectiveness of each differentiation method may depend on specific use cases and the complexity of the functions involved. There are references to unresolved mathematical steps and the potential for errors in complex symbolic expressions.

m4r35n357
TL;DR: Deathmatch
I thought I would give you guys some justification for why I keep pestering you with this odd numerical technique. I've mentioned its advantages over finite differences quite a few times, so now I am putting it up against symbolic differentiation.

Here is a plot of the function ##x^2 / {\ln(\cosh(x) + 1)}##, together with its first six derivatives, computed using automatic differentiation (1001 x values in total). The computations use 236-bit floating point precision, courtesy of gmpy2 (MPFR) arbitrary-precision arithmetic.
[Plot: ##x^2 / {\ln(\cosh(x) + 1)}## and its first six derivatives]
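For anyone who wants to reproduce the precision setup, the working precision is a one-liner in gmpy2. This is just a minimal sketch evaluating the function at ##x = 2##, not the models.py script used for the plot:
Code:
import gmpy2

# set the working precision in bits, as used for the plot above
gmpy2.get_context().precision = 236

x = gmpy2.mpfr(2)
# the function value at x = 2; compare the first entry of the
# interactive session output below (~2.562938002)
print(x * x / gmpy2.log(gmpy2.cosh(x) + 1))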

Here is an illustration of the CPU time involved in generating the data:
Code:
$ time -p ./models.py 0 -8 8 1001 7 1e-12 1e-12 > /tmp/data 2>/dev/null
real 0.30
user 0.29
sys 0.00
Here is an interactive session evaluating the same function at a single point, ##x = 2##, together with its first twelve derivatives:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *                                                                                                    
ad module loaded

In [2]: x = Series.get(13, 2).var                                                                                            

In [3]: print(~(x * x / (x.cosh + 1).ln))                                                                                    
+2.562938002e+00 +1.312276410e+00 -3.440924889e-01 +2.366685634e-01 +2.260914151e-01 -1.617592557e+00 +5.097273067e+00 -1.157769456e+01 +1.290979888e+01 +5.759642561e+01 -5.363560778e+02 +2.667773442e+03 -8.844486444e+03
Finally, for comparison, here is something to cut & paste into Wolfram Alpha (symbolic computation):
Code:
d^6/dx^6 x^2 / ln(cosh(x) + 1) where x = 2
I couldn't get it to do the twelfth derivative, and even for the sixth it will not attempt to evaluate the value (not unless I register anyway!).

Enjoy!
 
Is your point that numerical differentiation is faster than symbolic differentiation? This is probably true if you only want the answer at one point or a small number of points. But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
 
Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).
 
phyzguy said:
But the symbolic answer gives you the full function at all possible points. It contains information that is not contained in the numerical values.
So does AD; all the derivatives are derived from the function itself, at any point where the function is defined.
 
Dr.D said:
Evidently I am arriving late to the party. What is the algorithm that you use for what you call "automatic" differentiation? (I know nothing about Python, so please describe the algorithm in mathematical terms; thanks).
Well, the Wikipedia page is not a bad place to start . . .
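In mathematical terms, forward mode AD of this kind represents each quantity by the vector of its truncated Taylor coefficients ##c_k = f^{(k)}(a) / k!## about the evaluation point ##a##, and makes every arithmetic operation propagate those vectors exactly; in particular, multiplication becomes the Cauchy product of coefficient sequences. Here is a minimal illustrative sketch (the Jet class and its names are invented for this post; it is not the ad module used above):
Code:
from math import factorial

class Jet:
    """Truncated Taylor series of a function f about a point a,
    stored as c[k] = f^(k)(a) / k!.  Illustrative only."""
    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, a, n):
        # the function x itself at x = a: coefficients [a, 1, 0, ..., 0]
        return cls([float(a), 1.0] + [0.0] * (n - 1))

    def __add__(self, other):
        return Jet(s + t for s, t in zip(self.c, other.c))

    def __mul__(self, other):
        # Cauchy product of the two coefficient sequences
        return Jet(sum(self.c[j] * other.c[k - j] for j in range(k + 1))
                   for k in range(len(self.c)))

    def derivatives(self):
        # recover f(a), f'(a), f''(a), ... by undoing the k! scaling
        return [factorial(k) * ck for k, ck in enumerate(self.c)]

x = Jet.variable(2.0, 4)           # x = 2, carrying four derivatives
print((x * x * x).derivatives())   # x^3 at 2: [8.0, 12.0, 12.0, 6.0, 0.0]
Elementary functions (exp, ln, cosh and friends) are handled by similar ##O(n^2)## coefficient recurrences rather than by symbolic rules.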
 
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.

I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
 
FactChecker said:
A lot of the benefit of symbolic differentiation is in seeing the structure of the derivative. The simplest example I can think of is that the derivative of the exponential is itself.
Not a major point to me: in all practical cases where I've used symbolic differentiation (like calculating the Einstein tensor from the metric in GR), the symbolic representations are a complete mess, even if the package is any good at simplification. But yes, I suppose it is a benefit. Basically, AD can effortlessly evaluate derivatives to orders that can easily choke any of the major computer algebra systems. Way beyond anyone's ability to see structure ;)
FactChecker said:
It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy. AD is still a numerical technique.
Yes, reverse mode AD is the numerical technique behind TensorFlow. I am talking here about the benefits of forward mode AD, which of course is also a numerical technique, just not a very well known one (as I have learned from past responses to my posts).
FactChecker said:
I think that AD would be of more interest as a different numerical approach than as a substitute for symbolic differentiation.
I am not putting it forward as a replacement for symbolic differentiation; it is an alternative to finite differences (in many circumstances) and even to RK4 (in most circumstances). But it essentially performs the same calculations as symbolic differentiation, with much simpler combinatorics, and is fundamentally of the same accuracy. Hence the comparison.
 
FactChecker said:
The simplest example I can think of is that the derivative of the exponential is itself. It doesn't look to me like AD would ever tell you that except point-by-point within a certain accuracy.
Thought I would address this one separately. Here are the value and the first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
$ ipython3
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ad import *                                                                                                           
ad module loaded

In [2]: x = Series.get(20, 2).var                                                                                                  

In [3]: print(~(x.exp))                                                                                                            
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
 
m4r35n357 said:
Thought I would address this one separately. Here are the value and the first nineteen derivatives of ##\exp(x)## at ##x = 2.0##:
Code:
+7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00 +7.389056099e+00
I think I can see the structure in that ;)
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n + x^{n-1}## or ##x \sin(x)##.
 
FactChecker said:
Yes. Because it is trivial. It might be more difficult to recognize even slightly less trivial examples like ##x^n + x^{n-1}## or ##x \sin(x)##.
Agreed, but as I mentioned above the "window" of intelligible output is finite.

There is a subset of use cases for a CAS where the symbolic output is translated to a more "familiar" language for execution. These are the use cases where my comparison is valid: not because of the extra time spent generating the derivatives (that is already paid for), but because of the time merely to evaluate the more complex expressions. Aside from this, it is not trivial to implement complicated expressions such as these without making errors (testing is essential!).

Unless I am much mistaken, the complexity of a CAS doing symbolic differentiation and evaluation is subject to Francesco Faà di Bruno's formula, whereas the Taylor Series Method is based on the Cauchy product. It certainly feels that way in Wolfram Alpha as the order of differentiation increases ;)
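To make that contrast concrete, here is the kind of ##O(n^2)## recurrence the TSM uses for an elementary function: for ##g = \exp(f)##, differentiating gives ##g' = f' g##, which in Taylor coefficients becomes ##g_k = \frac{1}{k} \sum_{j=1}^{k} j f_j g_{k-j}## - a Cauchy-product-style sum with none of the Faà di Bruno combinatorics. A sketch with invented names, not the ad module itself:
Code:
from math import exp, factorial

def exp_jet(f):
    # g = exp(f):  g' = f' g  =>  g[k] = (1/k) * sum_{j=1..k} j*f[j]*g[k-j]
    g = [exp(f[0])] + [0.0] * (len(f) - 1)
    for k in range(1, len(f)):
        g[k] = sum(j * f[j] * g[k - j] for j in range(1, k + 1)) / k
    return g

f = [2.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # the identity x at x = 2, order 5
print([factorial(k) * c for k, c in enumerate(exp_jet(f))])
# every entry is ~7.389056099 (i.e. e^2), matching the exp output above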
 
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).

EDIT: I should add that I could never use very sophisticated numerical techniques, because there were always messy complications (random components, needing to find global minima rather than local ones, etc.) that prevented it.
 
FactChecker said:
I have never personally needed a higher order of derivative than third. In everything I have been involved in, I needed velocity and acceleration often, but I only needed jerk once or twice, and never (that I can remember) needed higher derivatives (snap, crackle, pop).
You bring up an interesting point, so make yourself comfortable ;)

I am assuming that you use or have used RK4 for solving ODEs, which is in effect a finite difference approximation to a fourth order Taylor series solver (used because finding higher order derivatives is supposedly "difficult" or "expensive").

In fact, as I have demonstrated here, finding higher order derivatives is not difficult or expensive at all. The Taylor Series Method (TSM), which is built on the iterative AD functions and operators, is trivial at fourth order (and much higher), and is immune from the compromises and inaccuracies of finite differences.

That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
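To illustrate, here is roughly what a TSM step looks like for a toy ODE (a sketch of the idea with invented names, not my solver): the ODE itself generates the Taylor coefficients of the solution recursively, and a step is just a Horner evaluation of the truncated series:
Code:
def tsm_step(y0, h, order):
    # Taylor coefficients of the solution of y' = -y^2 at the current
    # point, generated by the ODE:  (k+1) y[k+1] = -(y*y)[k], where
    # (y*y)[k] is the Cauchy product coefficient
    y = [y0]
    for k in range(order):
        yy = sum(y[j] * y[k - j] for j in range(k + 1))
        y.append(-yy / (k + 1))
    # evaluate the truncated series at t + h by Horner's rule
    s = 0.0
    for c in reversed(y):
        s = s * h + c
    return s

# y' = -y^2 with y(0) = 1 has the exact solution y(t) = 1 / (1 + t)
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y, t = tsm_step(y, h, order=8), t + h
print(y, 1.0 / (1.0 + t))   # both ~0.5
Raising the order costs one more term in each Cauchy product, not a cascade of extra function evaluations as with higher order Runge-Kutta schemes.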
 
Good point. ODEs are a big use case that I did not encounter. There may have been people where I worked who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
 
FactChecker said:
Good point. ODEs are a big use case that I did not encounter. There may have been people who did a lot of it that I was not aware of. For instance, I don't know what is involved in the aerodynamic CFD calculations.
Thanks for the feedback!

ODEs are what got me involved with this method in the first place, and they are the main reason I reverse-engineered the procedure for my own use. It turns out that the best way to verify the low level functions I needed was to wrap them in Series objects and produce function/derivative plots like the one in the OP. Once that was done, the interactive usage was an obvious thing to tidy up. But neither of those is as important to me as the ODE solver!

I wonder if there is any sensible application of this to PDEs, but I don't have experience solving them. I haven't seen anything in the literature.
 
m4r35n357 said:
That is a pretty big use case. Essentially, I would contend that the TSM is superior to RK4 except for functions not covered by a given AD arithmetic, or for tabular functions.
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
 
anorlunda said:
There are many problems with strong nonlinearities. Just think of a diode for example. All higher order integrations are disadvantaged in cases where the equations and/or the coefficients change dramatically from one time step to the next.
I'm sure there are degenerate/edge cases, but here is ##|x + 1|##:
[Plot: ##|x + 1|## and its first twelve derivatives]

Yes, there really are twelve derivatives here! Piecewise functions are fine as long as f() is defined at the jump (with the derivatives there set to zero - see the sketch below). However, this has nothing whatsoever to do with the order of integration, since the trouble begins with the first derivative: Euler's method and RK4 would suffer the same fate.
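In the jet representation sketched earlier in the thread, that convention might look like this (again an invented illustration, not the ad module):
Code:
def abs_jet(f):
    # |f| for a list of Taylor coefficients: flip every coefficient's
    # sign when f is negative; at the jump itself (f[0] == 0) the value
    # is defined and the derivatives are set to zero, as described above
    if f[0] > 0.0:
        return list(f)
    if f[0] < 0.0:
        return [-c for c in f]
    return [0.0] * len(f)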
 
Really?
[Attached plot]
 
Thanks, that's clearer.
 
