What is the Connection Between the Chain Rule and Differentials?

MathStudent
Hi,
I've seen a couple of proofs for the chain rule, and I know this probably sounds stupid, but I'm wondering why it can't be proved as follows:

given the real valued functions y=f(u), u=g(x)
since dy, du, dx, are all real valued functions as well
can't you just state:
\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}
by the properties of real numbers?

can someone explain why this isn't an acceptable proof?

Also, since I'm on the subject of differentials, does anyone know of any good books on the theory of differentials? I've spent a lot of time thinking about this concept, and it seems to have a different meaning in different applications. I've heard that there are several theories that explain what a differential is and say more about its uses (beyond the scope of a Calc 1-3 book). Any info would be greatly appreciated.

Thanks in advance!
 
MathStudent said:
by the properties of real numbers?

can someone explain why this isn't an acceptable proof?

And what property would that be? Remember that, for instance, dy is not a real number, and dy/dx is not a ratio of two real numbers.

To make an acceptable proof, you have to apply your idea to the correct definitions of the terms involved.
 
Thanks for your reply, Hurkyl, but I don't understand why dy is not a real number. From what I know,

dx = \Delta x is an increment of x: the difference of two real numbers, which is itself a real number,

and dy = f'(x)dx, where f'(x) evaluated at some x is a real number.

So it seems to me that both dx and dy take on real values and can thus be treated as real numbers, so dy/dx could be treated as the ratio of two real numbers.

Please pardon my ignorance here; I realize I must be missing something important.
 
That is certainly the inspiration for differentials, but it's not that easy. And while the notation is such that you can usually manipulate them as if they were real numbers, that's not always the case.

Matt grime likes to give this identity involving three dependent variables:

\frac{dx}{dy} \frac{dy}{dz} \frac{dz}{dx} = -1
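For what it's worth, this identity is usually written with partial derivatives, for three variables tied together by a single constraint. A minimal worked case, assuming the constraint x + y + z = 0 (chosen here only for illustration) and solving for each variable in terms of the other two:

```latex
\[
x = -y - z,\qquad y = -x - z,\qquad z = -x - y
\;\Longrightarrow\;
\frac{\partial x}{\partial y}\,
\frac{\partial y}{\partial z}\,
\frac{\partial z}{\partial x}
= (-1)(-1)(-1) = -1 .
\]
```

Treating the symbols as ordinary fractions and "cancelling" would predict +1, so the notation cannot always be manipulated like real numbers.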

The actual definition of the derivative is:

\frac{dy}{dx} := \lim_{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x}

So it's not simply the ratio of two real numbers, but the limit of such ratios. For the proof to be valid, you have to factor in the limit. For the case of the chain rule, it just means that you are cancelling real numbers inside the limit.
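The cancellation inside the limit can be sketched as follows. This assumes y = f(u) and u = g(x) as in the first post, writes \Delta u = g(x + \Delta x) - g(x) and \Delta y = f(u + \Delta u) - f(u), and assumes \Delta u \neq 0 for all sufficiently small nonzero \Delta x (the case where \Delta u vanishes needs a separate argument):

```latex
\[
\frac{dy}{dx}
= \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}
= \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta u}\cdot\frac{\Delta u}{\Delta x}
= \left(\lim_{\Delta u \to 0} \frac{\Delta y}{\Delta u}\right)
  \left(\lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}\right)
= \frac{dy}{du}\,\frac{du}{dx} .
\]
```

The second equality is ordinary cancellation of the real number \Delta u. The third uses the product rule for limits, together with the fact that \Delta u \to 0 as \Delta x \to 0 because g is differentiable and hence continuous.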
 
Hurkyl said:
For the case of the chain rule, it just means that you are cancelling real numbers inside the limit.

Just to clarify, you are stating that we must cancel real numbers inside the limit, before evaluating the limit, in order for a proof of the chain rule to be valid?

Also, I apologize: I didn't realize there was already a thread on this subject. I found it helpful, and it may help anyone else with similar questions.
https://www.physicsforums.com/showthread.php?t=57419

It seems that there are subjects that go deeper into the theory of differentials and infinitesimals than what can be found in a standard calculus book.
Are there courses that go deeper into this theory?

Thanks by the way for all your help!
(I'm very impressed with this site!)
 
Look closely at your calculus book! The one I am looking at (Calculus by Salas, Hille and Etgen, ninth edition) says "If Δx is small then df is approximately f'Δx". The statement "dx= Δx" is not true: it is an approximation.

\frac{dy}{dx} is approximately equal to \frac{\Delta y}{\Delta x}.
 
MathStudent said:
you are stating that we must cancel real numbers inside the limit before evaluating the limit in order for a proof of the chain rule to be valid?

I don't know all possible proofs of the chain rule -- I was just referring to the one I think is most straightforward, and highlighting the key difference between it and your invalid argument.


Your calc 1 book says exactly what df/dx means; you don't need to appeal to anything else.


Differentials are something else (but similar), but you wouldn't use them until you start doing differential geometry, or the like.


Infinitesimals are a different subject entirely. In standard analysis, the only infinitesimal is 0, so it's not a particularly useful concept.
 
Actually you can prove the chain rule by just asserting
\frac{dy}{dt}= \frac{dy}{dx}\frac{dx}{dt}
provided you have defined dy, dx, and dt as "infinitesimals". In order to define infinitesimals themselves, you have to go to "non-standard" analysis, which requires sophisticated notions from logic (specifically, the "compactness theorem": if every finite subset of a set of axioms has a model, then the entire set of axioms has a model).
 
Almost, but not quite. In nonstandard analysis, dy/dx is defined to be equal to the standard part of Δy/Δx, provided that this exists and is the same for all choices of the nonzero infinitesimal Δx.

Taking the standard part of a number means to round to the nearest standard (i.e. real) number.

So, for the proof to be accurate, you need a theorem about how multiplication interacts with the standard part operation -- that std (xy) = (std x)(std y), given the appropriate hypothesis.
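A sketch of how that theorem would enter the argument, assuming y = f(u) and u = g(x), a nonzero infinitesimal \Delta x, a nonzero \Delta u = g(x + \Delta x) - g(x), and both quotients being finite hyperreals (which is the hypothesis under which the standard part is multiplicative):

```latex
\[
\frac{dy}{dx}
= \operatorname{std}\!\left(\frac{\Delta y}{\Delta x}\right)
= \operatorname{std}\!\left(\frac{\Delta y}{\Delta u}\cdot\frac{\Delta u}{\Delta x}\right)
= \operatorname{std}\!\left(\frac{\Delta y}{\Delta u}\right)
  \operatorname{std}\!\left(\frac{\Delta u}{\Delta x}\right)
= \frac{dy}{du}\,\frac{du}{dx} .
\]
```

The last equality also needs \Delta u to be infinitesimal, which follows from the continuity of g, so that the standard part of \Delta y / \Delta u really is dy/du.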
 
HallsofIvy said:
Look closely at your calculus book! The one I am looking at (Calculus by Salas, Hille and Etgen, ninth edition) says "If Δx is small then df is approximately f'Δx". The statement "dx= Δx" is not true: it is an approximation.

\frac{dy}{dx} is approximately equal to \frac{\Delta y}{\Delta x}.
Hmm... that's interesting.
I have never looked in that book, and that is something I have never heard before. Everywhere I've looked gives a slightly different definition.
They let
dx = \Delta x
and then define
dy = f'(x)dx

so if \Delta x is small then
\Delta y is approximately dy
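A small numerical check of that last statement, as a sketch only: the function f(x) = x^3, the point x = 2, and the helper names below are all chosen here for illustration and do not come from any of the books mentioned.

```python
def f(x):
    # Illustrative function; any differentiable f would do.
    return x ** 3


def f_prime(x):
    # Exact derivative of the illustrative f.
    return 3 * x ** 2


def increment_and_differential(x, delta_x):
    # delta_y is the actual change in f; dy is the differential f'(x) * dx
    # under the convention dx = delta_x.
    delta_y = f(x + delta_x) - f(x)
    dy = f_prime(x) * delta_x
    return delta_y, dy


for delta_x in (0.1, 0.01, 0.001):
    delta_y, dy = increment_and_differential(2.0, delta_x)
    # The gap between delta_y and dy shrinks faster than delta_x itself.
    print(delta_x, delta_y, dy, abs(delta_y - dy))
```

For this f the gap is exactly 3x(\Delta x)^2 + (\Delta x)^3, so it vanishes faster than \Delta x, which is the sense in which \Delta y is approximately dy.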
 
In general a proof first requires a definition. So it is clearly true that if you define dx to be deltax, and define dy to be f'(x) deltax, then obviously f'(x) = dy/dx.

the modern differential geometry definition of df, for any differentiable function f, is that it is a function on tangent vectors to the real line. i.e. given a point p on the real line, and a tangent vector v at that point, then df(v) = the derivative of f in the direction v. now the standard tangent vector is the unit vector e in the positive x direction, and the derivative of f in that direction is the usual derivative f'(p) = dfp(e).

If v is any tangent vector one can always write it as a scalar multiple of the standard unit vector, v = ce, and then one has dfp(v) = cdfp(e) = cf'(p). So dfp is a linear function on the tangent space at p.

now x is a function on the x axis, namely the identity function, and as such it has a differential dx, whose value at any point p and any vector v, where v = ce, is simply dx(v) = dx(ce) = cdx(e) = c.1 = c. Since a tangent vector v at p is merely the vector from p to p+v, we also call v = delta x. thus in this sense, dx(v) does equal deltax, i.e. it equals the difference v between x and x+v.

now if v = ce, since dfp(v) equals cdfp(e) = cf'(p), and dxp(v) = cdxp(e) = c, it follows that indeed dfp is a function which, on every tangent vector at p, equals exactly f'(p) times what dxp equals. thus the quotient of the two linear functions, dfp and dxp, is a constant function with value f'(p).

In this sense dfp/dxp = f'(p) as a quotient of linear functions, for all p, and hence df/dx = f' is true as a quotient.

the definition of dx as deltax, while well meaning, is misleading, since it should say that for all p, the function dxp on tangent vectors v at p equals the function deltax,p. namely, both of them, acting on the tangent vector from p to p+h, yield the number h.


in geometric terms, df is the family of linear functions whose family of graphs is simply the family of tangent lines to the graph of f. thus dx is the family of tangent lines to the graph of y=x, namely the family of lines of slope 1, one copy for each point p on the x axis. thus dividing dfp by dxp, for a given p, means dividing these two linear functions, which amounts to dividing their slopes. this gives f'(p)/1 = f'(p). i.e. the function taking x to f'(p)x divided by the function taking x to x, can be said to equal the constant function taking x to f'(p), i.e. the number f'(p).
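To tie this back to the original question, here is how the chain rule reads in this language. It is a restatement rather than a proof, and it assumes y = f(u), u = g(x) as in the first post, with the number dg_p(v) identified with the tangent vector dg_p(v)e at the point g(p):

```latex
\[
d(f \circ g)_p(ce)
= df_{g(p)}\bigl(dg_p(ce)\bigr)
= df_{g(p)}\bigl(c\,g'(p)\,e\bigr)
= c\,f'(g(p))\,g'(p),
\qquad
dx_p(ce) = c .
\]
```

Dividing the first linear function by the second gives (f \circ g)'(p) = f'(g(p)) g'(p), so in this setting dy/dx = (dy/du)(du/dx) is a statement about composing linear functions and not about cancelling numbers.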
 