What is the Connection Between the Chain Rule and Differentials?

MathStudent · Jan 14, 2005

Hi,
I've seen a couple of proofs for the chain rule, and I know this probably sounds stupid, but I'm wondering why it can't be proved as follows:

given the real valued functions y=f(u), u=g(x)
since dy, du, dx, are all real valued functions as well
can't you just state:
\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}
by the properties of real numbers?

can someone explain why this isn't an acceptable proof?

Also, since I'm on the subject of differentials, does anyone know of any good books on the theory of differentials? I've spent a lot of time thinking about this concept, and it seems to have a different meaning in different applications. I've heard that there are several theories that explain what a differential is and say more about its uses (beyond the scope of a Calc 1-3 book). Any info would be greatly appreciated.

Thanks in advance!

Hurkyl · Jan 14, 2005

MathStudent said:

by the properties of real numbers?

can someone explain why this isn't an acceptable proof?

And what property would that be? Remember that, for instance, dy is not a real number, and dy/dx is not a ratio of two real numbers.

To make an acceptable proof, you have to apply your idea to the correct definitions of the terms involved.

MathStudent · Jan 14, 2005

Thanks for your reply, Hurkyl, but I don't understand why dy is not a real number. From what I know,

dx = \Delta x is an increment of x, which is the difference of two real numbers and so is itself a real number,

and dy = f'(x)dx, where f'(x) evaluated at some x is a real number,

so it seems to me that both dx and dy take on real values and thus can be treated as real numbers, and so dy/dx could be treated as the ratio of two real numbers.

Please pardon my ignorance here; I realize I must be missing something important.

Hurkyl · Jan 14, 2005

That is certainly the inspiration for differentials, but it's not that easy. And while the notation is such that you can usually manipulate them as if they were real numbers, that's not always the case.

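Matt grime likes to give this identity involving three variables related by a single equation, where each partial derivative is taken with the remaining variable held fixed:

\frac{\partial x}{\partial y} \frac{\partial y}{\partial z} \frac{\partial z}{\partial x} = -1

Cancelling the symbols as if they were real numbers would give 1, not -1.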
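
As a quick check of the identity, here is a worked example. The relation xyz = 1 is an assumption chosen only for illustration (it is not from the thread), and the derivatives are read as partial derivatives with the remaining variable held fixed:

```latex
% Assumed example relation (not from the thread): x y z = 1, with x, y, z all nonzero.
\begin{align*}
x = \frac{1}{yz} &\implies \frac{\partial x}{\partial y} = -\frac{1}{y^2 z} = -\frac{x}{y}, \\
y = \frac{1}{xz} &\implies \frac{\partial y}{\partial z} = -\frac{1}{x z^2} = -\frac{y}{z}, \\
z = \frac{1}{xy} &\implies \frac{\partial z}{\partial x} = -\frac{1}{x^2 y} = -\frac{z}{x}, \\
\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial x}
&= \left(-\frac{x}{y}\right)\left(-\frac{y}{z}\right)\left(-\frac{z}{x}\right) = -1.
\end{align*}
```

Each factor picks up its own minus sign, which is exactly what symbol-by-symbol cancellation cannot see.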

The actual definition of the derivative is:

\frac{dy}{dx} := \lim_{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x}

So it's not simply the ratio of two real numbers, but the limit of such ratios. For the proof to be valid, you have to factor in the limit. For the case of the chain rule, it just means that you are cancelling real numbers inside the limit.
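
To make "cancelling inside the limit" concrete, here is a minimal numerical sketch. The functions f and g and the point x0 are hypothetical choices for illustration only. For each finite increment, the three difference quotients are ordinary real numbers, so the cancellation is plain arithmetic; the derivative appears only as the increment shrinks.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def g(x):
    return x * x + 1.0      # u = g(x)

def f(u):
    return math.sin(u)      # y = f(u)

def quotients(x, dx):
    """Return (Δy/Δx, Δy/Δu, Δu/Δx) for the increment dx at the point x."""
    du = g(x + dx) - g(x)
    dy = f(g(x + dx)) - f(g(x))
    return dy / dx, dy / du, du / dx

x0 = 1.0
exact = math.cos(g(x0)) * 2.0 * x0   # f'(g(x0)) * g'(x0)

for dx in (1e-1, 1e-3, 1e-5):
    dy_dx, dy_du, du_dx = quotients(x0, dx)
    # Before any limit is taken, the cancellation is arithmetic on real numbers.
    assert math.isclose(dy_dx, dy_du * du_dx, rel_tol=1e-9)
    # Print the increment, the quotient, and its distance from the exact derivative.
    print(dx, dy_dx, abs(dy_dx - exact))
```

Note that this only works while Δu is nonzero. A rigorous proof has to deal separately with the case where Δu = 0 for arbitrarily small Δx, which is one reason the textbook proofs look more involved than the one-line cancellation.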

MathStudent · Jan 15, 2005

Hurkyl said:

For the case of the chain rule, it just means that you are cancelling real numbers inside the limit.

Just to clarify, you are stating that we must cancel real numbers inside the limit, before evaluating the limit, in order for a proof of the chain rule to be valid?

Also, I apologize: I didn't realize that there is already a thread on this subject, which I found helpful and which may help anyone else with similar problems.
https://www.physicsforums.com/showthread.php?t=57419

It seems that there are subjects that go deeper into the theory of differentials and infinitesimals than what can be found in a standard calculus book.
Are there courses that go deeper into this theory?

Thanks by the way for all your help!
(I'm very impressed with this site!)

HallsofIvy · Jan 15, 2005

Look closely at your calculus book! The one I am looking at (Calculus by Salas, Hille and Etgen, ninth edition) says "If Δx is small then df is approximately f'Δx". The statement "dx= Δx" is not true: it is an approximation.

\frac{dy}{dx} is approximately equal to \frac{\Delta y}{\Delta x}.

Hurkyl · Jan 15, 2005

MathStudent said:

you are stating that we must cancel real numbers inside the limit before evaluating the limit in order for a proof of the chain rule to be valid?

I don't know all possible proofs of the chain rule -- I was just referring to the one I think is most straightforward, and highlighting the key difference between it and your invalid argument.

Your calc 1 book says exactly what df/dx means; you don't need to appeal to anything else.

Differentials are something else (but similar), but you wouldn't use them until you start doing differential geometry, or the like.

Infinitesimals are a different subject entirely. In standard analysis, the only infinitesimal is 0, so it's not a particularly useful concept.

HallsofIvy · Jan 17, 2005

Actually you can prove the chain rule by just asserting
\frac{dy}{dt}= \frac{dy}{dx}\frac{dx}{dt}
provided you have defined dy, dx, and dt as "infinitesimals". In order to define infinitesimals themselves, you have to go to "non-standard" analysis, which requires sophisticated notions from logic (specifically, the compactness theorem: if every finite subset of a set of axioms has a model, then the entire set of axioms has a model).

Hurkyl · Jan 17, 2005

Almost, but not quite. In nonstandard analysis, dy/dx is defined to be equal to the standard part of Δy/Δx, provided that this exists and is the same for all choices of the nonzero infinitesimal Δx.

Taking the standard part of a finite number means rounding it to the nearest standard (i.e. real) number.

So, for the proof to be accurate, you need a theorem about how multiplication interacts with the standard part operation -- that std (xy) = (std x)(std y), given the appropriate hypothesis.
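
With that theorem in hand, the argument might be sketched as follows. This is only a sketch under stated assumptions: u = g(x) and y = f(u) are differentiable, Δx is a nonzero infinitesimal, and Δu is assumed nonzero (the case Δu = 0 needs separate handling).

```latex
% Sketch only. Assumptions: \Delta x is a nonzero infinitesimal,
% \Delta u = g(x + \Delta x) - g(x) is nonzero (it is infinitesimal because g is differentiable),
% and \Delta y = f(u + \Delta u) - f(u).
\begin{align*}
\frac{\Delta y}{\Delta x} &= \frac{\Delta y}{\Delta u}\cdot\frac{\Delta u}{\Delta x}
  && \text{(ordinary algebra of hyperreal numbers)} \\
\operatorname{std}\!\left(\frac{\Delta y}{\Delta x}\right)
  &= \operatorname{std}\!\left(\frac{\Delta y}{\Delta u}\right)
     \operatorname{std}\!\left(\frac{\Delta u}{\Delta x}\right)
  && \text{(both factors are finite)} \\
\frac{dy}{dx} &= \frac{dy}{du}\cdot\frac{du}{dx}.
\end{align*}
```

The first line is the cancellation from the original question; the second line is where the extra theorem about the standard part is actually used.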

MathStudent · Jan 21, 2005

HallsofIvy said:

Look closely at your calculus book! The one I am looking at (Calculus by Salas, Hille and Etgen, ninth edition) says "If Δx is small then df is approximately f'Δx". The statement "dx= Δx" is not true: it is an approximation.

\frac{dy}{dx} is approximately equal to \frac{\Delta y}{\Delta x}.

Hmm... that's interesting.
I have never looked in that book, and that is something I have never heard before. Everywhere else I've looked gives a slightly different definition.
They let
dx = \Delta x
and then define
dy = f'(x)dx

so if \Delta x is small then
\Delta y is approximately dy
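
Here is a small numerical sketch of that last statement, using the definitions dx = Δx and dy = f'(x)dx. The function f(x) = x^3 and the point x = 2 are hypothetical choices for illustration only.

```python
# Hypothetical example: f(x) = x**3, so f'(x) = 3 * x**2.
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def increment_and_differential(x, dx):
    """Return (Δy, dy), where Δy = f(x + dx) - f(x) and dy = f'(x) * dx."""
    delta_y = f(x + dx) - f(x)
    dy = f_prime(x) * dx
    return delta_y, dy

for dx in (0.5, 0.1, 0.001):
    delta_y, dy = increment_and_differential(2.0, dx)
    # The gap Δy - dy equals 6*dx**2 + dx**3 here, so it shrinks faster than dx itself.
    print(dx, delta_y, dy, delta_y - dy)
```

Under these definitions dx and Δx are equal by construction; it is Δy and dy that differ, and only by a term that vanishes faster than Δx.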

mathwonk · Jan 21, 2005

In general a proof first requires a definition. So if you define dx to be Δx, and define dy to be f'(x)Δx, then clearly f'(x) = dy/dx.

The modern differential geometry definition of df, for any differentiable function f, is that it is a function on tangent vectors to the real line. That is, given a point p on the real line and a tangent vector v at that point, df_p(v) is the derivative of f at p in the direction v. The standard tangent vector is the unit vector e in the positive x direction, and the derivative of f in that direction is the usual derivative: f'(p) = df_p(e).

If v is any tangent vector, one can always write it as a scalar multiple of the standard unit vector, v = ce, and then df_p(v) = c df_p(e) = c f'(p). So df_p is a linear function on the tangent space at p.

Now x is a function on the x axis, namely the identity function, and as such it has a differential dx, whose value at any point p and any vector v = ce is simply dx_p(v) = dx_p(ce) = c dx_p(e) = c·1 = c. Since a tangent vector v at p is merely the vector from p to p+v, we also call v = Δx. Thus in this sense dx_p(v) does equal Δx, i.e. it equals the difference v between p and p+v.

Now if v = ce, since df_p(v) = c df_p(e) = c f'(p) and dx_p(v) = c dx_p(e) = c, it follows that df_p is a function which, on every tangent vector at p, equals exactly f'(p) times what dx_p equals. Thus the quotient of the two linear functions df_p and dx_p is a constant function with value f'(p).

In this sense df_p/dx_p = f'(p) as a quotient of linear functions, for all p, and hence df/dx = f' is true as a quotient.

The definition of dx as Δx, while well meaning, is misleading, since it should say that for all p, the function dx_p on tangent vectors v at p equals the function Δx_p. Namely, both of them, acting on the vector from p to p+h, yield the number h.

In geometric terms, df is the family of linear functions whose graphs are the tangent lines to the graph of f. Thus dx is the family of tangent lines to the graph of y = x, namely the family of lines of slope 1, one copy for each point p on the x axis. Dividing df_p by dx_p, for a given p, therefore means dividing these two linear functions, which amounts to dividing their slopes. This gives f'(p)/1 = f'(p). That is, the function taking v to f'(p)v, divided by the function taking v to v, can be said to equal the constant function with value f'(p), i.e. the number f'(p).
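
The quotient-of-linear-functions picture can be sketched in a few lines of code. This is only an illustration under assumptions: f = sin is a hypothetical choice, the helper name `differential` is made up, and tangent vectors on the real line are represented by plain numbers.

```python
import math

def differential(f_prime):
    """Return df: a function mapping a point p to the linear map v -> f'(p) * v."""
    def df(p):
        return lambda v: f_prime(p) * v
    return df

# Hypothetical example: f = sin, so f' = cos.
df = differential(math.cos)
# The identity function x has derivative 1 everywhere, so dx_p(v) = v.
dx = differential(lambda p: 1.0)

p = 0.5
for v in (-2.0, 0.25, 3.0):
    # Quotient of the two linear functions, evaluated on a nonzero tangent vector v.
    ratio = df(p)(v) / dx(p)(v)
    # The ratio does not depend on v: it is always f'(p).
    assert math.isclose(ratio, math.cos(p))
    print(v, df(p)(v), dx(p)(v), ratio)
```

The ratio is undefined at v = 0, where both linear functions vanish, but it takes the same value f'(p) on every nonzero tangent vector, which is the sense in which df/dx = f' holds as a quotient.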
