# Chain Rule / Differentials

1. Jan 14, 2005

### MathStudent

Hi,
I've seen a couple of proofs for the chain rule, and I know this probably sounds stupid, but I'm wondering why it can't be proved as follows:

given the real-valued functions $y=f(u)$, $u=g(x)$,
since $dy$, $du$, $dx$ are all real-valued functions as well,
can't you just state:
$$\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$$
by the properties of real numbers?

can someone explain why this isn't an acceptable proof?

Also, since I'm on the subject of differentials, does anyone know of any good books on the theories of differentials? I've spent a lot of time thinking about this concept, and it seems to have a different meaning for different applications. I've heard that there are plenty of theories that explain what a differential is and explain more about its uses (beyond the scope of a Calc 1-3 book). Any info would be greatly appreciated.

2. Jan 14, 2005

### Hurkyl

Staff Emeritus
And what property would that be? Remember that, for instance, dy is not a real number, and dy/dx is not a ratio of two real numbers.

To make an acceptable proof, you have to apply your idea to the correct definitions of the terms involved.

3. Jan 14, 2005

### MathStudent

Thanks for your reply Hurkyl, but I don't understand why dy is not a real number. From what I know,

$dx = \Delta x$ is an increment of x, which is the difference of two real numbers and so is itself a real number.

and since $dy = f'(x)dx$ and $f'(x)$ evaluated at some x is a real number

so it seems to me that both dx and dy take on real values and thus can be treated as real numbers, and so dy/dx could be treated as the ratio of two real numbers.

Please pardon my ignorance here; I realise I must be missing something important.

Last edited: Jan 14, 2005
4. Jan 14, 2005

### Hurkyl

Staff Emeritus
That is certainly the inspiration for differentials, but it's not that easy. And while the notation is such that you can usually manipulate them as if they were real numbers, that's not always the case.

Matt grime likes to give this identity involving three dependent variables:

$$\frac{dx}{dy} \frac{dy}{dz} \frac{dz}{dx} = -1$$
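
As a quick check of this identity, here is a minimal sketch. It assumes the usual reading, in which each derivative is a partial derivative taken with the remaining variable held fixed, and it uses a hypothetical constraint $x + y + z = 0$ (chosen for illustration, not taken from the thread) to tie the three variables together:

$$\left(\frac{\partial x}{\partial y}\right)_z = \frac{\partial(-y-z)}{\partial y} = -1, \qquad \left(\frac{\partial y}{\partial z}\right)_x = \frac{\partial(-x-z)}{\partial z} = -1, \qquad \left(\frac{\partial z}{\partial x}\right)_y = \frac{\partial(-x-y)}{\partial x} = -1$$

$$\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = (-1)^3 = -1$$

Cancelling the symbols as if they were ordinary fractions would predict $+1$ instead.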

The actual definition of the derivative is:

$$\frac{dy}{dx} := \lim_{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x}$$

So it's not simply the ratio of two real numbers, but the limit of such ratios. For the proof to be valid, you have to factor in the limit. For the case of the chain rule, it just means that you are cancelling real numbers inside the limit.
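
To spell out what "cancelling inside the limit" could look like, here is a sketch under a stated assumption. Write $\Delta u = g(x+\Delta x) - g(x)$ and $\Delta y = f(u+\Delta u) - f(u)$, and assume (this is an extra hypothesis, not something the thread states) that $\Delta u \neq 0$ for all sufficiently small nonzero $\Delta x$:

$$\frac{dy}{dx} = \lim_{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \rightarrow 0} \left( \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \right) = \left( \lim_{\Delta u \rightarrow 0} \frac{\Delta y}{\Delta u} \right) \left( \lim_{\Delta x \rightarrow 0} \frac{\Delta u}{\Delta x} \right) = \frac{dy}{du} \frac{du}{dx}$$

The second equality is ordinary cancellation of the real number $\Delta u$. The third uses the product rule for limits, together with the fact that $\Delta u \rightarrow 0$ as $\Delta x \rightarrow 0$ because g is continuous wherever it is differentiable. When $\Delta u$ can vanish for arbitrarily small $\Delta x$, this sketch breaks down and a more careful argument is needed.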

5. Jan 15, 2005

### MathStudent

Just to clarify, you are stating that we must cancel real numbers inside the limit, before evaluating the limit, in order for a proof of the chain rule to be valid?

Also, I apologize: I didn't realize that there is already a thread on this subject. I found it helpful, and so might anyone else with similar problems.

It seems that there are subjects that go deeper into the theory of differentials and infinitesimals than what can be found in a standard calculus book.
Are there courses that go deeper into this theory?

Thanks by the way for all your help!!
(I'm very impressed with this site!)

6. Jan 15, 2005

### HallsofIvy

Staff Emeritus
Look closely at your calculus book! The one I am looking at (Calculus by Salas, Hille and Etgen, ninth edition) says "If $\Delta x$ is small then df is approximately $f'\,\Delta x$". The statement "$dx = \Delta x$" is not true: it is an approximation.

$\frac{dy}{dx}$ is approximately equal to $\frac{\Delta y}{\Delta x}$.

7. Jan 15, 2005

### Hurkyl

Staff Emeritus
I don't know all possible proofs of the chain rule -- I was just referring to the one I think is most straightforward, and highlighting the key difference between it and your invalid argument.

Your calc 1 book says exactly what df/dx means; you don't need to appeal to anything else.

Differentials are something else (but similar), but you wouldn't use them until you start doing differential geometry, or the like.

Infinitesimals are a different subject entirely. In standard analysis, the only infinitesimal is 0, so it's not a particularly useful concept.

8. Jan 17, 2005

### HallsofIvy

Staff Emeritus
Actually you can prove the chain rule by just asserting
$$\frac{dy}{dt}= \frac{dy}{dx}\frac{dx}{dt}$$
provided you have defined dy, dx, and dt as "infinitesimals". In order to define infinitesimals themselves, you have to go to "non-standard" analysis, which requires sophisticated notions from logic (specifically, the "compactness property": if every finite subset of a set of axioms has a model, then the entire set of axioms has a model).

9. Jan 17, 2005

### Hurkyl

Staff Emeritus
Almost, but not quite. In nonstandard analysis, dy/dx is defined to be equal to the standard part of $\Delta y/\Delta x$, provided that this exists and is the same for all choices of the nonzero infinitesimal $\Delta x$.

Taking the standard part of a number means to round to the nearest standard (i.e. real) number.

So, for the proof to be accurate, you need a theorem about how multiplication interacts with the standard part operation -- that std (xy) = (std x)(std y), given the appropriate hypothesis.
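
A sketch of how that theorem would enter, under stated assumptions: take a nonzero infinitesimal $\Delta x$, set $\Delta u = g(x+\Delta x) - g(x)$ and $\Delta y = f(u+\Delta u) - f(u)$, and assume $\Delta u \neq 0$ (an extra hypothesis added here for illustration). Since g is differentiable, $\Delta u$ is itself infinitesimal, so:

$$\operatorname{std}\left(\frac{\Delta y}{\Delta x}\right) = \operatorname{std}\left(\frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}\right) = \operatorname{std}\left(\frac{\Delta y}{\Delta u}\right) \operatorname{std}\left(\frac{\Delta u}{\Delta x}\right) = \frac{dy}{du} \frac{du}{dx}$$

The middle equality is the product rule for standard parts. The "appropriate hypothesis" is that both factors are finite, which holds here because each one is infinitely close to a real derivative.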

10. Jan 21, 2005

### MathStudent

hmm ...that's interesting
I have never looked in that book, but that is something I have never heard before. Everywhere I've looked gives a slightly different definition.
They let
$$dx = \Delta x$$
and then define
$$dy = f'(x)dx$$

so if $\Delta x$ is small then
$\Delta y$ is approximately dy
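
A small worked example of these definitions, with the function and numbers chosen purely for illustration (they are not from the thread): take $f(x) = x^2$ at $x = 3$ with $dx = \Delta x = 0.1$.

$$dy = f'(x)\,dx = 2 \cdot 3 \cdot 0.1 = 0.6, \qquad \Delta y = f(3.1) - f(3) = 9.61 - 9 = 0.61$$

Under this convention dx equals $\Delta x$ exactly, while dy and $\Delta y$ differ by $(\Delta x)^2 = 0.01$, which shrinks faster than $\Delta x$ itself.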

11. Jan 21, 2005

### mathwonk

In general, a proof first requires a definition. So if you define dx to be $\Delta x$, and define dy to be $f'(x)\,\Delta x$, then obviously f'(x) = dy/dx.

The modern differential geometry definition of df, for any differentiable function f, is that it is a function on tangent vectors to the real line, i.e. given a point p on the real line and a tangent vector v at that point, $df_p(v)$ = the derivative of f in the direction v. Now the standard tangent vector is the unit vector e in the positive x direction, and the derivative of f in that direction is the usual derivative $f'(p) = df_p(e)$.

If v is any tangent vector at p, one can always write it as a scalar multiple of the standard unit vector, v = ce, and then one has $df_p(v) = c\,df_p(e) = c\,f'(p)$. So $df_p$ is a linear function on the tangent space at p.

Now x is a function on the x axis, namely the identity function, and as such it has a differential dx, whose value at any point p and any vector v, where v = ce, is simply $dx_p(v) = dx_p(ce) = c\,dx_p(e) = c \cdot 1 = c$. Since a tangent vector v at p is merely the vector from p to p+v, we also call v = $\Delta x$. Thus in this sense, dx(v) does equal $\Delta x$, i.e. it equals the difference v between x and x+v.

Now if v = ce, since $df_p(v)$ equals $c\,df_p(e) = c\,f'(p)$, and $dx_p(v) = c\,dx_p(e) = c$, it follows that indeed $df_p$ is a function which, on every tangent vector at p, equals exactly f'(p) times what $dx_p$ equals. Thus the quotient of the two linear functions, $df_p$ and $dx_p$, is a constant function with value f'(p).

In this sense $df_p/dx_p = f'(p)$ as a quotient of linear functions, for all p, and hence df/dx = f' is true as a quotient.

The definition of dx as $\Delta x$, while well meaning, is misleading, since it should say that for all p, the function $dx_p$ on tangent vectors v at p equals the function $(\Delta x)_p$. Namely both of them, acting on the point x+h, yield the number h.

In geometric terms, df is the family of linear functions whose family of graphs is simply the family of tangent lines to the graph of f. Thus dx is the family of tangent lines to the graph of y=x, namely the family of lines of slope 1, one copy for each point p on the x axis. Thus dividing $df_p$ by $dx_p$, for a given p, means dividing these two linear functions, which amounts to dividing their slopes. This gives f'(p)/1 = f'(p), i.e. the function taking x to f'(p)x, divided by the function taking x to x, can be said to equal the constant function taking x to f'(p), i.e. the number f'(p).
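
To connect this back to the original question, here is a sketch of how the chain rule reads in this language. The name $h = f \circ g$ is introduced here for illustration, and the number $dg_p(v)$ is identified with the tangent vector $dg_p(v)\,e$ at the point g(p); both are assumptions of this sketch and are not spelled out in the post above. The chain rule then says that the differential of a composite is the composite of the differentials:

$$dh_p(v) = df_{g(p)}\big(dg_p(v)\,e\big) = dg_p(v)\,df_{g(p)}(e) = f'(g(p))\,dg_p(v)$$

Taking v = e, so that $dg_p(e) = g'(p)$ and $dh_p(e) = h'(p)$, recovers the familiar form:

$$h'(p) = f'(g(p))\,g'(p)$$

This restates the chain rule; it does not prove it. The proof still rests on the limit definition of the derivative.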

Last edited: Jan 21, 2005