The Chain Rule, death to anyone that breaks the rule

Cyrus · May 24, 2005

Ok, so I am reviewing multivariable now that I have some time (why is it taking me so long to grasp some of these concepts!?). Anyway, I am reading the proof of Stokes' theorem. The book I use is Stewart, but it seems to be ripped off word for word from Swokowski, which in turn rips off S. L. Salas and Einar Hille (maybe each new author continued the publishing over time?).

Here is what's throwing me a curve. At one point, they show that \int\int_S \text{curl } \vec{F} \cdot d\vec{S} = \int_c \vec{F} \cdot d\vec{r}

They assume that \vec{F}=P\hat{i}+Q\hat{j}+R\hat{k}

I will just put the proof up to avoid confusion when I refer to it:

(1) \int_c \vec{F} \cdot d\vec{r} = \int^b_a (P\frac{dx}{dt} +Q\frac{dy}{dt}+ R\frac{dz}{dt}) dt

(2) ' ' = \int^b_a [P \frac{dx}{dt} +Q \frac{dy}{dt}+ R( \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt})] dt

(3) ' ' = \int^b_a [(P+R\frac{\partial z}{\partial x})\frac{dx}{dt} + (Q+R\frac{\partial z}{\partial y})\frac{dy}{dt}] dt

(4) ' '=\int_{c1} (P+R\frac{\partial z}{\partial x})dx + (Q+R\frac{\partial z}{\partial y})dy

(5) ' '=\int\int_D[\frac{\partial}{\partial x}(Q+R\frac{\partial z}{\partial y})-\frac{\partial}{\partial y}(P+R\frac{\partial z}{\partial x})]dA

By Green's Theorem. Then using the chain rule again and remembering that P,Q, and R are functions of x,y and z, and that z is a function itself of x and y, we get:

(6) \int_c \vec{F} \cdot d\vec{r} = \int\int_D [( \frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z} \frac{\partial z}{\partial x}+\frac{\partial R }{\partial x}\frac{\partial z }{\partial y }+ \frac{\partial R }{\partial z }\frac{\partial z }{\partial x }\frac{\partial z }{\partial y }+ R \frac{\partial^2 z }{\partial x \partial y }) -( \frac{\partial P }{\partial y }+ \frac{\partial P }{\partial z } \frac{\partial z }{\partial y }+ \frac{\partial R }{\partial y } \frac{\partial z }{\partial x } + \frac{\partial R }{\partial z } \frac{\partial z }{\partial y } \frac{\partial z }{\partial x }+R \frac{\partial^2 z }{\partial y \partial x })]dA

TOO MUCH TYPING!

For some reason I can't see what I typed?? Aha, "" marks are reserved for something that's what it is. YES! the last line worked finally, aye!

Ok, time to get back on track.

I will walk through each step, 1-6 followed by the last line and describe what they are doing. If I am wrong along the way, let me know. I will also explain what part is throwing me off.

(1) This is just standard notation derived previously when doing the line integral of a vector field. The reason for the dt outside the parentheses is that we have x as a function of t, x=f(t), and when doing a line integral we want to integrate P w.r.t. x, not dx/dt (dx/dt is the speed at which x changes, but we are interested in the change of x only, not its speed), which is why the extra dt sits outside the parentheses to cancel the dt in the denominator, thus integrating w.r.t. x. I.e. (dx/dt) dt = dx. Similar arguments hold for dy/dt and dz/dt.

(2) This part is fine too, because z is a function of x and y. Furthermore, x and y are functions of t. So when we take the total derivative of z, we get the junk inside the parentheses. This makes sense, since it was derived earlier in the book. We did the linearization of the TANGENT PLANE and arrived at an equation for dz. We divided this entire equation by dt, and took the limit, and we get the result inside (" ").

(3) Easy, Easy, Easy, just move things around and factor out differentials

(4) Seems like the trick played here is that they UNPARAMETERIZED the function with respect to t to just x and y again.

(5) Now they just apply Green's theorem for a vector function, which they can do because they use the planar curve c1 which lies on the projection plane of the surface on the xy plane. So it only varies in x and y.

Enter Confusion:

Performing the partial derivative is messing me up. Since it is the same for all the partials, let's just deal with the first part in the brackets []( the minus " " terms are the same procedure of differentiation.)

Now we have the partial derivative of Q W.R.T x. Now Q is a function of x AND a function of z. Of course Z is itself a function of x.

But how did they get the part:

\frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z} \frac{\partial z}{\partial x}.

I understand you have to take the derivative of the x part and the z part. But how did they arrive at this equation? It does not resemble the linearization, i.e. dz = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy. What proof can I turn to, so that I can say ah yes, this is why you do the derivative this way?

whozum · May 24, 2005

You're missing a couple of lines there, #4 and 5.

Galileo · May 25, 2005

It's just an application of the chain rule. See 'The Chain Rule (General version)' on page 793. You might as well consider x and y as functions x(x,y), y(x,y) and use that y is constant w.r.t. x (it doesn't really depend on x). It looks weird, but in this way you can readily use the form presented in the book if it's not immediately obvious:

\frac{\partial}{\partial x} Q(x(x,y),y(x,y),z(x,y))=\frac{\partial Q}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial Q}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=

\frac{\partial Q}{\partial x}\cdot 1+\frac{\partial Q}{\partial y}\cdot 0+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial Q}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}
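For anyone who wants to see the identity hold with actual numbers, here is a rough finite-difference sanity check. It is only a sketch: the particular Q and the surface z(x,y) below are made-up test functions (not anything from the book), and `central_diff` is a helper written just for this illustration.

```python
import math

def Q(x, y, z):
    # Hypothetical test function of three independent variables.
    return x * x * z + math.sin(y) * z

def z_surf(x, y):
    # Hypothetical surface z = z(x, y).
    return x * y + y * y

def central_diff(g, t, h=1e-6):
    # Symmetric difference quotient approximating g'(t).
    return (g(t + h) - g(t - h)) / (2 * h)

def lhs(x, y):
    # d/dx of the composite Q(x, y, z(x, y)), holding y fixed.
    return central_diff(lambda u: Q(u, y, z_surf(u, y)), x)

def rhs(x, y):
    # dQ/dx + dQ/dz * dz/dx, where the partials of Q treat x, y, z as independent.
    z0 = z_surf(x, y)
    dQ_dx = central_diff(lambda u: Q(u, y, z0), x)
    dQ_dz = central_diff(lambda w: Q(x, y, w), z0)
    dz_dx = central_diff(lambda u: z_surf(u, y), x)
    return dQ_dx + dQ_dz * dz_dx

x0, y0 = 0.7, -0.4
print(lhs(x0, y0), rhs(x0, y0))  # the two numbers should agree to many digits
```

Note that the dQ/dy term never appears in `rhs`, because it is multiplied by dy/dx = 0.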

Cyrus · May 25, 2005

Galileo said:

It's just an application of the chain rule. See 'The Chain Rule (General version)' on page 793. You might as well consider x and y as functions x(x,y), y(x,y) and use that y is constant w.r.t. x (it doesn't really depend on x). It looks weird, but in this way you can readily use the form presented in the book if it's not immediately obvious:

\frac{\partial}{\partial x} Q(x(x,y),y(x,y),z(x,y))=\frac{\partial Q}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial Q}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=

\frac{\partial Q}{\partial x}\cdot 1+\frac{\partial Q}{\partial y}\cdot 0+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial Q}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}

Funny you should say that, I was thinking about it in those terms last night right before I went to bed, but I wasn't sure. The thing that I was not sure about was the second factor in each term, where you have dx/dx, dy/dx and dz/dx. Is the reason you have partial x / partial x and not dx/dx that you made x a function of x and y?

arildno · May 25, 2005

Galileo said:

It's just an application of the chain rule. See 'The Chain Rule (General version)' on page 793. You might as well consider x and y as functions x(x,y), y(x,y) and use that y is constant w.r.t. x (it doesn't really depend on x). It looks weird, but in this way you can readily use the form presented in the book if it's not immediately obvious:

\frac{\partial}{\partial x} Q(x(x,y),y(x,y),z(x,y))=\frac{\partial Q}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial Q}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=

\frac{\partial Q}{\partial x}\cdot 1+\frac{\partial Q}{\partial y}\cdot 0+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial Q}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}

The reason why this looks "weird" is the use of sloppy notation, as Galileo is of course, fully aware of.
One might use a pedantic notation here:
1. Let us have a function Q'(x',y',z')
2. Let (x',y',z')\in\mathbb{R}^{3} be related to (x,y)\in\mathbb{R}^{2} as follows:
x'=X(x,y)=x,y'=Y(x,y)=y,z'=Z(x,y)
3. We may now define a function Q(x,y) as follows:
Q(x,y)=Q'(X(x,y),Y(x,y),Z(x,y))
4. We also define: \vec{x}'=(x',y',z'), \vec{X}(x,y)=(X(x,y),Y(x,y),Z(x,y))
5.Thus, we have:
\frac{\partial{Q}}{\partial{x}}=(\frac{\partial{Q'}}{\partial{x'}}\frac{\partial{X}}{\partial{x}}+\frac{\partial{Q'}}{\partial{y'}}\frac{\partial{Y}}{\partial{x}}+\frac{\partial{Q'}}{\partial{z'}}\frac{\partial{Z}}{\partial{x}})\mid_{\vec{x}'=\vec{X}(x,y)}

6. Now, one might ask oneself if pedantic notation is really worthwhile..

Cyrus · May 25, 2005

arildno said:

The reason why this looks "weird" is the use of sloppy notation, as Galileo is of course, fully aware of.
One might use a pedantic notation here:
1. Let us have a function Q'(x',y',z')
2. Let (x',y',z')\in\mathbb{R}^{3} be related to (x,y)\in\mathbb{R}^{2} as follows:
x'=X(x,y)=x,y'=Y(x,y)=y,z'=Z(x,y)
3. We may now define a function Q(x,y) as follows:
Q(x,y)=Q'(X(x,y),Y(x,y),Z(x,y))
4. We also define: \vec{x}'=(x',y',z'), \vec{X}(x,y)=(X(x,y),Y(x,y),Z(x,y))
5.Thus, we have:
\frac{\partial{Q}}{\partial{x}}=(\frac{\partial{Q'}}{\partial{x'}}\frac{\partial{X}}{\partial{x}}+\frac{\partial{Q'}}{\partial{y'}}\frac{\partial{Y}}{\partial{x}}+\frac{\partial{Q'}}{\partial{z'}}\frac{\partial{Z}}{\partial{x}})\mid_{\vec{x}'=\vec{X}(x,y)}

6. Now, one might ask oneself if pedantic notation is really worthwhile..

You are losing me at line 4, can you clarify that please? Why is x' a vector function? It's a scalar based on your notation.

arildno · May 25, 2005

No, I just use \vec{x}' to designate a point in \mathbb{R}^{3}
Any such point can be represented with the aid of a vector having three components; the components of the vector \vec{x}' are the scalars x',y',z'.

mathwonk · May 25, 2005

i recommend you write out a proof yourself for integration around a rectangle. it is very easy, following from precisely two facts: FTC and Fubini. I.e. Fubini reduces the two dimensional integral to one dimensional integrals, and then FTC does those.

Then all other cases are obtained from that by the chain rule, which is a separate issue, having nothing to do with Stokes theorem, only with how to generalize it technically.
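A numerical version of the rectangle case can be sketched as follows. This is only an illustration under assumptions: the field components P and Q are arbitrary made-up polynomials, dQ/dx - dP/dy is worked out by hand for them, and `midpoint` is a simple midpoint-rule helper written for this sketch. The area integral is done as an iterated integral, which is the Fubini step.

```python
def P(x, y):
    # Hypothetical field component.
    return -x * y * y

def Q(x, y):
    # Hypothetical field component.
    return x * x * y

def dQdx_minus_dPdy(x, y):
    # Computed by hand for the P, Q above: dQ/dx = 2xy and dP/dy = -2xy.
    return 4 * x * y

def midpoint(g, a, b, n=400):
    # Midpoint-rule approximation of the integral of g over [a, b].
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def boundary_integral(a, b, c, d):
    # Integral of P dx + Q dy counterclockwise around [a,b] x [c,d].
    bottom = midpoint(lambda x: P(x, c), a, b)
    right = midpoint(lambda y: Q(b, y), c, d)
    top = -midpoint(lambda x: P(x, d), a, b)
    left = -midpoint(lambda y: Q(a, y), c, d)
    return bottom + right + top + left

def area_integral(a, b, c, d):
    # Iterated integral: inner integral in y, outer integral in x.
    return midpoint(lambda x: midpoint(lambda y: dQdx_minus_dPdy(x, y), c, d), a, b)

print(boundary_integral(0, 1, 0, 2), area_integral(0, 1, 0, 2))
```

For this choice both integrals can also be done by hand on [0,1] x [0,2], and both come out to 4.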

Cyrus · May 25, 2005

before you had x'=X(x,y), so x' was a scalar function. Now you have x' as a vector. How come you changed it into a vector function. I thought x' is only the scalar component in the x direction.

arildno · May 25, 2005

Is \vec{x}' the same symbol as x'?
Doesn't look so for me, at least.
I could equally well have called my vector \vec{v}'=(x',y',z')
if you are happier with that.

Cyrus · May 25, 2005

arildno said:

Is \vec{x}' the same symbol as x'?
Doesn't look so for me, at least.
I could equally well have called my vector \vec{v}'=(x',y',z')
if you are happier with that.

Only if I squint real hard.

I see what you're saying now. It looks as if \vec{x'} and \vec{X'} are the same thing. Why was it necessary to use both of them in the proof? Oh, and I think \vec{s'} would have been a little bit clearer. Calling the position vector x made me confuse it with the x component of a vector.

arildno · May 25, 2005

With \vec{x}', I designate a POINT (element) in \mathbb{R}^{3}

With (x,y) (or, if you like, \vec{x} (no prime here!)), I designate a POINT (element) in \mathbb{R}^{2}

The statement \vec{x}'=\vec{X}'(x,y) says that we have a mapping \vec{X}':\mathbb{R}^{2}\to\mathbb{R}^{3} i.e, a point (x,y) in the plane (the input value) is mapped onto a point (x',y',z') (the output value) in \mathbb{R}^{3} in accordance with some rule.

Cyrus · May 25, 2005

I see. :-)

Cyrus · May 25, 2005

x'=X(x,y)=x,y'=Y(x,y)=y,z'=Z(x,y)=z Here you are reusing the values of x and y when you write x'=X(x,y)=x. I guess the thing here is to remember that the x in x=X(x,y) is not the same as the variable x inside the parentheses. It has a double meaning that I should be VERY careful about.

arildno · May 25, 2005

Our "machine", or, function, X has input (x,y), and spits out the x-value of (x,y) as the output (i.e, X can be regarded as the operation of projecting (x,y) onto the x-axis.)
That is, we have specified a rule so that given input, we may calculate output.

This output is then what x' is set equal to.

Cyrus · May 25, 2005

Right, just to be clear, the x's I was referring to are, in X(x,y), the x inside the (x,y), and the x in X(x,y)=x on the right side of the equals sign. Here we are reusing the variable x for two different purposes. I didn't mean the capital X. (Sorry if I was not clear on that.)

arildno · May 25, 2005

It was I who wasn't clear about it from the start (sorry about that), but I think you've got it.

mathwonk · May 25, 2005

this stuff is trivial, or ought to be if done right. Stewart et al are just clogging it up with notation.

Look, if W is a one form, (i.e. something like Pdx + Q dy), then the "curl" is just dW = dP^dx + dQ^dy which is expanded out by the rule that dP = dP/dx dx + dP/dy dy, and dx^dx = 0 = dy^dy, and dy^dx = -dx^dy, as usual. so dW = (dQ/dx - dP/dy)dx^dy.
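Written out line by line, the expansion of dW for W = P dx + Q dy goes like this. It uses nothing beyond the rules just quoted (dx^dx = 0 = dy^dy and dy^dx = -dx^dy), with the wedge written as \wedge instead of ^:

```latex
\begin{aligned}
dW &= dP \wedge dx + dQ \wedge dy \\
   &= \left(\frac{\partial P}{\partial x}\,dx + \frac{\partial P}{\partial y}\,dy\right)\wedge dx
    + \left(\frac{\partial Q}{\partial x}\,dx + \frac{\partial Q}{\partial y}\,dy\right)\wedge dy \\
   &= \frac{\partial P}{\partial y}\,dy\wedge dx + \frac{\partial Q}{\partial x}\,dx\wedge dy \\
   &= \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\wedge dy .
\end{aligned}
```

The coefficient in the last line is the same integrand that appears in Green's theorem.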

Then greens theorem, which is repeated integration plus FTC, says the integral of W over the boundary of the rectangle equals the integral of dW over the rectangle.

then the so called "Stokes theorem" just says this remains true for surfaces parametrized by a rectangle. that's all. and this is just the chain rule.

I.e. let f(s,t) = (x(s,t), y(s,t), z(s,t)), be a map from the rectangle R in the s,t plane into x,y,z, space.

Then by definition, the boundary of f(R), is f of the boundary of R. and if W is a one form in x,y,z space, then curl of the pullback f*W, of W to s,t space, is the pullback of the curl of W. (this is the chain rule). i.e. f*(dW) = d(f*W)

Hence the integral of W over the boundary of f(R), equals the integral of f*W over the boundary of R, which by Green equals the integral of d(f*W) over R, which equals the integral of f*(dW) over R, which by definition equals the integral of d(W) over f(R).

If we write dot product for integral and b for boundary, we get just this:

<W,b(f(R))> = <W,f(bR)> = <f*W, b(R)> = <d(f*W),R> = <f*(dW),R> = <dW,f(R)>. done.

I.e. the first equation is true because f(b(R)) = b(f(R)). the second by definition of how to integrate over a parametrized curve, the third is true by greens theorem, the fourth is true by the chain rule which implies that f*(dW) = d(f*W), and the last is true by definition.

thats it.

Cyrus · May 25, 2005

Ok, so far so good now I hope. Can you explain this part to me though.

We have:

\partial Q = \frac{\partial Q'}{\partial x'} \partial x + \frac{\partial Q'}{\partial y'} \partial y.

Let's just assume right now that I am working in two dimensions so this resembles the linearization of the tangent plane approximation better. Then the equation I wrote is very similar; however, we don't have dx anymore, we have \partial x. But when we did the linearization with the tangent plane approximation, we used dx. Why is it not written in terms of dx?

I.e. not written as:

\partial Q = \frac{\partial Q'}{\partial x'} dx + \frac{\partial Q'}{\partial y'} dy.

mathwonk · May 25, 2005

if you mean me, i do not know how to write curly d's so i use square d's for everything. but as you suggest it should be dQ = (curlydQ)/(curlydx) dx + (curlydQ)/(curlydy) dy.

Cyrus · May 25, 2005

mathwonk said:

if you mean me, i do not know how to write curly d's so i use square d's for everything. but as you suggest it should be dQ = (curlydQ)/(curlydx) dx + (curlydQ)/(curlydy) dy.

No sorry I was not referring to your post.

arildno · May 25, 2005

You should study carefully what mathwonk has provided you with, I'll look in upon your question afterwards.

Cyrus · May 25, 2005

ok, Ill look at it now

Cyrus · May 25, 2005

Mathwonk, can you change your post so that I can tell the difference between the partials and the d's. I am reading through but I don't know when and where you mean dx or [\partial x]. Thanks

mathwonk · May 25, 2005

doesn't matter, only one possibility is possible in each case. i.e. if f(x,y) is a function of two variables, then df/dx obviously means partial wrt x.

Cyrus · May 25, 2005

Ok, let's go back a few steps perhaps. Could you please show me a simple proof of the GENERAL version of the chain rule. I am having trouble seeing how that is derived. If you can help me with that, then it will be a great help.

mathwonk · May 25, 2005

you do not need this to follow the proof of the stokes theorem. all you need is this:

definition: if (x,y,z) = f(s,t) = (x(s,t), y(s,t),z(s,t)).

and if W = Adx + Bdy +Cdz, then
f*W = A dx/ds ds + A dx/dt dt + B dy/ds ds + B dy/dt dt +C dz/ds ds + C dz/dt dt

= [A dx/ds + Bdy/ds +Cdz/ds] ds + [Adx/dt +Bdy/dt +Cdz/dt] dt

= Pds +Qdt.

Then d(f*W) = [dQ/ds - dP/dt] ds^dt.

and dW = [dB/dx-dA/dy] dx^dy + [dC/dy-dB/dz]dy^dz + [dA/dz -dC/dx]dz^dx.

Then check that f*(dW) = d(f*W).

this is the only missing step in the proof of the stokes theorem.

the chain rule itself is much harder to prove. this is just a tedious lengthy but mechanical calculation. the chain rule requires an idea to prove it.

i know how, but it will not shed any light on the current question.

mathwonk · May 25, 2005

actually that's too complicated. just assume W = Adx. then the same proof will work for the general W.

mathwonk · May 26, 2005

OK we've at least identified the place in the argument where the messy chain rule calculation occurs: in showing that f*(dW) = d(f*W).

I) Let's do this for a trivially simple case: W = dx.

Then if f(s,t) = (x(s,t),y(s,t),z(s,t)), then f*dx just means composing x with f:

i.e. then f*dx = ∂x/∂s ds + ∂x/∂t dt.

then d(f*dx) = d(∂x/∂s)^ds + d(∂x/∂t)^dt

= [(∂^2x/∂s^2) ds + (∂^2x/∂s∂t) dt]^ds + [(∂^2x/∂t∂s) ds + (∂^2x/∂t^2) dt]^dt

= (since ds^ds = 0 = dt^dt), (∂^2x/∂t∂s - ∂^2x/∂s∂t) ds^dt = 0, by equality of mixed partials. so there is something else going on here besides the chain rule.

now that we see d(f*dx) = 0 we claim also that f*(d(dx)) = 0, but that is immediate because d(dx) is already zero, i.e. this equals d(1)^dx = (0)^dx = 0, since 1 is the coefficient function of dx.

II) Now let's try pulling back a function A(x,y,z) by f, and differentiating. it is longer but still mindless calculation.

i.e. f*A = A(x(s,t),y(s,t),z(s,t)), so d(f*A) = ∂A/∂s ds + ∂A/∂t dt.

where by the chain rule this time:
∂A/∂s = (∂A/∂x) (∂x/∂s) + (∂A/∂y) (∂y/∂s) + (∂A/∂z) (∂z/∂s), etc..for t.

now compute in the other order: i.e. first take
dA = (∂A/∂x) dx+ (∂A/∂y) dy + (∂A/∂z) dz, and then take f* of that.

i.e. substitute in dx = ∂x/∂s ds + ∂x/∂t dt, etc...for y,z,

this gives ultimately,
f*(dA) = (∂A/∂x) [ ∂x/∂s ds + ∂x/∂t dt] + (∂A/∂y) [ ∂y/∂s ds + ∂y/∂t dt]+..etc for z

= [(∂A/∂x) (∂x/∂s) + (∂A/∂y) (∂y/∂s) + (∂A/∂z) (∂z/∂s)] ds +...etc for t,

= ∂A/∂s ds + ∂A/∂t dt = d(f*A), as claimed.

III) Now we can do it for Adx, i.e. then d(Adx) = dA^dx, so

f*(d(Adx)) = f*(dA)^f*(dx) = (by previous argument) d(f*A)^f*dx

= d(f*A)^f*dx + f*A ^ d(f*dx) (since we know the second term is zero from above),

= (by leibniz rule for d) d(f*(A)^f*dx) = d(f*(Adx)). done.

IV) Then we are done for any W = Adx + Bdy + Cdz, by linearity of f* and d.

Well checking all this is tedious, but mindless, and can at least be separated out from the rest of the argument.

i.e. the whole thing boils down to just f*(dW) = d(f*W), which is the chain rule plus the equality of mixed partials.

The advantage of this point of view is that you only need to prove this relation once, and then can use it in calculations of many types in many settings, instead of embedding it in a proof of stokes theorem.

I.e. you should realize what it is you are proving so you can use it again.
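The relation f*(dW) = d(f*W) for W = A dx can also be spot-checked numerically. The sketch below rests on assumptions: the parametrization f and the coefficient A are arbitrary made-up polynomials, and all derivatives are approximated by central differences through a small helper `diff` written for this illustration. Both sides are reduced to their single coefficient of ds^dt.

```python
def f(s, t):
    # Hypothetical parametrization (x(s,t), y(s,t), z(s,t)).
    return (s * t, s + t * t, s * s - t)

def A(x, y, z):
    # Hypothetical coefficient function of the one-form W = A dx.
    return x * y * z + y

def diff(g, u, h=1e-4):
    # Symmetric difference quotient approximating g'(u).
    return (g(u + h) - g(u - h)) / (2 * h)

def partials_f(s, t):
    # Returns ((x_s, y_s, z_s), (x_t, y_t, z_t)).
    fs = tuple(diff(lambda u, i=i: f(u, t)[i], s) for i in range(3))
    ft = tuple(diff(lambda u, i=i: f(s, u)[i], t) for i in range(3))
    return fs, ft

def d_of_pullback(s, t):
    # f*W = P ds + Q dt with P = A(f) x_s and Q = A(f) x_t;
    # the ds^dt coefficient of d(f*W) is dQ/ds - dP/dt.
    P = lambda s_, t_: A(*f(s_, t_)) * partials_f(s_, t_)[0][0]
    Q = lambda s_, t_: A(*f(s_, t_)) * partials_f(s_, t_)[1][0]
    return diff(lambda u: Q(u, t), s) - diff(lambda u: P(s, u), t)

def pullback_of_d(s, t):
    # dW = dA ^ dx = A_y dy^dx + A_z dz^dx; substitute dx = x_s ds + x_t dt, etc.
    x, y, z = f(s, t)
    A_y = diff(lambda u: A(x, u, z), y)
    A_z = diff(lambda u: A(x, y, u), z)
    (x_s, y_s, z_s), (x_t, y_t, z_t) = partials_f(s, t)
    return A_y * (y_s * x_t - y_t * x_s) + A_z * (z_s * x_t - z_t * x_s)

print(d_of_pullback(0.6, 1.3), pullback_of_d(0.6, 1.3))
```

The two printed numbers should agree up to finite-difference error. The mixed second partials of x cancel inside `d_of_pullback`, which is where the equality of mixed partials enters.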

mathwonk · May 26, 2005

could we change the title of this thread to make it a little less apocalyptic?

Cyrus · May 26, 2005

Hey arildno, I think I am getting some where can you please help me out.

Lets start with the linearization of a function of three variables,

w=b(x,y,z), and x=f(s,t) y=g(s,t) z=h(s,t)

Then the linearization approximation to the tangent surface is defined as:

\Delta w = \frac{\partial w}{\partial x} \Delta x + \frac{\partial w}{\partial y} \Delta y + \frac{\partial w}{\partial z} \Delta z

if we divide this by delta s and take the limit we get:

\lim_{\Delta s \rightarrow 0} \frac{\Delta w}{\Delta s} = \frac{\partial w}{\partial x} \lim_{\Delta s \rightarrow 0} \frac{\Delta x}{\Delta s} + \frac{\partial w}{\partial y} \lim_{\Delta s \rightarrow 0} \frac{\Delta y}{\Delta s} + \frac{\partial w}{\partial z} \lim_{\Delta s \rightarrow 0} \frac{\Delta z }{\Delta s}

which in turn becomes:

\frac{\partial w}{\partial s}= \frac{\partial w}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial s} + \frac{\partial w}{\partial z} \frac{\partial z}{\partial s}

Now for what you stated, we change W into Q',
x=f(s,t) y=g(s,t) z=h(s,t)

changes to

x=X(s,t) y=Y(s,t) z=Z(s,t)

and (s,t) changes to

(x,y)

and plugging these changes back into the final equation yields:

\frac{\partial Q'}{\partial x}= \frac{\partial Q'}{\partial X} \frac{\partial X}{\partial x} + \frac{\partial Q'}{\partial Y} \frac{\partial Y}{\partial x} + \frac{\partial Q'}{\partial Z} \frac{\partial Z}{\partial x}

Ok, so at least I am one step closer to your final equation, but how do I go from this equation to your equation (5) in your original post?

P.S. I am sorry about the title, mathwonk. It won't let me edit it anymore! Boo. Also, I hope you don't get the impression that I am not listening to your help, mathwonk. I am going to look at your alternative method. I just want to tackle my weakness with the chain rule first. I don't want to jump into your method, because it just means I am putting off knowing the chain rule very well, which won't help me out in the long run.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*********************************************************
EDIT! WHOOPS! \frac{\partial Q'} {\partial X} has no meaning! X is a function of either x, or y! D'OUGH!

Let me repaste the correct text below and ignore the one above!

and plugging these changes back into the final equation yields:

\frac{\partial Q'}{\partial x}= \frac{\partial Q'}{\partial x} \frac{\partial X}{\partial x} + \frac{\partial Q'}{\partial y} \frac{\partial Y}{\partial x} + \frac{\partial Q'}{\partial z} \frac{\partial Z}{\partial x}

Ok, so at least I am one step closer to your final equation, but how do I go from this equation to your equation (5) in your original post?


Now we're in VERY close agreement, except you have partial of Q and I have partial of Q' on the left hand side. ?? HMMMMM...
**********************************************************~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

EDIT NUMBER TWO! S*%%#. LOL. AHHHHHHHHHHHHHH Now I see why you use the variable x'. It needs to be there to distinguish it from the x. Thus the problem with my dQ'/dx, which has dx on the bottom, which is NOT the same dx as in dQ'/dx in the first fraction on the right hand side of the equals sign. This is because we are overusing the variable x. It should be more like this,

x=X(x,y)=x' y=Y(x,y)=y' and z'=Z(x,y)=z

so that we may write the equation as follows:

\frac{\partial Q'}{\partial x}= \frac{\partial Q'}{\partial x'} \frac{\partial X}{\partial x} + \frac{\partial Q'}{\partial y'} \frac{\partial Y}{\partial x} + \frac{\partial Q'}{\partial z'} \frac{\partial Z}{\partial x}

mathwonk · May 26, 2005

you want to know the proof of the chain rule? look: the definition of the derivative of f at a is that it is a linear function L such that L(v) is tangent to f(a+v)-f(a) at v= 0.

i.e. the difference quotient [f(a+v) - f(a) - L(v)]/|v| approaches zero as v does.

call a function o(v) such that o(v)/|v| goes to zero as v does "little oh", and write it o(v).

A function such that the quotient O(v)/|v| is bounded as v approaches zero, "big oh" and write it as O(v).

Then basic rules are these: linear combinations of O's are also O, and also for o's, and compositions of o's and O's are always o if even one "factor" is o. and a product of two O's is o.

Then the chain rule is as follows;

assume L is the derivative of f and M is the derivative of g, then

f(a+v) - f(a) - L(v) = o(v), so f(a+v) - f(a) = L(v) + o(v).

Hence M(f(a+v) - f(a)) =M(L(v)) + M(o(v)).

since f(a+v) = f(a) + [f(a+v)-f(a)], hence we have

g(f(a+v)) - g(f(a)) -M([f(a+v)-f(a)]) = o(f(a+v)- f(a)) = o(O(v)) = o(v).

but also M(f(a+v) - f(a)) =M(L(v)) + M(o(v)), from above,

so g(f(a+v)) - g(f(a)) -M([f(a+v)-f(a)])

= g(f(a+v)) - g(f(a)) - M(L(v)) - M(o(v)) = o(v).

hence g(f(a+v)) - g(f(a)) - M(L(v)) = M(o(v)) + o(v) = o(v) + o(v) = o(v).

hence by definition, the derivative of g(f) at a is M(L).

i.e. the derivative of a composition is the composition, as linear maps, of the derivatives. hence as matrices it is dot product as you are computing above:

ie. dw/ds = (dw/dx, dw/dy, dw/dz).(dx/ds, dy/ds, dz/ds), and so on...
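The statement that the derivative of a composition is the gradient of the outer function dotted with the s-derivatives of the inner one can be checked numerically. This is only a sketch under assumptions: the inner map f and outer function g below are made-up examples, and `diff` is a central-difference helper written for this illustration.

```python
import math

def f(s, t):
    # Hypothetical inner map (x(s,t), y(s,t), z(s,t)).
    return (s * math.cos(t), s * math.sin(t), s * t)

def g(x, y, z):
    # Hypothetical outer function w = g(x, y, z).
    return x * y + z * z

def diff(fn, u, h=1e-6):
    # Symmetric difference quotient approximating fn'(u).
    return (fn(u + h) - fn(u - h)) / (2 * h)

def dw_ds_direct(s, t):
    # Differentiate the composite g(f(s, t)) in s directly.
    return diff(lambda u: g(*f(u, t)), s)

def dw_ds_chain(s, t):
    # Dot product of (dw/dx, dw/dy, dw/dz) with (dx/ds, dy/ds, dz/ds).
    x, y, z = f(s, t)
    grad_g = (diff(lambda u: g(u, y, z), x),
              diff(lambda u: g(x, u, z), y),
              diff(lambda u: g(x, y, u), z))
    df_ds = tuple(diff(lambda u, i=i: f(u, t)[i], s) for i in range(3))
    return sum(a * b for a, b in zip(grad_g, df_ds))

print(dw_ds_direct(1.2, 0.5), dw_ds_chain(1.2, 0.5))
```

The same comparison with t in place of s gives the second entry of the 1 by 2 matrix product.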

arildno · May 26, 2005

Apart from that I would say:

Q(x,y)=Q'(X(x,y),Y(x,y),Z(x,y))
with derivative relation I've posted earlier.
The function on your right-hand side, Q' has three arguments, whereas the function on your left-hand side, Q, has two.

The fact that in our particular case we've related (x',y',z') to (x,y) doesn't change the functional form of Q'.

Cyrus · May 26, 2005

So is what I did ok, do I simply have to replace \frac{\partial Q'}{\partial x} with \frac{\partial Q}{\partial x} , which both mean the same thing, they are equivalent?

I don't understand what you mean by 'functional form of Q'. Sorry.

Or are there problems in my proof?

arildno · May 26, 2005

Remember that the function Q' has basically the arguments x', y', z'; it is only when we let x',y',z' be functions of x,y themselves that we can say that Q' is a function of x and y.

This is expressed best by defining a NEW function Q(x,y) which equals Q' whenever x', y', z' are functions of x,y.

Cyrus · May 26, 2005

Ok gotcha! STUPID ME, I looked at a second calc book I have, Swokowski, which explains this VERY problem! Oh well, I wouldn't have REALLY understood it if I just read it. It was worth the three days of pitiful thinking to come up with such a stupidly simple answer. One problem was that I was trying to take the limit using the differential equation, i.e. with dz and not delta z! And that made me say to myself, what the hell is a limit of dz/dx, and how does that turn into partial z / partial x! Aye, stupid mistake.

Ok, so now things are looking A LOT better. The only thing now is to show how THIS final equation, which we both hopefully agree on, is equal to

\frac{\partial Q}{\partial x}+ \frac{\partial Q}{\partial z} \frac{\partial z}{\partial x}

Cyrus · May 26, 2005

Ahhhhhhhhh yes, the only thing left to do now is make x'=X(x,y)=x equal to just the variable x, and likewise for y. Then partial x' / partial x is just x'=X(x,y) = x =x (these two x's have different meanings: one is the x term in the Q function Q(x,y,z) and the other is the x inside the parentheses (x,y). Confusing, ain't it :-)). But this just means the partial of x' with respect to x, which is 1! YIPPIE! Also, y'=Y(x,y)=y=y so partial y'/ partial x is zerooooooooo double YIPPIE! So now we get Galileo's answer via a much more rigorously correct proof. Finally, this chain rule stuff makes sense, whew, I guess that means I get to live for another day!

mathwonk · May 26, 2005

try and get on the freeway, and off the feeder roads, at some point, and understand what i am telling you.

Cyrus · May 26, 2005

I tried to follow your proof of the chain rule but don't understand some of the things you presented. What is this little o big O deal? Does it have anything to do with the epsilon, or is it another concept? Sorry, but I have to plead ignorance here.

P.S. Now that I have figured out how to prove this using the more confusing and tedious way, and I am sure your way is much better, I wanted to make sure I fully understand BOTH ways. So now I will go back and read YOUR posts. Please don't feel that I have not appreciated your help you have provided even though I haven't looked through it yet.

Oh, and to arildno; did I do the right thing by making x'=x, and y'=y and z'=f(x,y), so that the partials work out, because both you and galileo stated it as x(x,y), but I don't see how you can reduce x(x,y) so that partial x / partial x is equal to one, unless x(x,y) is only a function of the inner x only.

arildno · May 27, 2005

cyrusabdollahi said:

Ahhhhhhhhh yes, the only thing left to do now is make x'=X(x,y)=x equal to just the variable x, and likewise for y. Then partial x' / partial x is just x'=X(x,y) = x =x (these two x's have different meanings: one is the x term in the Q function Q(x,y,z) and the other is the x inside the parentheses (x,y). Confusing, ain't it :-)). But this just means the partial of x' with respect to x, which is 1! YIPPIE! Also, y'=Y(x,y)=y=y so partial y'/ partial x is zerooooooooo double YIPPIE! So now we get Galileo's answer via a much more rigorously correct proof. Finally, this chain rule stuff makes sense, whew, I guess that means I get to live for another day!

You've got it.

Cyrus · May 27, 2005

Thanks arildno, you're a life saver, and you saved me twice already. Once here, and once before in surface integrals!

Now for another side question.

The definition of linearization for a curve is based on a visual proof. Similarly, the definition of linearization of a surface is based on a visual proof using the tangent plane. But for higher dimensions, like in our case, the linearization is a (tangent surface?). But there is seemingly NO way to visualize how this formula is derived. Do we just define it to be written the way it is (using our knowledge of symmetry in going from a curve to a surface, we extend that symmetry in the hope that it still holds true for higher n-dimensional spaces)?

mathwonk · May 27, 2005

a derivative is a linear map that approximates a given non linear map locally at some point. the whole point in defining a derivative is to say precisely how good an approximation the linear map must be to the given non linear map, in order to be considered its derivative. in the usual one variable case we say the derivative of f at a equals f'(a) provided [f(x)-f(a)]/(x-a) converges to f'(a) as x goes to a.

this treats the derivative f'(a) as a number. of course the derivative is really the linear function f'(a)(x-a) which approximates to the function f(x)-f(a) in the sense that their graphs are tangent to each other at x=a.

so how close is this linear function f'(a)(x-a) to the non linear function f(x)-f(a)?

well, when you subtract them you get an error term, [f(x)-f(a) - f'(a)(x-a)] which not only goes to zero as x does, but even the ratio
[f(x)-f(a) - f'(a)(x-a)] /(x-a) also still goes to zero as x-a does.

We say a function with this property, that it not only goes to zero, but also the ratio after dividing it by |x-a|, goes to zero with x-a, i.e. a function whose graph is tangent to the x-axis at x=a, is "little oh" of x-a.

then the definition of the derivative of f at a, is that it is a linear function of x-a whose graph is tangent to the graph of f(x)-f(a) at x=a, i.e. the graph of their difference is tangent to the x-axis at a, i.e. they differ by a function which is "little oh", i.e. very small.

This definition works in all dimensions and even in infinite dimensions. I.e. a function is little oh if its graph is tangent to the source space axis, i.e. if o(v)/|v| goes to zero as v does.

then a derivative of f at a, is a linear function L(x-a) of x-a such that the difference

f(x)-f(a)-L(x-a), is little oh at x=a,

i.e. a linear function L is the derivative of f at a, if and only if |f(x)-f(a)-L(x-a)|/|x-a| goes to zero as x-a does.
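A rough numerical sketch of this condition, assuming a made-up map f from R^2 to R^2 and its Jacobian at a as the candidate L (the map, the point a, and the direction are illustrative choices, not from the thread). It prints the ratio |f(a+v)-f(a)-L(v)|/|v| for shrinking v, which should shrink along with |v|:

```python
import math

def f(x, y):
    # Hypothetical example map from R^2 to R^2 (an assumption for illustration).
    return (x * x * y, x + math.sin(y))

def L(a, v):
    # Candidate derivative at a = (a1, a2): the Jacobian matrix of f applied to v.
    a1, a2 = a
    v1, v2 = v
    return (2 * a1 * a2 * v1 + a1 * a1 * v2, v1 + math.cos(a2) * v2)

def error_ratio(a, v):
    # |f(a+v) - f(a) - L(v)| / |v|, the quantity that must go to zero.
    fa = f(*a)
    fav = f(a[0] + v[0], a[1] + v[1])
    lv = L(a, v)
    err = math.hypot(fav[0] - fa[0] - lv[0], fav[1] - fa[1] - lv[1])
    return err / math.hypot(*v)

a = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector; other directions behave the same way
ratios = [error_ratio(a, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3, 1e-4)]
for r in ratios:
    print(r)
```

Each printed ratio should be roughly ten times smaller than the one before it, since the error term here is of second order in |v|.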

so the visual intuition of linearity in terms of flatness of the graph, is replaced by the algebraic notion of linearity of the function, i.e. L(v+w) = L(v)+L(w), etc...

and the geometric notion of tangency is replaced by the analytic description of tangency of the graphs, i.e. the slope of the distance between the graphs goes to zero, i.e. not only does [f(x)-f(a) - L(x-a)] approach zero, but the slope of this error term, i.e. {[f(x)-f(a)] - L(x-a)}/|x-a| also goes to zero, as x-a does.

if you think about it this says the graph of the difference looks kind of like an upside down umbrella, and is tangent to the "x axis" at x=a.

so to repeat: the first thing to define is what it means to have derivative zero at a, and we say o(v) has derivative zero (at v=0), iff o(v)/|v| approaches zero as v does.

then we say f has derivative L at a, iff L is a linear function and

f(a+v)-f(a)-L(v) has derivative zero at v=0.

I promise you this is worth learning.

Cyrus · May 28, 2005

I'm trying to make sense of your post, mathwonk. We'll take it one line at a time.

well, when you subtract them you get an error term, [f(x)-f(a) - f'(a)(x-a)] which not only goes to zero as x does, but even the ratio
[f(x)-f(a) - f'(a)(x-a)] /(x-a) also still goes to zero as x-a does.

I'm not seeing what you mean by "it goes to zero as x does, but even the ratio...". Does it not go to zero as x goes to a, rather than as x goes to zero?

mathwonk · May 28, 2005

right you are. my error.

the whole point is to define "tangent to zero."

a function o is "tangent to zero" if its graph is tangent to the graph of the zero function, i.e. the slope uniformly in every direction is zero, i.e. o(v)/|v| goes to zero as v does.
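A small worked example may help here (the two functions are illustrative choices, not from the thread). Both functions below go to zero as v does, but only the first is tangent to zero, because only its ratio with |v| goes to zero:

```latex
\[
o(v) = |v|^2: \qquad \frac{o(v)}{|v|} = |v| \to 0 \quad \text{as } v \to 0,
\]
\[
g(v) = |v|: \qquad \frac{g(v)}{|v|} = 1 \not\to 0 \quad \text{as } v \to 0.
\]
```

So going to zero is not enough; the graph of g has a corner at the origin instead of lying flat against the source space axis.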

then two functions that both vanish at zero are tangent to each other (at zero) if their difference is tangent to zero.

then f is differentiable at a if and only if f(a+v)-f(a), as a function of v, is tangent to some linear function of v.

i.e. iff there exists some linear function L such that the difference

f(a+v)-f(a)-L(v) is tangent to zero,

iff [f(a+v)-f(a)-L(v)]/|v| goes to zero as v does.

this is the universally agreed upon correct definition of a derivative, in use at least since G. H. Hardy's "A Course of Pure Mathematics" in 1910 or so.

It is this definition that makes the proof of the chain rule most natural, in all dimensions at once, as I have outlined above.

I learned it from Lynn Loomis; see his Advanced Calculus, joint with Shlomo Sternberg, or Jean Dieudonné's Foundations of Modern Analysis.

Cyrus · May 29, 2005

[f(x)-f(a) - f'(a)(x-a)] /(x-a) also still goes to zero as x-a does.

When x goes to a, the denominator goes to zero. Could you expand on what you mean by that, please? Are you assuming this based on the use of L'Hôpital's rule, and defining this function such that it will go to zero when L'Hôpital's rule is applied?

mathwonk · May 29, 2005

this is just the definition of a derivative, i.e. the usual definition is that

f'(a) is the derivative of f at a if and only if [f(x)-f(a)]/(x-a) approaches f'(a) as x goes to a.

i.e. if and only if [f(x)-f(a)]/(x-a) - f'(a) approaches zero as x goes to a,

if and only if [f(x)-f(a)]/(x-a) - [f'(a)(x-a)/(x-a)] approaches zero as x goes to a,

if and only if [f(x)-f(a) - f'(a)(x-a)]/(x-a) approaches zero as x goes to a.
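As a concrete check of the last line, take the illustrative case f(x) = x^2, so that f'(a) = 2a (this example is an assumption for illustration, not part of the original post). For x not equal to a the ratio simplifies exactly, and no L'Hôpital's rule is needed:

```latex
\[
\frac{f(x) - f(a) - f'(a)(x-a)}{x-a}
  = \frac{x^2 - a^2 - 2a(x-a)}{x-a}
  = \frac{(x-a)^2}{x-a}
  = x - a \;\to\; 0 \quad \text{as } x \to a.
\]
```

The ratio is only ever evaluated at points x different from a, so the denominator is never actually zero; the limit describes what happens as x gets close to a.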

Cyrus · May 29, 2005

We say a function with this property, that it not only goes to zero, but also the ratio after dividing it by |x-a|, goes to zero with x-a, i.e. a function whose graph is tangent to the x-axis at x=a, is "little oh" of x-a.

I don't see what you mean by tangent to the x-axis at x=a. It is not tangent to the curve f(x)?

Cyrus · May 30, 2005

Ahhhhhhhhhhh, i did not read your post VERY carefully Mathwonk,

this is just the definition of a derivative, i.e. the usual definition is that

f'(a) is the derivative of f at a if and only if [f(x)-f(a)]/(x-a) approaches f'(a) as x goes to a.

i.e. if and only if [f(x)-f(a)]/(x-a) - f'(a) approaches zero as x goes to a,

if and only if [f(x)-f(a)]/(x-a) - [f'(a)(x-a)/(x-a)] approaches zero as x goes to a,

if and only if [f(x)-f(a) - f'(a)(x-a)]/(x-a) approaches zero as x goes to a.

I read you loud and clear now on that issue.

mathwonk · May 30, 2005

great! may i say you are one of the most patient and excellent students i have encountered on this thread.

Cyrus · May 30, 2005

We say a function with this property, that it not only goes to zero, but also the ratio after dividing it by |x-a|, goes to zero with x-a, i.e. a function whose graph is tangent to the x-axis at x=a, is "little oh" of x-a.

Could you please help me out on this line? I am not seeing what you mean by tangent to the x-axis at x=a. Also, is "little oh of x-a" what you're calling the function f(x)?

Also, a stupid question, but I'll ask it anyway (I should know this by now): how come you do not run into trouble when you divide by (x-a)? As x approaches a, you approach division by zero. This is also true in the very definition of a derivative, because you're dividing by h as h approaches zero.
