The Chain Rule, death to anyone that breaks the rule

Cyrus · May 24, 2005

OK, so I am reviewing multivariable calculus now that I have some time (why is it taking me so long to grasp some of these concepts!?), and I am reading the proof of Stokes' theorem. The book I use is Stewart, but it seems to be ripped off word for word from Swokowski, which in turn rips off S. L. Salas and Einar Hille (maybe each new author continued the publishing over time?).

Here is what's throwing me a curve. At one point, they show that [tex]\int\int_S \mathrm{curl}\, \vec{F} \cdot d\vec{S} = \int_c \vec{F} \cdot d\vec{r}[/tex]

They assume that [tex]\vec{F}=P\hat{i}+Q\hat{j}+R\hat{k}[/tex]

I will just put the proof up to avoid confusion when I refer to it:

(1) [tex]\int_c \vec{F} \cdot d\vec{r} = \int^b_a (P\frac{dx}{dt} +Q\frac{dy}{dt}+ R\frac{dz}{dt}) dt[/tex]

(2) [tex]' ' = \int^b_a [P \frac{dx}{dt} +Q \frac{dy}{dt}+ R( \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt})] dt[/tex]

(3) [tex]' ' = \int^b_a [(P+R\frac{\partial z}{\partial x})\frac{dx}{dt} + (Q+R\frac{\partial z}{\partial y})\frac{dy}{dt}] dt[/tex]

(4) [tex]' '=\int_{c1} (P+R\frac{\partial z}{\partial x})dx + (Q+R\frac{\partial z}{\partial y})dy[/tex]

(5) [tex]' '=\int\int_D[\frac{\partial}{\partial x}(Q+R\frac{\partial z}{\partial y})-\frac{\partial}{\partial y}(P+R\frac{\partial z}{\partial x})]dA[/tex]

By Green's Theorem. Then using the chain rule again and remembering that P,Q, and R are functions of x,y and z, and that z is a function itself of x and y, we get:

(6) [tex]\int_c \vec{F} \cdot d\vec{r} = \int\int_D [( \frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z} \frac{\partial z}{\partial x}+\frac{\partial R }{\partial x}\frac{\partial z }{\partial y }+ \frac{\partial R }{\partial z }\frac{\partial z }{\partial x }\frac{\partial z }{\partial y }+ R \frac{\partial^2 z }{\partial x \partial y }) -( \frac{\partial P }{\partial y }+ \frac{\partial P }{\partial z } \frac{\partial z }{\partial y }+ \frac{\partial R }{\partial y } \frac{\partial z }{\partial x } + \frac{\partial R }{\partial z } \frac{\partial z }{\partial y } \frac{\partial z }{\partial x }+R \frac{\partial^2 z }{\partial y \partial x })]dA[/tex]

TOO MUCH TYPING!

For some reason I can't see what I typed?? Aha, "" marks are reserved for something that's what it is. YES! the last line worked finally, aye!

Ok, time to get back on track.

I will walk through each step, 1-6 followed by the last line and describe what they are doing. If I am wrong along the way, let me know. I will also explain what part is throwing me off.

(1) This is just standard notation derived previously when doing the line integral of a vector field. The reason for the dt outside the parentheses is that x is a function of t, x=f(t), and in a line integral we want to integrate P with respect to x, not with respect to dx/dt (dx/dt is the rate at which x changes, but we are interested in the change in x itself, not its rate). The extra dt outside the parentheses cancels the dt in the denominator, so that we integrate with respect to x, i.e. (dx/dt) dt = dx. Similar arguments hold for dy/dt and dz/dt.

(2) This part is fine too, because z is a function of x and y, and x and y are in turn functions of t. So when we take the total derivative of z with respect to t, we get the junk inside the parentheses multiplying R. This makes sense, since it was derived earlier in the book: we did the linearization of the TANGENT PLANE and arrived at an equation for dz, divided the entire equation by dt, took the limit, and got the result inside the parentheses.
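
As a quick check of step (2), here is a small worked example. The functions are chosen purely for illustration and are not from the book: take z = xy with x = cos t and y = sin t.

[tex]\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = y(-\sin t) + x\cos t = \cos^2 t - \sin^2 t = \cos 2t[/tex]

Substituting first gives z = cos t sin t = (1/2) sin 2t, whose derivative is also cos 2t, so the two routes agree.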

(3) Easy, easy, easy: just move things around and factor out dx/dt and dy/dt.

(4) It seems the trick played here is that they UNPARAMETERIZED the integral, going from t back to just x and y.

(5) Now they just apply Green's theorem, which they can do because c1 is a plane curve: it lies in the xy-plane, where the surface is projected, so it only varies in x and y.

Enter Confusion:

Performing the partial derivative is messing me up. Since the procedure is the same for all the partials, let's just deal with the first part in the brackets [ ] (the terms after the minus sign are differentiated the same way).

Now we have the partial derivative of Q with respect to x. Q is a function of x AND a function of z, and of course z is itself a function of x (and y).

But how did they get the part:

[tex]\frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z} \frac{\partial z}{\partial x}[/tex].

I understand you have to take the derivative of the x part and the z part. But how did they arrive at this equation? It does not resemble the linearization, i.e. [tex]dz=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy[/tex]. What proof can I turn to, so that I can say, ah yes, this is why you do the derivative this way?

whozum · May 24, 2005

You're missing a couple of lines there, #4 and 5.

Galileo · May 25, 2005

It's just an application of the chain rule. See 'The Chain Rule (General version)' on page 793. You might as well consider x and y as functions x(x,y), y(x,y) and use that y is constant w.r.t. x (it doesn't really depend on x). It looks weird, but in this way you can readily use the form presented in the book if it's not immediately obvious:

[tex]\frac{\partial}{\partial x} Q(x(x,y),y(x,y),z(x,y))=\frac{\partial Q}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial Q}{\partial y}\frac{\partial y}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=[/tex]

[tex]\frac{\partial Q}{\partial x}\cdot 1+\frac{\partial Q}{\partial y}\cdot 0+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}=\frac{\partial Q}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x}[/tex]
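
A concrete check, with functions assumed here only for illustration (they do not come from the book): let Q(x,y,z) = xz and z(x,y) = x^2 + y^2. Substituting first gives Q = x^3 + xy^2, so

[tex]\frac{\partial}{\partial x} Q(x,y,z(x,y)) = 3x^2 + y^2[/tex]

while the chain rule formula gives

[tex]\frac{\partial Q}{\partial x}+\frac{\partial Q}{\partial z}\frac{\partial z}{\partial x} = z + x \cdot 2x = (x^2+y^2) + 2x^2 = 3x^2 + y^2[/tex]

Both agree. Note that the partial of Q with respect to x in the chain rule formula holds z fixed, whereas the derivative of the substituted expression does not.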

Cyrus · May 25, 2005

Funny you should say that. I was thinking about it in those terms last night right before I went to bed, but I wasn't sure. The thing I was not sure about was the second factor in each term, where you have the derivatives of x, y and z with respect to x. Is the reason you have partial x / partial x, and not dx/dx, that you made x a function of x and y?

arildno · May 25, 2005

The reason why this looks "weird" is the use of sloppy notation, as Galileo is, of course, fully aware.
One might use a pedantic notation here:
1. Let us have a function [tex]Q'(x',y',z')[/tex]
2. Let [tex](x',y',z')\in\mathbb{R}^{3}[/tex] be related to [tex](x,y)\in\mathbb{R}^{2}[/tex] as follows:
[tex]x'=X(x,y)=x,\quad y'=Y(x,y)=y,\quad z'=Z(x,y)[/tex]
3. We may now define a function Q(x,y) as follows:
[tex]Q(x,y)=Q'(X(x,y),Y(x,y),Z(x,y))[/tex]
4. We also define: [tex]\vec{x}'=(x',y',z'), \vec{X}(x,y)=(X(x,y),Y(x,y),Z(x,y))[/tex]
5. Thus, we have:
[tex]\frac{\partial{Q}}{\partial{x}}=(\frac{\partial{Q'}}{\partial{x'}}\frac{\partial{X}}{\partial{x}}+\frac{\partial{Q'}}{\partial{y'}}\frac{\partial{Y}}{\partial{x}}+\frac{\partial{Q'}}{\partial{z'}}\frac{\partial{Z}}{\partial{x}})\mid_{\vec{x}'=\vec{X}(x,y)}[/tex]

6. Now, one might ask oneself if pedantic notation is really worthwhile..
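
To see the pedantic notation at work, here is a sketch with hypothetical choices not taken from the thread: Q'(x',y',z') = y' z'^2 and Z(x,y) = xy, so that Q(x,y) = y(xy)^2 = x^2 y^3. Differentiating with respect to y, where the partial of X is 0 and the partial of Y is 1:

[tex]\frac{\partial Q}{\partial y}=\left(\frac{\partial Q'}{\partial x'}\cdot 0+\frac{\partial Q'}{\partial y'}\cdot 1+\frac{\partial Q'}{\partial z'}\frac{\partial Z}{\partial y}\right)\Big|_{\vec{x}'=\vec{X}(x,y)} = \left(z'^2 + 2y'z'\cdot x\right)\Big|_{\vec{x}'=\vec{X}(x,y)} = x^2y^2 + 2x^2y^2 = 3x^2y^2[/tex]

This matches differentiating x^2 y^3 directly.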

Cyrus · May 25, 2005

You are losing me at step 4; can you clarify that, please? Why is x' a vector? It's a scalar, based on your notation.

arildno · May 25, 2005

No, I just use [tex]\vec{x}'[/tex] to designate a point in [tex]\mathbb{R}^{3}[/tex]
Any such point can be represented with the aid of a vector having three components; the components of the vector [tex]\vec{x}'[/tex] are the scalars x',y',z'.

mathwonk · May 25, 2005

i recommend you write out a proof yourself for integration around a rectangle. it is very easy, following from precisely two facts: FTC and Fubini. I.e. Fubini reduces the two dimensional integral to one dimensional integrals, and then FTC does those.

Then all other cases are obtained from that by the chain rule, which is a separate issue, having nothing to do with Stokes theorem, only with how to generalize it technically.
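
The rectangle case can be sketched as follows, assuming a rectangle R = [a,b] x [c,d] and functions P, Q with continuous partials (notation introduced here for illustration). Fubini lets each term be integrated in the convenient order, and FTC evaluates the inner integral:

[tex]\int\int_R \frac{\partial Q}{\partial x}\,dA = \int_c^d \left[\int_a^b \frac{\partial Q}{\partial x}\,dx\right] dy = \int_c^d [Q(b,y)-Q(a,y)]\,dy[/tex]

[tex]\int\int_R \frac{\partial P}{\partial y}\,dA = \int_a^b \left[\int_c^d \frac{\partial P}{\partial y}\,dy\right] dx = \int_a^b [P(x,d)-P(x,c)]\,dx[/tex]

Subtracting the second from the first:

[tex]\int\int_R \left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right)dA = \int_a^b P(x,c)\,dx + \int_c^d Q(b,y)\,dy - \int_a^b P(x,d)\,dx - \int_c^d Q(a,y)\,dy[/tex]

The four terms on the right are the integrals of P dx + Q dy along the bottom, right, top and left edges, traversed counterclockwise.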

Cyrus · May 25, 2005

Before, you had x'=X(x,y), so x' was a scalar function. Now you have x' as a vector. How come you changed it into a vector function? I thought x' was only the scalar component in the x direction.

arildno · May 25, 2005

Is [tex]\vec{x}'[/tex] the same symbol as x'?
Doesn't look so to me, at least.
I could equally well have called my vector [tex]\vec{v}'=(x',y',z')[/tex]
if you are happier with that.

Cyrus · May 25, 2005

Only if I squint real hard.

I see what you're saying now. It looks as if [tex]\vec{x}'[/tex] and [tex]\vec{X}[/tex] are the same thing. Why was it necessary to use both of them in the proof? Oh, and I think [tex]\vec{s}'[/tex] would have been a little bit clearer. Calling the position vector x made me confuse it with the x component of a vector.

arildno · May 25, 2005

With [tex]\vec{x}'[/tex], I designate a POINT (element) in [tex]\mathbb{R}^{3}[/tex]

With (x,y) (or, if you like, [tex]\vec{x}[/tex] (no apostrophe/prime here!)), I designate a POINT (element) in [tex]\mathbb{R}^{2}[/tex]

The statement [tex]\vec{x}'=\vec{X}(x,y)[/tex] says that we have a mapping [tex]\vec{X}:\mathbb{R}^{2}\to\mathbb{R}^{3}[/tex], i.e., a point (x,y) in the plane (the input value) is mapped onto a point (x',y',z') (the output value) in [tex]\mathbb{R}^{3}[/tex] in accordance with some rule.

Cyrus · May 25, 2005

I see. :-)

Cyrus · May 25, 2005

[tex]x'=X(x,y)=x,\quad y'=Y(x,y)=y,\quad z'=Z(x,y)=z[/tex]. Here you are reusing the symbols x and y when you write x'=X(x,y)=x. I guess the thing to remember is that the x in x=X(x,y) is not the same as the variable x inside the parentheses. It has a double meaning that I should be VERY careful about.

arildno · May 25, 2005

Our "machine", or, function, X has input (x,y), and spits out the x-value of (x,y) as the output (i.e, X can be regarded as the operation of projecting (x,y) onto the x-axis.)
That is, we have specified a rule so that given input, we may calculate output.

This output is then what x' is set equal to.

Cyrus · May 25, 2005

Right. Just to be clear, the x's I was referring to are the x inside the (x,y) of X(x,y) and the x on the right side of the equals sign in X(x,y)=x. Here we are reusing the variable x for two different purposes. I didn't mean the capital X (sorry if I was not clear on that).

arildno · May 25, 2005

It was I who wasn't clear about it from the start (sorry about that), but I think you've got it.

mathwonk · May 25, 2005

this stuff is trivial, or ought to be if done right. Stewart et al are just clogging it up with notation.

Look, if W is a one form, (i.e. something like Pdx + Q dy), then the "curl" is just dW = dP^dx + dQ^dy which is expanded out by the rule that dP = dP/dx dx + dP/dy dy, and dx^dx = 0 = dy^dy, and dy^dx = -dx^dy, as usual. so dW = (dQ/dx - dP/dy)dx^dy.
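
Written out step by step (a sketch in LaTeX, with \wedge standing for the ^ above and curly d's for the partials):

[tex]dW = dP\wedge dx + dQ\wedge dy = \left(\frac{\partial P}{\partial x}dx+\frac{\partial P}{\partial y}dy\right)\wedge dx + \left(\frac{\partial Q}{\partial x}dx+\frac{\partial Q}{\partial y}dy\right)\wedge dy[/tex]

[tex]= \frac{\partial P}{\partial y}\,dy\wedge dx + \frac{\partial Q}{\partial x}\,dx\wedge dy = \left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right)dx\wedge dy[/tex]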

Then greens theorem, which is repeated integration plus FTC, says the integral of W over the boundary of the rectangle equals the integral of dW over the rectangle.

then the so called "Stokes theorem" just says this remains true for surfaces parametrized by a rectangle. that's all. and this is just the chain rule.

I.e. let f(s,t) = (x(s,t), y(s,t), z(s,t)), be a map from the rectangle R in the s,t plane into x,y,z, space.

Then by definition, the boundary of f(R), is f of the boundary of R. and if W is a one form in x,y,z space, then curl of the pullback f*W, of W to s,t space, is the pullback of the curl of W. (this is the chain rule). i.e. f*(dW) = d(f*W)

Hence the integral of W over the boundary of f(R), equals the integral of f*W over the boundary of R, which by Green equals the integral of d(f*W) over R, which equals the integral of f*(dW) over R, which by definition equals the integral of d(W) over f(R).

If we write dot product for integral and b for boundary, we get just this:

<W,b(f(R))> = <W,f(bR)> = <f*W, b(R)> = <d(f*W),R> = <f*(dW),R> = <dW,f(R)>. done.

I.e. the first equation is true because f(b(R)) = b(f(R)). the second by definition of how to integrate over a parametrized curve, the third is true by greens theorem, the fourth is true by the chain rule which implies that f*(dW) = d(f*W), and the last is true by definition.

thats it.
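
In integral notation, the same chain reads as follows (a restatement only, with \partial standing for the boundary b above):

[tex]\int_{\partial f(R)} W = \int_{f(\partial R)} W = \int_{\partial R} f^*W = \int\int_R d(f^*W) = \int\int_R f^*(dW) = \int\int_{f(R)} dW[/tex]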

Cyrus · May 25, 2005

Ok, so far so good now I hope. Can you explain this part to me though.

We have:

[tex]\partial Q = \frac{\partial Q'}{\partial x'} \partial x + \frac{\partial Q'}{\partial y'} \partial y[/tex].

Let's just assume for now that I am working in two dimensions, so this resembles the linearization from the tangent plane approximation better. Then the equation I wrote is very similar; however, we don't have dx anymore, we have [tex]\partial x[/tex]. But when we did the linearization with the tangent plane approximation, we used dx. Why is it not written in terms of dx?

I.e. not written as:

[tex]\partial Q = \frac{\partial Q'}{\partial x'} dx + \frac{\partial Q'}{\partial y'} dy[/tex].

mathwonk · May 25, 2005

if you mean me, i do not know how to write curly d's so i use square d's for everything. but as you suggest it should be dQ = (curlydQ)/(curlydx) dx + (curlydQ)/(curlydy) dy.

Cyrus · May 25, 2005

No sorry I was not referring to your post.

arildno · May 25, 2005

You should study carefully what mathwonk has provided you with, I'll look in upon your question afterwards.

Cyrus · May 25, 2005

ok, Ill look at it now

Cyrus · May 25, 2005

Mathwonk, can you change your post so that I can tell the difference between the partials and the d's? I am reading through, but I don't know when and where you mean dx or [tex]\partial x[/tex]. Thanks.

mathwonk · May 25, 2005

doesn't matter, only one possibility is possible in each case. i.e. if f(x,y) is a function of two variables, then df/dx obviously means partial wrt x.

Cyrus · May 25, 2005

Ok, let's go back a few steps perhaps. Could you please show me a simple proof of the GENERAL version of the chain rule. I am having trouble seeing how that is derived. If you can help me with that, then it will be a great help.

mathwonk · May 25, 2005

you do not need this to follow the proof of the stokes theorem. all you need is this:

definition: if (x,y,z) = f(s,t) = (x(s,t), y(s,t),z(s,t)).

and if W = Adx + Bdy +Cdz, then
f*W = A dx/ds ds + A dx/dt dt + B dy/ds ds + B dy/dt dt +C dz/ds ds + C dz/dt dt

= [A dx/ds + Bdy/ds +Cdz/ds] ds + [Adx/dt +Bdy/dt +Cdz/dt] dt

= Pds +Qdt.

Then d(f*W) = [dQ/ds - dP/dt] ds^dt.

and dW = [dB/dx-dA/dy] dx^dy + [dC/dy-dB/dz]dy^dz + [dA/dz -dC/dx]dz^dx.

Then check that f*(dW) = d(f*W).

this is the only missing step in the proof of the stokes theorem.

the chain rule itself is much harder to prove. this is just a tedious, lengthy but mechanical calculation. the chain rule requires an idea to prove it.

i know how, but it will not shed any light on the current question.

mathwonk · May 25, 2005

actually that's too complicated. just assume W = Adx. then the same proof will work for the general W.

mathwonk · May 26, 2005

OK we've at least identified the place in the argument where the messy chain rule calculation occurs: in showing that f*(dW) = d(f*W).

I) Let's do this for a trivially simple case: W = dx.

Then if f(s,t) = (x(s,t),y(s,t),z(s,t)), then f*dx just means composing x with f and taking d:

i.e. then f*dx = ∂x/∂s ds + ∂x/∂t dt,

then d(f*dx) = d(∂x/∂s)^ds + d(∂x/∂t)^dt

= [(∂^2x/∂s^2) ds + (∂^2x/∂s∂t) dt]^ds + [(∂^2x/∂t∂s) ds + (∂^2x/∂t^2) dt]^dt

= (since ds^ds = 0 = dt^dt), (∂^2x/∂t∂s - ∂^2x/∂s∂t) ds^dt = 0, by equality of mixed partials. so there is something else going on here besides the chain rule.

now that we see d(f*dx) = 0 we claim also that f*(d(dx)) = 0, but that is immediate because d(dx) is already zero, i.e. this equals d(1)^dx = (0)^dx = 0, since 1 is the coefficient function of dx.

II) Now let's try pulling back a function A(x,y,z) by f, and differentiating. it is longer but still mindless calculation.

i.e. f*A = A(x(s,t),y(s,t),z(s,t)), so d(f*A) = ∂A/∂s ds + ∂A/∂t dt.

where by the chain rule this time:
∂A/∂s = (∂A/∂x) (∂x/∂s) + (∂A/∂y) (∂y/∂s) + (∂A/∂z) (∂z/∂s), etc..for t.

now compute in the other order: i.e. first take
dA = (∂A/∂x) dx+ (∂A/∂y) dy + (∂A/∂z) dz, and then take f* of that.

i.e. substitute in dx = ∂x/∂s ds + ∂x/∂t dt, etc...for y,z,

this gives ultimately,
f*(dA) = (∂A/∂x) [ ∂x/∂s ds + ∂x/∂t dt] + (∂A/∂y) [ ∂y/∂s ds + ∂y/∂t dt]+..etc for z

= [(∂A/∂x) (∂x/∂s) + (∂A/∂y) (∂y/∂s) + (∂A/∂z) (∂z/∂s)] ds +...etc for t,

= ∂A/∂s ds + ∂A/∂t dt = d(f*A), as claimed.

III) Now we can do it for Adx, i.e. then d(Adx) = dA^dx, so

f*(d(Adx)) = f*(dA)^f*(dx) = (by previous argument) d(f*A)^f*dx

= d(f*A)^f*dx + f*A ^ d(f*dx) (since we know the second term is zero from above),

= (by leibniz rule for d) d(f*(A)^f*dx) = d(f*(Adx)). done.

IV) Then we are done for any W = Adx + Bdy + Cdz, by linearity of f* and d.

Well checking all this is tedious, but mindless, and can at least be separated out from the rest of the argument.

i.e. the whole thing boils down to just f*(dW) = d(f*W), which is the chain rule plus the equality of mixed partials.

The advantage of this point of view is that you only need to prove this relation once, and then can use it in calculations of many types in many settings, instead of embedding it in a proof of stokes theorem.

I.e. you should realize what it is you are proving so you can use it again.
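
For a concrete check of f*(dW) = d(f*W), here is a small example with choices made up for illustration (they are not from the posts above): W = y dx and f(s,t) = (st, s+t, 0), so that dW = dy ^ dx.

[tex]f^*W = (s+t)\,d(st) = (s+t)(t\,ds + s\,dt) = (st+t^2)\,ds + (s^2+st)\,dt[/tex]

[tex]d(f^*W) = \left[\frac{\partial}{\partial s}(s^2+st) - \frac{\partial}{\partial t}(st+t^2)\right] ds\wedge dt = [(2s+t)-(s+2t)]\,ds\wedge dt = (s-t)\,ds\wedge dt[/tex]

[tex]f^*(dW) = f^*(dy\wedge dx) = d(s+t)\wedge d(st) = (ds+dt)\wedge(t\,ds+s\,dt) = (s-t)\,ds\wedge dt[/tex]

The two sides agree, as the general argument predicts.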

mathwonk · May 26, 2005

could we change the title of this thread to make it a little less apocalyptic?
