- #1

- 166

- 0

- Thread starter dalcde
- Start date

- #2

- 371

- 1

http://en.wikipedia.org/wiki/Chain_rule#Second_proof

is that simpler?

- #3

I like Serena

Homework Helper

- 6,579

- 177

I like this one.

Say y is a function of x, and x is a function of t.

Then y=y(x(t)) and:

[tex]\frac {dy}{dt} = \frac {dy}{dx} \frac {dx}{dt}[/tex]

That's it. We're done!

The equality follows algebraically.

This is not a proof in the delta-epsilon school of thinking, but with the definition of infinitesimals it amounts to the same thing.

- #4

HallsofIvy

Science Advisor

Homework Helper

- 41,847

- 967

Right. You swept all the dirt under the "definition of infinitesimals" carpet!

- #5

- 79

- 2

We know that

[tex]\frac{dy}{dt} = \frac{dy}{dt} [/tex]

Now

[tex]\frac{dy}{dt} = \frac{dy}{dx} \frac{dx}{dt} [/tex]

Because we can cancel the two dx's against each other...

[tex]\frac{dy}{dt} = \frac{dy}{ \rlap{///}(dx) } \frac{ \rlap{///}(dx) }{dt} = \frac{dy}{dt} [/tex]

EDIT: Why doesn't PF have the cancel LaTeX package?

- #6

I like Serena

Homework Helper

- 6,579

- 177

Right. You swept all the dirt under the "definition of infinitesimals" carpet!

I like to think of it as intuitive shorthand notation.

Seriously, do you know of an example where a proof based on infinitesimals may be wrong?

- #7

- 1,554

- 15

I like to think of it as intuitive shorthand notation.

Seriously, do you know of an example where a proof based on infinitesimals may be wrong?

In single variable calculus things usually work out if you just assume infinitesimals work like ordinary numbers. But you have to be more careful in multivariable calculus. For instance, let's say you wanted to find the infinitesimal area element in polar coordinates, which is

[itex]dA=dx\,dy=d(r\cos\theta)\,d(r\sin\theta)=(dr\,\cos\theta-r\sin\theta\,d\theta)(dr\,\sin\theta+r\cos\theta\,d\theta)[/itex]. If you treat [itex]dr[/itex] and [itex]d\theta[/itex] like ordinary numbers, you will get [itex]dA=\frac{1}{2}\sin 2\theta\,(dr^{2}-(r\,d\theta)^{2})+\cos 2\theta\;r\,dr\,d\theta[/itex], which is completely wrong. It's only when you remember the fact that [itex]dr\,d\theta=-d\theta\,dr[/itex] for differential forms (which I've always found really strange) that you get the right answer [itex]dA=r\,dr\,d\theta[/itex].
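As a sketch of the bookkeeping behind that last step, here is the same product expanded with the antisymmetry rule applied term by term. The wedge symbol and the rules [itex]dr\wedge dr=0[/itex], [itex]d\theta\wedge d\theta=0[/itex] are standard differential-form conventions assumed here; they are not spelled out in the post.

[tex]\begin{align} dx\wedge dy &= (\cos\theta\,dr-r\sin\theta\,d\theta)\wedge(\sin\theta\,dr+r\cos\theta\,d\theta)\\ &= \sin\theta\cos\theta\,dr\wedge dr + r\cos^2\theta\,dr\wedge d\theta - r\sin^2\theta\,d\theta\wedge dr - r^2\sin\theta\cos\theta\,d\theta\wedge d\theta\\ &= r\cos^2\theta\,dr\wedge d\theta + r\sin^2\theta\,dr\wedge d\theta\\ &= r\,dr\wedge d\theta. \end{align}[/tex]

The first and last terms vanish, and the sign flip on [itex]d\theta\wedge dr[/itex] is what turns the difference into [itex]\cos^2\theta+\sin^2\theta=1[/itex].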

- #8

HallsofIvy

Science Advisor

Homework Helper

- 41,847

- 967

I like to think of it as intuitive shorthand notation.

Seriously, do you know of an example where a proof based on infinitesimals may be wrong?

No, I have no problem with a proof based on infinitesimals. It is simply that to use infinitesimals you first have to rigorously ...

- #9

I like Serena

Homework Helper

- 6,579

- 177

In single variable calculus things usually work out if you just assume infinitesimals work like ordinary numbers. But you have to be more careful in multivariable calculus. For instance let's say you wanted to find the infinitesimal area element in polar coordinates, which is

[itex]dA=dx\,dy=d(r\cos\theta)\,d(r\sin\theta)=(dr\,\cos\theta-r\sin\theta\,d\theta)(dr\,\sin\theta+r\cos\theta\,d\theta)[/itex]. If you treat [itex]dr[/itex] and [itex]d\theta[/itex] like ordinary numbers, you will get [itex]dA=\frac{1}{2}\sin 2\theta\,(dr^{2}-(r\,d\theta)^{2})+\cos 2\theta\;r\,dr\,d\theta[/itex], which is completely wrong. It's only when you remember the fact that [itex]dr\,d\theta=-d\theta\,dr[/itex] for differential forms (which I've always found really strange) that you get the right answer [itex]dA=r\,dr\,d\theta[/itex].

I'm afraid you're defining two different versions of dA here.

The first expression defines dxdy exactly, but it is not very useful, because your actual coordinates are still x and y.

Your second expression for dA is the surface element as it is in polar coordinates, but it has a different surface area.

I think the ratio between the two is the Jacobian determinant.

Basically your second expression shows how the Jacobian determinant can be calculated in a very intuitive and simple manner (another score for infinitesimals!).

You should be able to find your minus sign somewhere in the Jacobian determinant.
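A quick numerical check of the Jacobian remark, as a rough sketch: the function names, the step size `h` and the sample points are illustrative choices, not anything from the thread. It estimates the four partial derivatives of the polar-to-Cartesian map by central differences and forms the 2x2 determinant, which should come out close to r. The minus sign sits in the determinant itself.

```python
import math

def polar_to_cartesian(r, theta):
    """Map polar coordinates (r, theta) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """Estimate det d(x, y)/d(r, theta) with central differences (h is an assumed step)."""
    x_rp, y_rp = polar_to_cartesian(r + h, theta)
    x_rm, y_rm = polar_to_cartesian(r - h, theta)
    x_tp, y_tp = polar_to_cartesian(r, theta + h)
    x_tm, y_tm = polar_to_cartesian(r, theta - h)
    dx_dr = (x_rp - x_rm) / (2 * h)
    dy_dr = (y_rp - y_rm) / (2 * h)
    dx_dtheta = (x_tp - x_tm) / (2 * h)
    dy_dtheta = (y_tp - y_tm) / (2 * h)
    # The minus sign of the determinant plays the role of dr dtheta = -dtheta dr.
    return dx_dr * dy_dtheta - dx_dtheta * dy_dr

if __name__ == "__main__":
    for r, theta in [(0.5, 0.3), (2.0, 1.2), (3.0, -2.0)]:
        print(f"r={r}, theta={theta}: det = {jacobian_det(r, theta):.8f}")
```

The printed determinants should track r regardless of theta, which is the factor in [itex]dA=r\,dr\,d\theta[/itex].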

- #10

- 166

- 0

Thanks! You have helped me a lot. The canceling stuff can be used in my case because I'm doing it with nonstandard calculus!

But is the canceling proof technically correct even in nonstandard calculus?

- #11

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

I'm afraid you're defining two different versions of dA here.

The first expression defines dxdy exactly, but it is not very useful, because your actual coordinates are still x and y.

Your second expression for dA is the surface element as it is in polar coordinates, but it has a different surface area.

I think the ratio between the two is the Jacobian determinant.

Basically your second expression shows how the Jacobian determinant can be calculated in a very intuitive and simple manner (another score for infinitesimals!).

You should be able to find your minus sign somewhere in the Jacobian determinant.

I haven't looked at the details of this argument and your counterargument, but if you need an example of when cancellation gives you the wrong results, how about this version of the chain rule? [tex]\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}[/tex]

- #12

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

If we're going to suggest non-rigorous arguments instead of proofs, I suggest the following instead of that stuff about cancellations. Let's use the notation O(h) to mean "anything that goes to zero at least as fast as h" and O(h^{2}) to mean "anything that goes to zero at least as fast as h^{2}" *. A rigorous argument would have to explain exactly what that means, and prove at every step that the definition is satisfied. This argument is non-rigorous because those details are ignored.

[tex] f(g(x+h)) =f\big(g(x)+hg'(x)+O(h^2)\big) =f(g(x))+\big(hg'(x)+O(h^2)\big)f'(g(x))+O(h^2)[/tex]

[tex]\frac{f(g(x+h))-f(g(x))}{h} = \frac{\big(hg'(x)+O(h^2)\big)f'(g(x)) +O(h^2)}{h} =f'(g(x))g'(x)+f'(g(x))O(h)+O(h)[/tex]

Weird. Some of the primes look really small. You will have to look closely or zoom to see them.

*) Note that O(h) doesn't have to represent the same thing in every place it's used. For example, we have O(h)+h+h^{2}=O(h), even though O(h) obviously doesn't represent the same thing on both sides. Similar comments apply to O(h^{2}).
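To see the O(h) remainder in action, here is a small numerical sketch. The choices f = sin, g(x) = x^2, the point x = 1 and the list of step sizes are all illustrative assumptions, not part of the argument above. It compares the difference quotient of f(g(x)) with f'(g(x))g'(x) and prints the error divided by h, which should settle near a constant if the error really behaves like O(h).

```python
import math

# Illustrative choices (assumptions for this sketch): f = sin, g(x) = x^2.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def difference_quotient(x, h):
    """(f(g(x+h)) - f(g(x))) / h, the left-hand side of the second formula."""
    return (f(g(x + h)) - f(g(x))) / h

def chain_rule_value(x):
    """f'(g(x)) g'(x), the leading term on the right-hand side."""
    return f_prime(g(x)) * g_prime(x)

def errors(x, hs):
    """Absolute gap between the difference quotient and the chain-rule value."""
    exact = chain_rule_value(x)
    return [abs(difference_quotient(x, h) - exact) for h in hs]

if __name__ == "__main__":
    hs = [1e-1, 1e-2, 1e-3, 1e-4]
    for h, err in zip(hs, errors(1.0, hs)):
        print(f"h={h:g}  error={err:.3e}  error/h={err / h:.3f}")
```

This is only evidence, not a proof: it checks one pair of functions at one point.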

By the way, one of the problems with the dx, dy arguments is that even when you get the right result, it doesn't tell you at what point in the domain to evaluate the function. For example, [tex]\frac{dy}{dx}=\frac{1}{\frac{dx}{dy}}[/tex] isn't the *wrong* result, but it's certainly less accurate than [tex](f^{-1})'(x)=\frac{1}{f'(f^{-1}(x))}.[/tex]
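The point about where to evaluate can be made concrete with a small sketch. The choice f = exp (so that the inverse is log) and the helper names are assumptions made for illustration. The formula says to evaluate f' at f^{-1}(x), not at x, and a central-difference estimate of the derivative of the inverse agrees with that.

```python
import math

# Illustrative choice (an assumption for this sketch): f = exp, so f^{-1} = log.
def f_prime(x):
    return math.exp(x)

def f_inverse(x):
    return math.log(x)

def inverse_derivative_formula(x):
    """(f^{-1})'(x) = 1 / f'(f^{-1}(x)): note that f' is evaluated at f^{-1}(x)."""
    return 1.0 / f_prime(f_inverse(x))

def inverse_derivative_numeric(x, h=1e-6):
    """Central-difference estimate of the derivative of f^{-1} at x (h is an assumed step)."""
    return (f_inverse(x + h) - f_inverse(x - h)) / (2 * h)

if __name__ == "__main__":
    x = 2.0
    print("formula          :", inverse_derivative_formula(x))
    print("numeric          :", inverse_derivative_numeric(x))
    print("wrong point 1/f'(x):", 1.0 / f_prime(x))
```

Evaluating 1/f'(x) at x itself gives a different number, which is exactly the information the bare dy/dx = 1/(dx/dy) form leaves out.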

- #13

I like Serena

Homework Helper

- 6,579

- 177

I haven't looked at the details of this argument and your counterargument, but if you need an example of when cancellation gives you the wrong results, how about this version of the chain rule? [tex]\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}[/tex]

You're pulling in partial derivatives here, which are not quite infinitesimals.

Btw, I know the formula as

[tex]\frac{df}{du}=\frac{\partial f}{\partial x}\frac{dx}{du}+\frac{\partial f}{\partial y}\frac{dy}{du}[/tex]

which shows the difference between partials and infinitesimals.

Basically this shows a more intuitive notation for multivariate derivatives.
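A small sketch of this total-derivative form, with made-up functions: f(x, y) = x^2 y, x(u) = cos u and y(u) = exp u are assumptions chosen only so that every derivative is easy to write down. It compares the two-term formula with a central-difference derivative of u -> f(x(u), y(u)).

```python
import math

# Illustrative functions (assumptions for this sketch).
def f(x, y):
    return x ** 2 * y

def df_dx(x, y):
    return 2 * x * y

def df_dy(x, y):
    return x ** 2

def x_of(u):
    return math.cos(u)

def dx_du(u):
    return -math.sin(u)

def y_of(u):
    return math.exp(u)

def dy_du(u):
    return math.exp(u)

def total_derivative(u):
    """df/du = (df/dx)(dx/du) + (df/dy)(dy/du), partials evaluated at (x(u), y(u))."""
    x, y = x_of(u), y_of(u)
    return df_dx(x, y) * dx_du(u) + df_dy(x, y) * dy_du(u)

def numeric_derivative(u, h=1e-6):
    """Central-difference derivative of u -> f(x(u), y(u)) (h is an assumed step)."""
    composed = lambda t: f(x_of(t), y_of(t))
    return (composed(u + h) - composed(u - h)) / (2 * h)

if __name__ == "__main__":
    u = 0.7
    print("formula:", total_derivative(u))
    print("numeric:", numeric_derivative(u))
```

Neither term alone matches the numerical derivative, so naive cancellation of the partials would give the wrong answer here.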

By the way, one of the problems with the dx, dy arguments is that even when you get the right result, it doesn't tell you at what point in the domain to evaluate the function. For example, [tex]\frac{dy}{dx}=\frac{1}{\frac{dx}{dy}}[/tex] isn't the *wrong* result, but it's certainly less accurate than [tex](f^{-1})'(x)=\frac{1}{f'(f^{-1}(x))}.[/tex]

Yes, but that is because it is a shorthand notation.

Note that the infinitesimal notation is also the proof, which is not the case with the functional notation.

There's nothing wrong in also using the functional notation if that clarifies something, which it does in this case.

- #14

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

You're pulling in partial derivatives here, which are not quite infinitesimals.

Btw, I know the formula as

[tex]\frac{df}{du}=\frac{\partial f}{\partial x}\frac{dx}{du}+\frac{\partial f}{\partial y}\frac{dy}{du}[/tex]

which shows the difference between partials and infinitesimals.

Basically this shows a more intuitive notation for multivariate derivatives.

Your notation is appropriate when the left-hand side is the derivative of the function [itex]u\mapsto f(x(u),y(u))[/itex]. Mine is appropriate when the left-hand side is the partial derivative with respect to the first variable of the function [itex](u,v)\mapsto f(x(u,v),y(u,v))[/itex].

Why are partial derivatives "not quite infinitesimals"? Note for example that [itex]\partial f(x,y)/\partial x[/itex], the partial derivative of f with respect to the first variable, evaluated at (x,y), is equal to the ordinary derivative of the function [itex]x\mapsto f(x,y)[/itex], evaluated at x. Hm, I suppose you could say that even though we can write z=f(x,y) in both cases, dz and [itex]\partial z[/itex] would refer to two different functions. But we're still dealing with a small change in z divided by a small change in x, in both cases.

By the way, the notation I like the best (by far) is [tex](f\circ g)_{,i}(x) =f_{,j}(g(x))g_{j,i}(x).[/tex] (I'm using Einstein's summation convention, so there's a sum over j).

Note that the infinitesimal notation is also the proof, which is not the case with the functional notation.

Is it? Maybe it is, but I don't think that can follow from the definition of "infinitesimal". I don't know that definition, but obviously dx and dy need to depend on each other in some way for these calculations to be valid, and I don't think that's going to be a part of the definition. You're going to need some pretty fancy definitions of dx and dy to justify interpreting dy/dx=1/(dx/dy) as a proof of the formula I posted for the derivative of an inverse function.

- #15

- 166

- 0

By the way, the notation I like the best (by far) is [tex](f\circ g)_{,i}(x) =f_{,j}(g(x))g_{j,i}(x).[/tex] (I'm using Einstein's summation convention, so there's a sum over j).

I'm not familiar with Einstein's summation convention, but I think one j should be a superscript and one subscript.

- #16

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

I'm not familiar with Einstein's summation convention, but I think one j should be a superscript and one subscript.

Einstein's summation convention is supposed to be used in the context of differential geometry, where the vertical position of the index informs us what type of tensor we're dealing with. In this context (no tensors involved), there's no harm in putting all the indices downstairs. The convention I'm using here is really just to ...

- #17

I like Serena

Homework Helper

- 6,579

- 177

Your notation is appropriate when the left-hand side is the derivative of the function [itex]u\mapsto f(x(u),y(u))[/itex]. Mine is appropriate when the left-hand side is the partial derivative with respect to the first variable of the function [itex](u,v)\mapsto f(x(u,v),y(u,v))[/itex].

Good point. Didn't think of that.

Why are partial derivatives "not quite infinitesimals"? Note for example that [itex]\partial f(x,y)/\partial x[/itex], the partial derivative of f with respect to the first variable, evaluated at (x,y), is equal to the ordinary derivative of the function [itex]x\mapsto f(x,y)[/itex], evaluated at x. Hm, I suppose you could say that even though we can write z=f(x,y) in both cases, dz and [itex]\partial z[/itex] would refer to two different functions. But we're still dealing with a small change in z divided by a small change in x, in both cases.

You're right, of course. I only meant that with partials things become more complicated.

The formula doesn't simply pan out algebraically any more.

By the way, the notation I like the best (by far) is [tex](f\circ g)_{,i}(x) =f_{,j}(g(x))g_{j,i}(x).[/tex] (I'm using Einstein's summation convention, so there's a sum over j).

I don't know this notation (yet).

The wiki page on derivative shows a number of notations, but not this one.

What does it say?

Why is it your preferred notation?

And where can I find more information on it?

Is it? Maybe it is, but I don't think that can follow from the definition of "infinitesimal". I don't know that definition, but obviously dx and dy need to depend on each other in some way for these calculations to be valid, and I don't think that's going to be a part of the definition. You're going to need some pretty fancy definitions of dx and dy to justify interpreting dy/dx=1/(dx/dy) as a proof of the formula I posted for the derivative of an inverse function.

Let's give it a try.

I'm keeping it a bit informal, referring to x and y as scalar values as well as functions.

If necessary I can make it more formal and introduce more symbols, but I only want to know if the reasoning, possibly after some extensions, is valid as a proof.

Let y be an invertible function of x, given by y(x), and let x(y) be its inverse function.

For any points x where the function y is differentiable, and where the inverse function x is differentiable, and both are non-zero, the following holds.

For any 0 < |epsilon|, we can define a dy=epsilon, such that there is a 0 < |dx|, such that the ratio dy/dx is equal to y'(x).

In this case the inverse ratio given by dx/dy is equal to x'(y).

Qed.

Shoot!

- #18

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

The LaTeX looks pretty ugly in my Firefox. I recommend that you right-click and "scale all math" to 110% to view this comfortably.

I don't know this notation (yet).

The wiki page on derivative shows a number of notations, but not this one.

What does it say?

Why is it your preferred notation?

And where can I find more information on it?

It's just a notation, so it can't be an enormous improvement over any notation that works. All I can tell you is what it means and what I like about it. If f is a function, then [itex]f_{,i}[/itex] denotes its partial derivative with respect to the ith variable. This is an alternative to [itex]D_if[/itex]. I don't like the notation [itex]\partial f/\partial x_i[/itex] because it gives the impression that the variable symbols we're using are somehow relevant, which they're not of course. Note that [itex]f_{,i}[/itex] is a function and [itex]f_{,i}(x)[/itex] its value at x.

I write the chain rule for functions from ℝ into ℝ as [tex](f\circ g)'(x)=f'(g(x))g'(x).[/tex] The corresponding rule for the situation when [itex]f:\mathbb R^m\rightarrow\mathbb R[/itex] and [itex]g:\mathbb R^n\rightarrow\mathbb R^m[/itex] can e.g. be written as [tex]\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^m\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},[/tex] where the [itex]g_j[/itex] are defined by [itex]g(x)=(g_1(x),\dots,g_m(x))[/itex]. I really don't like this notation. For example, why is the partial derivative of f with respect to the jth variable denoted by [itex]\partial f/\partial g_j[/itex] all of a sudden? The only answer I can think of is extremely ugly to me: because we intend to evaluate that function at g(x).

To avoid that ugliness, we can write this version of the chain rule as [tex]D_i(f\circ g)(x)=\sum_{j=1}^m D_j f(g(x)) D_i g_j(x)[/tex] instead. There's nothing wrong with this, but it doesn't look a lot like the single-variable version in the form I like to see it. So let's use the comma notation instead, and while we're at it, let's just drop the summation sigma. This is harmless as long as we can remember that there's always a sum over each index that appears twice. [tex](f\circ g)_{,i}(x)=f_{,j}(g(x))g_{j,i}(x)[/tex] If we interpret the indices as labeling the rows and columns of matrices, then this is the ith component of a matrix equation, so with the appropriate definitions, we can rewrite it as [tex](f\circ g)'(x)=f'(g(x))g'(x),[/tex] which is *exactly* the same as the single-variable version. The reasons why I don't consider this an improvement are that we have to remember those definitions to know what the formula says, and that when we actually use the chain rule, we are going to be working with the components anyway. See e.g. post #5 here. In that post, I'm writing the index that labels the component functions upstairs (i.e. I write [itex]g^i[/itex] instead of [itex]g_i[/itex]). I think that improves the readability a bit, so I should have said that *that's* my favorite notation.
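The index form of the chain rule translates almost literally into a sum over j. The following is a minimal sketch with made-up functions (g from ℝ² to ℝ³ and f from ℝ³ to ℝ are assumptions chosen for illustration, as are all helper names). Here `jac_g[j][i]` stores [itex]D_i g_j(x)[/itex], i.e. [itex]g_{j,i}(x)[/itex], and `grad_f[j]` stores [itex]f_{,j}[/itex] evaluated at g(x).

```python
import math

# Illustrative maps (assumptions for this sketch): n = 2, m = 3.
def g(x):
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g_jacobian(x):
    """Rows are indexed by j, columns by i: entry [j][i] is D_i g_j(x)."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2 * x2]]

def f(y):
    y1, y2, y3 = y
    return y1 * y2 + math.exp(y3)

def f_gradient(y):
    """Entry [j] is D_j f(y)."""
    y1, y2, y3 = y
    return [y2, y1, math.exp(y3)]

def chain_rule_gradient(x):
    """(f o g)_{,i}(x) = sum over j of f_{,j}(g(x)) * g_{j,i}(x)."""
    grad_f = f_gradient(g(x))   # partials of f, evaluated at g(x)
    jac_g = g_jacobian(x)       # partials of g, evaluated at x
    return [sum(grad_f[j] * jac_g[j][i] for j in range(len(grad_f)))
            for i in range(len(x))]

def numeric_gradient(x, h=1e-6):
    """Central-difference partials of f o g, for comparison (h is an assumed step)."""
    result = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        result.append((f(g(x_plus)) - f(g(x_minus))) / (2 * h))
    return result

if __name__ == "__main__":
    point = [0.4, 0.9]
    print("chain rule:", chain_rule_gradient(point))
    print("numeric   :", numeric_gradient(point))
```

The sum runs over j from 1 to m (here 3), and the two places of evaluation, g(x) for f and x for g, are explicit in `chain_rule_gradient`.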

Let's give it a try.

I'm keeping it a bit informal, referring to x and y as scalar values as well as functions.

If necessary I can make it more formal and introduce more symbols, but I only want to know if the reasoning, possibly after some extensions, is valid as a proof.

Let y be an invertible function of x, given by y(x), and let x(y) be its inverse function.

For any points x where the function y is differentiable, and where the inverse function x is differentiable, and both are non-zero, the following holds.

For any 0 < |epsilon|, we can define a dy=epsilon, such that there is a 0 < |dx|, such that the ratio dy/dx is equal to y'(x).

In this case the inverse ratio given by dx/dy is equal to x'(y).

Qed.

Shoot!

I don't understand what you're saying. If you meant that for each positive infinitesimal dy, there's a positive infinitesimal dx such that dy/dx=f'(x), then my questions are "what's an infinitesimal?" and "how do you know this?". What you said doesn't answer either of those questions. It also doesn't explain why dx/dy should have anything to do with the derivative of [itex]f^{-1}[/itex].

- #19

I like Serena

Homework Helper

- 6,579

- 177

[tex]\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^n\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},[/tex] where the [itex]g_j[/itex] are defined by [itex]g(x)=(g_1(x),\dots,g_n(x))[/itex]. I really don't like this notation. For example, why is the partial derivative of f with respect to the jth variable denoted by [itex]\partial f/\partial g_j[/itex] all of a sudden. The only answer I can think of is extremely ugly to me: Because we intend to evaluate that function at g(x).

Wouldn't g usually denote a coordinate transformation?

If that's the case I would prefer to use different symbols.

Say f(x(u)), where x and u denote vectors.

Your formula would become:

[tex]\frac{\partial f(x(u))}{\partial u_i}=\sum_{j=1}^n \frac{\partial f(x(u))}{\partial x_j}\frac{\partial x_j(u)}{\partial u_i}[/tex]

or simply:

[tex]\frac{\partial f}{\partial u_i}=\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}[/tex]

I like this notation, because it shows that you take partial derivatives of f, which must be corrected by multiplying with the appropriate ratio between coordinates.

The use of the symbols x and u instead of g and x is also more intuitive, because using g suggests that g is a function like f, instead of just another set of coordinates.

- #20

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

If that's the case I would prefer to use different symbols.

Say f(x(u)), where x and u denote vectors.

Your formula would become:

[tex]\frac{\partial f(x(u))}{\partial u_i}=\sum_{j=1}^n \frac{\partial f(x(u))}{\partial x_j}\frac{\partial x_j(u)}{\partial u_i}[/tex]

or simply:

[tex]\frac{\partial f}{\partial u_i}=\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}[/tex]

I like this notation, because it shows that you take partial derivatives of f, which must be corrected by multiplying with the appropriate ratio between coordinates.

The use of the symbols x and u instead of g and x is also more intuitive, because using g suggests that g is a function like f, instead of just another set of coordinates.

But g ...

I also don't like the notation [tex]\frac{\partial f(x(u))}{\partial u_i}[/tex] that you put on the left. I'm sure lots of people use it, but it looks very ugly to me. It looks like a partial derivative of f evaluated at x(u), even though it's actually a partial derivative of [itex]f\circ x[/itex] evaluated at u. For some reason, I have less of a problem with it when it's written in the form [tex]\frac{\partial}{\partial u_i}f(x(u)),[/tex] because when I see this expression, I find it easier to tell myself that the [itex]u_i[/itex] in the denominator is there to tell us both that the function we're taking a derivative of is [itex]u_i\mapsto f(x(u))[/itex] rather than any of the other possibilities, and that the derivative is to be evaluated at [itex]u_i[/itex]. For example, I would interpret [tex]\frac{\partial}{\partial y}ax^2y^3[/tex] as [tex](t\mapsto ax^2t^3)'(y).[/tex]

By the way, the sum should go from 1 to m, not 1 to n. I got that wrong in my previous post, and corrected it after you had replied.

- #21

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

If we're going to suggest non-rigorous arguments instead of proofs,...

Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex] Let's just use this formula twice, once on g and then once on f. [tex]f(g(x+h))\approx f\big(g(x)+hg'(x)\big)\approx f(g(x))+hg'(x)f'(g(x))[/tex] This implies that [tex]\begin{align}(f\circ g)'(x) &\approx \frac{f(g(x+h))-f(g(x))}{h}\approx \frac{f(g(x))+hg'(x)f'(g(x))-f(g(x))}{h}\\ &\approx f'(g(x))g'(x).\end{align}[/tex] What's missing here is of course a proof that the error in this approximation really goes to zero when h goes to zero. But this is still a good way to see that the chain rule is "likely" to be true.

- #22

I like Serena

Homework Helper

- 6,579

- 177

Btw, right now I'm sticking with a function x instead of g, since at least I usually mean a coordinate transformation when I write down something like this.

When the function is

How is:

[tex]\frac{\partial (f \circ x)}{\partial u_i}(u)=\sum_{j=1}^m \frac{\partial f}{\partial x_j}(x(u))\frac{\partial x_j}{\partial u_i}(u)[/tex]

or:

[tex]\frac{\partial (f \circ x)}{\partial u_i}=\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}[/tex]

What I dislike about your form, is that it is one long string of symbols with no real visual cues.

What I like about Leibniz's notation is that it shows the ratios of change, which link directly to any drawing that I might make.

(Yes, I like drawings to clarify and understand what's going on.)

- #23

I like Serena

Homework Helper

- 6,579

- 177

Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex] Let's just use this formula twice, once on g and then once on f. [tex]f(g(x+h))\approx f\big(g(x)+hg'(x)\big)\approx f(g(x))+hg'(x)f'(g(x))[/tex] This implies that [tex]\begin{align}(f\circ g)'(x) &\approx \frac{f(g(x+h))-f(g(x))}{h}\approx \frac{f(g(x))+hg'(x)f'(g(x))-f(g(x))}{h}\\ &\approx f'(g(x))g'(x).\end{align}[/tex] What's missing here is of course a proof that the error in this approximation really goes to zero when h goes to zero. But this is still a good way to see that the chain rule is "likely" to be true.

Yes, this works too, although I dislike the approximately-symbols.

The use of those symbols makes it specifically non-rigorous.

My "proof" is based on the graphical interpretation of ratios, from which it is immediately evident that the inverse has the ratio inversed.

There is no "approximately" involved, although it is a jump of the mind.

I'm getting the impression that I am more graphically minded, thinking in pictures, and wanting to relate the symbols I write to what I see in my head.

- #24

Fredrik

Staff Emeritus

Science Advisor

Gold Member

- 10,872

- 418

Yes, this works too, although I dislike the approximately-symbols.

The use of those symbols makes it specifically non-rigorous.

That's what I like the most about it. If you make a non-rigorous argument, you need to make sure that no one will mistake it for an actual proof.

My "proof" is based on the graphical interpretation of ratios, from which it is immediately evident that the inverse has the ratio inversed.

Ah, yes, this is almost an actual proof of the formula for the derivative of an inverse function. But I don't see an equally convincing argument of that sort for the chain rule.

- #25

- 371

- 0

Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex]

That doesn't seem to work for all functions. Let f(x)=tan x. Then using that formula, we will get the tangent of pi/2 to be something like 3.8264459099620716 (here).
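The post does not say which x and h were used. As an inference only, the quoted number is what the formula gives for x = pi/3 and h = pi/6, since tan(pi/3) + (pi/6) sec^2(pi/3) = sqrt(3) + 2 pi/3. A short sketch under that assumption (the helper names are illustrative) shows that the step is far from small there, and that the approximation error shrinks quickly once h is small, which is the only regime the formula claims to cover.

```python
import math

def linear_approx(f, f_prime, x, h):
    """First-order estimate f(x + h) ~ f(x) + h * f'(x)."""
    return f(x) + h * f_prime(x)

def tan_prime(x):
    """Derivative of tan: sec^2(x)."""
    return 1.0 / math.cos(x) ** 2

def approx_error(x, h):
    """Gap between the linear estimate and the true value of tan(x + h)."""
    return abs(linear_approx(math.tan, tan_prime, x, h) - math.tan(x + h))

if __name__ == "__main__":
    x = math.pi / 3  # assumed base point, inferred from the quoted number
    # Large step all the way to the pole at pi/2: the estimate is meaningless.
    print("h = pi/6:", linear_approx(math.tan, tan_prime, x, math.pi / 6))
    # Small steps: the error drops roughly like h^2.
    for h in [1e-1, 1e-2, 1e-3]:
        print(f"h = {h:g}: error = {approx_error(x, h):.3e}")
```

So the example does not contradict the formula; it illustrates that "h is small" has to be judged relative to how fast f' changes near x.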
