Exploring the Multivariable Chain Rule Proof

  • #1
sponsoredwalk
I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

[itex] f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} [/itex]

and manipulate it as follows:

[itex] f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0 [/itex]

[itex] \frac{ \Delta y}{ \Delta x} \ - \ f'(x) \ = \ \epsilon (x) [/itex]

[itex] \Delta y \ = \ f'(x) \Delta x \ + \ \epsilon (x) \Delta x [/itex]

and you work off that function to prove the single variable version.
The multivariable version uses a function:

[itex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/itex]

which I can see is analogous to the single variable version but having
trouble deriving to be honest. But assuming that I'm okay with this
function I wonder about the proof.

The special case is just to divide by Δt & take the limit:

[itex] \frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt} [/itex]
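For concreteness, here's that special case worked on a made-up example (the functions are just for illustration): take z = x²y with x = cos t and y = sin t, so that

[itex] \frac{dz}{dt} \ = \ f_x(x,y) \frac{dx}{dt} \ + \ f_y(x,y) \frac{dy}{dt} \ = \ 2xy(-\sin t) \ + \ x^2 \cos t \ = \ -2\cos t \, \sin^2 t \ + \ \cos^3 t [/itex]

which agrees with differentiating z(t) = cos²t · sin t directly.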

and if f(x,y) has both x & y as functions of two variables,
z = f(x,y) = f [ x(s,t),y(s,t) ],
then you follow the exact same idea when taking the partial w.r.t.
s or t.

The general chain rule would just be a natural extension of this right? i.e.

z = f(x₁,x₂,...,xᵢ) = f [ x₁(t₁,t₂,...,tᵢ),x₂(t₁,t₂,...,tᵢ),...,xᵢ(t₁,t₂,...,tᵢ) ]

and the partial w.r.t. tᵥ is the exact same idea:[itex] \frac{\partial z}{\partial t_v} \ = \ f_{x_1}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_1}{\partial t_v} \ + \ f_{x_2}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_2}{\partial t_v} \ + \ ... \ + \ f_{x_i}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_i}{\partial t_v}[/itex]

obviously the notation can be shortened :blushing: but that's it right?
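Written compactly, the shortened form is just a sum over the intermediate variables,

[itex] \frac{\partial z}{\partial t_v} \ = \ \sum_{k=1}^{i} \frac{\partial f}{\partial x_k} \frac{\partial x_k}{\partial t_v} [/itex]

which is exactly the longhand expression above.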

---------------------------------------------------------------------------

Assuming that proof is correct, I'm wondering about the function [itex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/itex]

I mean rather than just saying it's analogous in different dimensions
shouldn't there be a way to derive it from the very similar arguments
involving tangent planes?

Start with the vector equation N • (X - X₀) = 0 to derive the plane.
N•(X - X₀) = 0
(A,B,C)•[(x - x₀),(y - y₀),(z - z₀)] = 0
A(x - x₀) + B(y - y₀) + C(z - z₀) = 0
z - z₀ = (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

Now, I understand that this is the description of the tangent plane through
the point (x₀, y₀, f(x₀,y₀)) & that it can be used to approximate the function for
all (x,y) close to (x₀,y₀):
f(x,y) ≈ f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
I say this to make sure I have the correct understanding: when I derived
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
above I was deriving a linear tangent plane equation, but for any function
at the point (x₀,y₀) we can use this equation to find the tangent plane
through (x₀, y₀, f(x₀,y₀)), and we can also linearly approximate the
function for all (x,y) close to (x₀,y₀), just like the single variable tangent line.
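As a quick sanity check with a made-up example: take f(x,y) = xy at (x₀,y₀) = (1,2), so f(1,2) = 2, ∂f/∂x = 2 and ∂f/∂y = 1 there. The tangent plane gives
f(1.1, 2.1) ≈ 2 + 2(0.1) + 1(0.1) = 2.3,
while the true value is 2.31; the leftover 0.01 = ΔxΔy is the part the tangent plane misses.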

It is the extra terms of Taylor's formula that turn
f(x,y) ≈ f(x₀,y₀) + ... into f(x,y) = f(x₀,y₀) + ...
That's been confusing me & I'd really appreciate confirmation that I've
got the logic right now.

But how do we turn f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) into
[itex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/itex]

Or said maybe a bit more clearly, turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

in a more linear fashion than just saying it should work :confused:
 
  • #2
You need to think and write a little bit more precisely with respect to what equations are approximations and what equations are exact. The equation [tex]f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)[/tex] is in general false even for [tex](x, y)[/tex] close to [tex](x_0, y_0)[/tex]; you need to write the error terms represented in your other formulation by [tex]\epsilon_1(x, y) \Delta x + \epsilon_2(x, y) \Delta y[/tex].
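For reference, one standard route to those error terms (a sketch, assuming [tex]f_x[/tex] and [tex]f_y[/tex] exist near [tex](x_0, y_0)[/tex] and are continuous there) is to change one variable at a time: [tex]\Delta z = \left[ f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0 + \Delta y) \right] + \left[ f(x_0, y_0 + \Delta y) - f(x_0, y_0) \right].[/tex] The single-variable mean value theorem applied to each bracket gives [tex]f_x(c, y_0 + \Delta y)\,\Delta x + f_y(x_0, d)\,\Delta y[/tex] for some intermediate points [tex]c[/tex] and [tex]d[/tex], and continuity of the partials lets you write [tex]f_x(c, y_0 + \Delta y) = f_x(x_0, y_0) + \epsilon_1[/tex] and [tex]f_y(x_0, d) = f_y(x_0, y_0) + \epsilon_2[/tex] with [tex]\epsilon_1, \epsilon_2 \to 0[/tex] as [tex](\Delta x, \Delta y) \to (0, 0)[/tex]. That is exactly the [tex]\Delta z[/tex] formula you quoted.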

It will also help you to learn about the interpretation of the derivative as a linear transformation -- this makes thinking about approximations and the chain rule much simpler.

If [tex]U \subset \mathbb{R}^m[/tex] is an open set and [tex]f: U \to \mathbb{R}^n[/tex] is a function, we say [tex]f[/tex] is differentiable at [tex]x_0 \in U[/tex] if there exists a linear transformation [tex]T: \mathbb{R}^m \to \mathbb{R}^n[/tex] which approximates [tex]f[/tex] well near [tex]x_0[/tex]. That is, if for [tex]x[/tex] in some neighborhood [tex]V[/tex] of [tex]x_0[/tex], we find that [tex]f(x)[/tex] is almost equal to [tex]f(x_0) + T(x - x_0)[/tex]. How is "almost equal" defined? The approximation must be the best possible linear approximation, i.e., not leave any linear part behind: we must have [tex]\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - T(x - x_0)\|}{\|x - x_0\|} = 0[/tex], or in the Landau little-o notation, [tex]f(x) = f(x_0) + T(x - x_0) + o(x - x_0)[/tex] as [tex]x \to x_0[/tex]. This should be a rephrasing in different notation of the things you already know using [tex]\epsilon_1(x, y) \Delta x[/tex] and the like. We then say [tex]T[/tex] is the derivative of [tex]f[/tex] at [tex]x_0[/tex], [tex]Df(x_0) = T[/tex].
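To connect this with the notation you have been using: in your case [tex]m = 2[/tex], [tex]n = 1[/tex], and (when [tex]f[/tex] is differentiable) the linear map is [tex]T(\Delta x, \Delta y) = f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y,[/tex] so the matrix of [tex]T[/tex] is just the row of partial derivatives (the gradient), and [tex]f(x_0 + \Delta x, y_0 + \Delta y) \approx f(x_0, y_0) + T(\Delta x, \Delta y)[/tex] is your tangent-plane approximation.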

The nice thing about this formulation is that the chain rule reduces to a statement that the derivative of the composition is the composition of the derivatives: [tex]D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0)[/tex]; and the root of its proof is the fact that a linear transformation can't turn something that is [tex]o(x - x_0)[/tex], something that dies faster than linearly as [tex]x \to x_0[/tex], into something that isn't.
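Concretely, in coordinates [tex]Df[/tex] and [tex]Dg[/tex] are the Jacobian matrices of partial derivatives, and composing linear maps is multiplying matrices; writing out the entries of that matrix product recovers exactly the sums [tex]\frac{\partial z}{\partial t_v} = \sum_k \frac{\partial f}{\partial x_k} \frac{\partial x_k}{\partial t_v}[/tex] from your first post.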

Take a look at the early part of Spivak, Calculus on Manifolds, for a good and accessible treatment of the derivative along these lines.
 
  • #3
The equation

[tex]
f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)
[/tex]

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing, I'm getting this from Stewart's
calculus & honestly if you read my post you'll see that I understand the distinction between
the function (which requires error terms) and the tangent plane. The book made me second-
guess myself & I posted that part of my post just to get clarification that I wasn't wrong,
so thanks!

I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it. Note: I derived the single variable expression at the top of my first post &
I'm afraid the idea is to just copy that in different dimensions, is that all there is to it or is
there a clearer way?
 
  • #4
sponsoredwalk said:
The equation

[tex]
f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)
[/tex]

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing

The reason I called this out specifically is that you can't do that. The equation you wrote is not just "not an equation of the function"; it is false. You cannot use [tex]f[/tex] to denote a function in one line and the function of its tangent plane in the next line; that leads you to the incorrect conclusions you make below. You say you understand the distinction between the function and the tangent plane, but your notation -- the math you write, which is the only means I have to assess what you understand -- doesn't convey that.

sponsoredwalk said:
I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

It's entirely possible that your confusion is due to Stewart being confusing and terrible; I haven't read Stewart, but most elementary calculus books I've read are, in fact, confusing and terrible.

sponsoredwalk said:
If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it.

The explanation is "you can't, because it's wrong". The problem is that in the first set of equations [tex]f[/tex] is used to denote the function of the tangent plane -- thus the absence of error terms -- and in the second set [tex]f[/tex] (or [tex]\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y)[/tex]) is used to denote the original function -- thus the presence of error terms. There is no correct algebraic transformation you can do to turn one into the other, because they're not equations of the same thing.
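(A concrete illustration, with a made-up example: take [tex]f(x, y) = xy[/tex] at [tex](x_0, y_0) = (0, 0)[/tex]. Then [tex]f(x, y) - f(0, 0) = xy[/tex], while [tex]f_x(0,0)(x - 0) + f_y(0,0)(y - 0) = 0[/tex], so the error-free equation fails at every point off the coordinate axes, no matter how close to the origin. The missing piece is exactly an [tex]\epsilon_1 \Delta x + \epsilon_2 \Delta y[/tex] term, here [tex]\Delta x \, \Delta y[/tex].)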
 
  • #5
sponsoredwalk said:
I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

[itex] f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} [/itex]

and manipulate it as follows:

[itex] f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0 [/itex]
No one else has commented on this, so I will. This is a relatively minor point, but you repeated it in a couple of places. The limits should be taken as [itex]\Delta x[/itex] and [itex]\Delta t[/itex] approach zero, not infinity.
sponsoredwalk said:
[itex] \frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt} [/itex]
 
  • #6
Ah you're totally right! I must admit the similarity between the equations had me
thinking one could be derived from the other, & looking at the notation Stewart uses
I see exactly where I got that idea from; the whole mistake I'm making is to even
bother with the tangent plane here. That's really helpful. I guess when dealing with
this it is a case of conceptually understanding that the derivatives in both directions
require error terms & that you can go on from there. Thanks :cool:

As for the LaTeX [itex] \lim_{x \to \infty} [/itex], I have a program on Firefox that
generates LaTeX ready-made, but it generates the limits pre-set to go to infinity & I just
didn't change them by mistake. I also notice my LaTeX error terms accidentally both have
ε₁(x)Δx + ε₂(x)Δy instead of ε₁(x)Δx + ε₂(y)Δy from all that copying & pasting :tongue2:
 

1. What is the multivariable chain rule proof?

The multivariable chain rule proof is a mathematical proof that explains how to calculate the derivative of a function with multiple variables using the chain rule. It is an extension of the single-variable chain rule and is used in multivariable calculus to solve problems involving functions with multiple variables.

2. Why is understanding the multivariable chain rule proof important?

Understanding the multivariable chain rule proof is important because it allows us to find the derivative of a function with multiple variables, which is necessary for solving many real-world problems in fields such as physics, engineering, and economics. It also lays the foundation for more advanced concepts in multivariable calculus.

3. What are the key steps in the multivariable chain rule proof?

The key steps in the multivariable chain rule proof are: writing the change in the outer function in terms of its partial derivatives plus error terms that vanish as the increments go to zero, dividing by the increment of the inner variable, and taking the limit. The contributions from each intermediate variable are added together, and the same process extends to functions of any number of variables.
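A quick symbolic check of the resulting rule on a made-up example (a sketch using the sympy library; the functions are arbitrary choices, not from the discussion above):

[code]
# Check dz/dt = f_x * dx/dt + f_y * dy/dt on a made-up example:
# f(x, y) = exp(x) * y with x(t) = t**2 and y(t) = sin(t).
import sympy as sp

t, x, y = sp.symbols('t x y')
f = sp.exp(x) * y
x_t, y_t = t**2, sp.sin(t)

# Differentiate the composite function directly.
direct = sp.diff(f.subs({x: x_t, y: y_t}), t)

# Assemble the chain-rule expression from the partial derivatives,
# then evaluate it along the curve (x(t), y(t)).
chain = (sp.diff(f, x) * sp.diff(x_t, t)
         + sp.diff(f, y) * sp.diff(y_t, t)).subs({x: x_t, y: y_t})

print(sp.simplify(direct - chain))  # prints 0: the two sides agree
[/code]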

4. How is the multivariable chain rule proof used in real-world applications?

The multivariable chain rule is used in a wide range of real-world applications, such as optimization problems, economics, and physics. For example, it can be used to calculate marginal cost and revenue in business, to find the maximum and minimum values of a function of several variables, and to determine the velocity and acceleration of an object in motion.

5. What are some common mistakes when using the multivariable chain rule proof?

Some common mistakes when applying the multivariable chain rule include forgetting to sum over all of the intermediate variables, confusing partial derivatives with ordinary derivatives, and not correctly identifying the inner and outer functions of the composition. It is important to keep track of which variables each function depends on in order to avoid these mistakes.
