How is the Multivariable Chain Rule Proved?

sponsoredwalk · Jan 7, 2011

I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

[itex]f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x}[/itex]

and manipulate it as follows:

[itex]f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0[/itex]

[itex]\frac{ \Delta y}{ \Delta x} \ - \ f'(x) \ = \ \epsilon (x)[/itex]

[itex]\Delta y \ = \ f'(x) \Delta x \ + \ \epsilon (x) \Delta x[/itex]

and you work off that function to prove the single variable version.
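
As a quick numerical illustration of that last identity (a sketch, not part of the proof): taking f(x) = x³ at x = 2 as an arbitrary example, the error term ε defined by Δy = f'(x)Δx + εΔx can be computed directly, and it shrinks as Δx shrinks toward zero. The function names below are made up for this example.

```python
def f(x):
    # Arbitrary example function, chosen only for illustration.
    return x ** 3


def f_prime(x):
    # Exact derivative of the example function.
    return 3 * x ** 2


def epsilon(x, dx):
    # Error term defined by: dy = f'(x)*dx + epsilon*dx
    dy = f(x + dx) - f(x)
    return dy / dx - f_prime(x)


x0 = 2.0
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    eps = epsilon(x0, dx)
    dy = f(x0 + dx) - f(x0)
    rebuilt = f_prime(x0) * dx + eps * dx  # should reproduce dy
    print(f"dx={dx:g}  epsilon={eps:.6g}  dy={dy:.10g}  rebuilt={rebuilt:.10g}")
```

The printed ε values should get smaller with each row, which is the only property of ε the proof relies on.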
The multivariable version uses a function:

[itex]\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y[/itex]

which I can see is analogous to the single variable version, but I'm having
trouble deriving it to be honest. But assuming that I'm okay with this
function I wonder about the proof.

The special case is just to divide by Δt & take the limit:

[itex]\frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt}[/itex]

and if f(x,y) has both x & y as functions of two variables
z = f(x,y) = f [ x(s,t),y(s,t) ]
then you follow the exact same idea if you're taking the partial w.r.t.
s or t.
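
A numerical sanity check of both cases can be sketched as follows. Everything concrete here is an assumption made for the example: f(x, y) = x²y + sin y, the paths x = cos t, y = t² for the one-variable case, and x = st, y = s + t² for the two-variable case. The chain rule value is compared against a central difference quotient of the composite function.

```python
import math


def f(x, y):
    return x * x * y + math.sin(y)


def f_x(x, y):
    return 2 * x * y


def f_y(x, y):
    return x * x + math.cos(y)


# Special case: x = cos(t), y = t**2 depend on a single variable t.
def dz_dt_chain(t):
    x, y = math.cos(t), t * t
    dx_dt, dy_dt = -math.sin(t), 2 * t
    return f_x(x, y) * dx_dt + f_y(x, y) * dy_dt


def dz_dt_numeric(t, h=1e-6):
    z = lambda u: f(math.cos(u), u * u)
    return (z(t + h) - z(t - h)) / (2 * h)


# Two-variable case: x = s*t, y = s + t**2; partial w.r.t. s with t held fixed.
def dz_ds_chain(s, t):
    x, y = s * t, s + t * t
    dx_ds, dy_ds = t, 1.0
    return f_x(x, y) * dx_ds + f_y(x, y) * dy_ds


def dz_ds_numeric(s, t, h=1e-6):
    z = lambda u: f(u * t, u + t * t)
    return (z(s + h) - z(s - h)) / (2 * h)


print(dz_dt_chain(0.7), dz_dt_numeric(0.7))
print(dz_ds_chain(0.3, 0.7), dz_ds_numeric(0.3, 0.7))
```

Each printed pair should agree to several decimal places. This is only evidence that the formula is right for one example, not a proof.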

The general chain rule would just be a natural extension of this right? i.e.

z = f(x₁,x₂,...,xᵢ) = f [ x₁(t₁,t₂,...,tᵢ),x₂(t₁,t₂,...,tᵢ),...,xᵢ(t₁,t₂,...,tᵢ) ]

and the partial w.r.t. tᵥ is the exact same idea:

[itex]\frac{\partial z}{\partial t_v} \ = \ f_{x_1}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_1}{\partial t_v} \ + \ f_{x_2}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_2}{\partial t_v} \ + \ ... \ + \ f_{x_i}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_i}{\partial t_v}[/itex]

obviously the notation can be shortened :blushing:

but that's it right?

---------------------------------------------------------------------------

Assuming that proof to be correct I'm wondering about the function

[itex]\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y[/itex]

I mean rather than just saying it's analogous in different dimensions
shouldn't there be a way to derive it from the very similar arguments
involving tangent planes?

Start with the vector equation N • (X - X₀) = 0 to derive the plane.
N•(X - X₀) = 0
(A,B,C)•[(x - x₀),(y - y₀),(z - z₀)] = 0
A(x - x₀) + B(y - y₀) + C(z - z₀) = 0
z - z₀ = (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

Now, I understand that this is the description of the tangent plane that
touches the surface at the point (x₀, y₀, f(x₀,y₀)) & can be used to approximate a function for
all (x,y) close to (x₀,y₀)
f(x,y) ≈ f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
I say this to make sure I have the correct understanding: when I derived
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
above I was deriving a linear tangent plane equation, but for any function
we can use this equation to find the tangent plane
at the point (x₀, y₀, f(x₀,y₀)), and we can also linearly approximate any
function for all (x,y) close to (x₀,y₀), just like the single variable tangent line.

It is the extra terms of Taylor's formula that turn
f(x,y) ≈ f(x₀,y₀) + ... into f(x,y) = f(x₀,y₀) + ...
That's been confusing me & I'd really appreciate confirmation that I've
got the logic right now.

But how do we turn f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) into
[itex]\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y[/itex]

Or said maybe a bit more clearly, turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

in a more linear fashion than just saying it should work

ystael · Jan 7, 2011

You need to think and write a little bit more precisely with respect to what equations are approximations and what equations are exact. The equation [tex]f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)[/tex] is in general false even for [tex](x, y)[/tex] close to [tex](x_0, y_0)[/tex]; you need to write the error terms represented in your other formulation by [tex]\epsilon_1(x, y) \Delta x + \epsilon_2(x, y) \Delta y[/tex].

It will also help you to learn about the interpretation of the derivative as a linear transformation -- this makes thinking about approximations and the chain rule much simpler.

If [tex]U \subset \mathbb{R}^m[/tex] is an open set and [tex]f: U \to \mathbb{R}^n[/tex] is a function, we say [tex]f[/tex] is differentiable at [tex]x_0 \in U[/tex] if there exists a linear transformation [tex]T: \mathbb{R}^m \to \mathbb{R}^n[/tex] which approximates [tex]f[/tex] well near [tex]x_0[/tex]. That is, if for [tex]x[/tex] in some neighborhood [tex]V[/tex] of [tex]x_0[/tex], we find that [tex]f(x)[/tex] is almost equal to [tex]f(x_0) + T(x - x_0)[/tex]. How is "almost equal" defined? The approximation must be the best possible linear approximation, i.e., not leave any linear part behind: we must have [tex]\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - T(x - x_0)\|}{\|x - x_0\|} = 0[/tex], or in the Landau little-o notation, [tex]f(x) = f(x_0) + T(x - x_0) + o(x - x_0)[/tex] as [tex]x \to x_0[/tex]. This should be a rephrasing in different notation of the things you already know using [tex]\epsilon_1(x, y) \Delta x[/tex] and the like. We then say [tex]T[/tex] is the derivative of [tex]f[/tex] at [tex]x_0[/tex], [tex]Df(x_0) = T[/tex].
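
A small numerical sketch of this definition, assuming the example map f(x, y) = (xy, x + y²) and the base point (1, 2), neither of which comes from the thread. Here T is the Jacobian matrix of that example at the base point, written out by hand, and the code evaluates the quotient from the limit above for shorter and shorter displacements.

```python
import math


def f(p):
    # Example map from R^2 to R^2, chosen only for illustration.
    x, y = p
    return (x * y, x + y * y)


def T(p0, v):
    # Candidate derivative at p0: the Jacobian [[y0, x0], [1, 2*y0]] applied to v.
    x0, y0 = p0
    h, k = v
    return (y0 * h + x0 * k, h + 2 * y0 * k)


def ratio(p0, v):
    # ||f(p0 + v) - f(p0) - T(v)|| / ||v||, the quantity that must tend to 0.
    p = (p0[0] + v[0], p0[1] + v[1])
    residual = tuple(a - b - c for a, b, c in zip(f(p), f(p0), T(p0, v)))
    return math.hypot(*residual) / math.hypot(*v)


p0 = (1.0, 2.0)
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    v = (0.6 * s, 0.8 * s)  # displacement of length s in a fixed direction
    print(f"|v|={s:g}  ratio={ratio(p0, v):.6g}")
```

For this particular f the residual works out to (hk, k²) for a displacement (h, k), so the quotient equals |k| and visibly goes to zero with the displacement.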

The nice thing about this formulation is that the chain rule reduces to a statement that the derivative of the composition is the composition of the derivatives: [tex]D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0)[/tex]; and the root of its proof is the fact that a linear transformation can't turn something that is [tex]o(x - x_0)[/tex], something that dies faster than linearly as [tex]x \to x_0[/tex], into something that isn't.
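
In coordinates, composing the linear maps means multiplying their Jacobian matrices. The following is a rough numerical check of that statement, with everything concrete assumed for the example: f(x, y) = (xy, x + y²), g(u, v) = u² + sin v, the base point (1, 2), and the helper names `jacobian` and `matmul`. The Jacobians are estimated by central differences rather than computed exactly.

```python
import math


def f(p):
    x, y = p
    return [x * y, x + y * y]


def g(q):
    u, v = q
    return [u * u + math.sin(v)]


def jacobian(func, p, h=1e-6):
    # Central-difference estimate of the Jacobian of func at p (list of rows).
    n_out = len(func(p))
    rows = [[0.0] * len(p) for _ in range(n_out)]
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        for i in range(n_out):
            rows[i][j] = (fp[i] - fm[i]) / (2 * h)
    return rows


def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


p0 = [1.0, 2.0]
lhs = jacobian(lambda p: g(f(p)), p0)              # D(g o f)(p0)
rhs = matmul(jacobian(g, f(p0)), jacobian(f, p0))  # Dg(f(p0)) times Df(p0)
print(lhs)
print(rhs)
```

The two printed 1×2 matrices should agree up to finite-difference error. Note that Dg is evaluated at f(p0), not at p0, which is the detail the compact formula encodes.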

Take a look at the early part of Spivak's Calculus on Manifolds for a good and accessible treatment of the derivative along these lines.

sponsoredwalk · Jan 7, 2011

The equation

[tex] f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)[/tex]

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing, I'm getting this from Stewart's
calculus & honestly if you read my post you'll see that I understand the distinction between
the function (which requires error terms) and the tangent plane. The book made me double
guess myself & I posted that part of my post just to get clarification that I wasn't wrong,
so thanks!

I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it. Note: I derived the single variable expression at the top of my first post &
I'm afraid the idea is to just copy that in different dimensions, is that all there is to it or is
there a clearer way?

ystael · Jan 7, 2011

sponsoredwalk said:

The equation

[tex] f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)[/tex]

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing

The reason I called this out specifically is that you can't do that. The equation you wrote is not just "not an equation of the function"; it is false. You cannot use [tex]f[/tex] to denote a function in one line and the function of its tangent plane in the next line; that leads you to the incorrect conclusions you make below. You say you understand the distinction between the function and the tangent plane, but your notation -- the math you write, which is the only means I have to assess what you understand -- doesn't convey that.

sponsoredwalk said:

I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

It's entirely possible that your confusion is due to Stewart being confusing and terrible; I haven't read Stewart, but most elementary calculus books I've read are, in fact, confusing and terrible.

sponsoredwalk said:

If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it.

The explanation is "you can't, because it's wrong". The problem is that in the first set of equations [tex]f[/tex] is used to denote the function of the tangent plane -- thus the absence of error terms -- and in the second set [tex]f[/tex] (or [tex]\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y)[/tex]) is used to denote the original function -- thus the presence of error terms. There is no correct algebraic transformation you can do to turn one into the other, because they're not equations of the same thing.
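
The increment formula for the original function does have a short derivation that never touches the tangent plane. What follows is a sketch of the standard argument, under the added assumption (not stated in the thread) that [tex]f_x[/tex] and [tex]f_y[/tex] exist near [tex](x_0, y_0)[/tex] and are continuous at that point; the intermediate points [tex]u[/tex] and [tex]v[/tex] are names introduced here. Split the increment into one change in [tex]x[/tex] and one change in [tex]y[/tex]:

[tex]\Delta z = [f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0 + \Delta y)] + [f(x_0, y_0 + \Delta y) - f(x_0, y_0)][/tex]

Each bracket is a difference in a single variable, so the one-variable mean value theorem gives some [tex]u[/tex] between [tex]x_0[/tex] and [tex]x_0 + \Delta x[/tex], and some [tex]v[/tex] between [tex]y_0[/tex] and [tex]y_0 + \Delta y[/tex], with

[tex]\Delta z = f_x(u, y_0 + \Delta y) \Delta x + f_y(x_0, v) \Delta y[/tex]

Now define [tex]\epsilon_1 = f_x(u, y_0 + \Delta y) - f_x(x_0, y_0)[/tex] and [tex]\epsilon_2 = f_y(x_0, v) - f_y(x_0, y_0)[/tex], so that

[tex]\Delta z = f_x(x_0, y_0) \Delta x + f_y(x_0, y_0) \Delta y + \epsilon_1 \Delta x + \epsilon_2 \Delta y[/tex]

Continuity of the partial derivatives at [tex](x_0, y_0)[/tex] forces [tex]\epsilon_1 \to 0[/tex] and [tex]\epsilon_2 \to 0[/tex] as [tex](\Delta x, \Delta y) \to (0, 0)[/tex]. In this argument both error terms depend on the pair [tex](\Delta x, \Delta y)[/tex], not on [tex]x[/tex] or [tex]y[/tex] alone.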

Mark44 · Jan 7, 2011

sponsoredwalk said:

I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

[itex]f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x}[/itex]

and manipulate it as follows:

[itex]f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0[/itex]

No one else has commented on this, so I will. This is a relatively minor point, but you repeated it in a couple of places. The limits should be taken as [itex]\Delta x[/itex] and [itex]\Delta t[/itex] approach zero, not infinity.

sponsoredwalk said:

[itex]\frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt}[/itex]

sponsoredwalk · Jan 7, 2011

Ah you're totally right! I must admit the similarity between the equations had me
thinking one could be derived from the other, & looking at the notation Stewart uses
I see exactly where I got that idea from. The whole mistake I'm making is to even
bother with the tangent plane here. That's really helpful. I guess when dealing with
this it is a case of conceptually understanding that the derivatives in both directions
require error terms & that you can go on from there. Thanks!

As for the LaTeX [itex]\lim_{x \to \infty}[/itex], I have a program on Firefox that
generates LaTeX ready-made, but it generates the limits pre-set to go to infinity & I just
forgot to change them. I also notice my LaTeX error terms accidentally both have
ε₁(x)Δx + ε₂(x)Δy instead of ε₁(x)Δx + ε₂(y)Δy from all that copying & pasting.
