How is the Multivariable Chain Rule Proved?

SUMMARY

The discussion focuses on the proof of the multivariable chain rule, highlighting the transition from single-variable to multivariable derivatives. The proof utilizes the function Δz = f_x(x,y) Δx + f_y(x,y) Δy + ε₁(x) Δx + ε₂(y) Δy, which parallels the single-variable case. Participants emphasize the importance of error terms in approximations and the distinction between the function and its tangent plane. The conversation also addresses common misconceptions regarding limits, clarifying that they should approach zero rather than infinity.

PREREQUISITES
  • Understanding of single-variable calculus, specifically the chain rule.
  • Familiarity with multivariable functions and partial derivatives.
  • Knowledge of Taylor's theorem and error terms in approximations.
  • Basic concepts of linear transformations in calculus.
NEXT STEPS
  • Study the derivation of the multivariable chain rule in detail.
  • Learn about Taylor series expansions for multivariable functions.
  • Explore linear transformations and their role in calculus.
  • Review the concept of differentiability in multivariable calculus.
USEFUL FOR

Students of calculus, mathematicians, and educators seeking a deeper understanding of the multivariable chain rule and its applications in higher-dimensional analysis.

sponsoredwalk
I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x}

and manipulate it as follows:

f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0

f'(x) \ - \ \frac{ \Delta y}{ \Delta x} \ = \ \epsilon (x)

\Delta y \ = \ f'(x) \Delta x \ + \ \epsilon (x) \Delta x

and you work off that function to prove the single variable version.
The multivariable version uses a function:

\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y

which I can see is analogous to the single variable version, but I'm having
trouble deriving it, to be honest. But assuming that I'm okay with this
function, I wonder about the proof.

The special case is just to divide by Δt & take the limit:

\frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt}
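
As a sanity check on this special case, here is a minimal numerical sketch. The functions f, x(t) and y(t) below are hypothetical examples picked only for illustration, and the partial derivatives are worked out by hand. It compares the difference quotient Δz/Δt for shrinking steps Δt against f_x dx/dt + f_y dy/dt.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * x * y + math.sin(y)

def f_x(x, y):
    return 2.0 * x * y

def f_y(x, y):
    return x * x + math.cos(y)

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t * t

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return 2.0 * t

def chain_rule(t):
    """Right-hand side: f_x dx/dt + f_y dy/dt evaluated along the curve."""
    x, y = x_of(t), y_of(t)
    return f_x(x, y) * dx_dt(t) + f_y(x, y) * dy_dt(t)

def difference_quotient(t, dt):
    """Left-hand side before the limit: delta z / delta t for a finite step dt."""
    return (f(x_of(t + dt), y_of(t + dt)) - f(x_of(t), y_of(t))) / dt

t0 = 0.7
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    gap = abs(difference_quotient(t0, dt) - chain_rule(t0))
    print(f"dt={dt:g}  |dz/dt quotient - chain rule| = {gap:.2e}")
```

The gap should shrink as the step Δt gets smaller, which is the behaviour the error terms are meant to capture.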

and if f(x,y) has both x & y as functions of two variables
z = f(x,y) = f [ x(s,t),y(s,t) ]
then you follow the exact same idea if you're taking the partial w.r.t.
s or t.

The general chain rule would just be a natural extension of this right? i.e.

z = f(x₁,x₂,...,xᵢ) = f [ x₁(t₁,t₂,...,tᵢ),x₂(t₁,t₂,...,tᵢ),...,xᵢ(t₁,t₂,...,tᵢ) ]

and the partial w.r.t. tᵥ is the exact same idea:

\frac{\partial z}{\partial t_v} \ = \ f_{x_1}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_1}{\partial t_v} \ + \ f_{x_2}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_2}{\partial t_v} \ + \ ... \ + \ f_{x_i}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_i}{\partial t_v}

obviously the notation can be shortened :blushing: but that's it right?

---------------------------------------------------------------------------

Assuming that proof to be correct I'm wondering about the function

\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y

I mean rather than just saying it's analogous in different dimensions
shouldn't there be a way to derive it from the very similar arguments
involving tangent planes?

Start with the vector equation N • (X - X₀) = 0 to derive the plane.
N•(X - X₀) = 0
(A,B,C)•[(x - x₀),(y - y₀),(z - z₀)] = 0
A(x - x₀) + B(y - y₀) + C(z - z₀) = 0
z - z₀ = (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

Now, I understand that this is the description of the tangent plane that
passes through the point (x₀, y₀, f(x₀,y₀)) & can be used to approximate a function for
all (x,y) close to (x₀,y₀)
f(x,y) ≈ f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
I say this to make sure I have the correct understanding: when I derived
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
above I was deriving a linear tangent plane equation, but for any function
we can use this equation to find the tangent plane
through the point (x₀, y₀, f(x₀,y₀)), and we can also linearly approximate any
function for all (x,y) close to (x₀,y₀), just like the single variable tangent line.

It is the extra terms of Taylor's formula that turn
f(x,y) ≈ f(x₀,y₀) + ... into f(x,y) = f(x₀,y₀) + ...
That's been confusing me & I'd really appreciate confirmation that I've
got the logic right now.
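
To make the gap between "≈" and "=" concrete, here is a small numerical sketch. The function f and the base point are hypothetical examples picked only for illustration. It measures how far the function sits from its tangent plane, relative to the distance from the base point.

```python
import math

# Hypothetical example function, chosen only for illustration.
def f(x, y):
    return math.exp(x) * math.sin(y)

def f_x(x, y):
    return math.exp(x) * math.sin(y)

def f_y(x, y):
    return math.exp(x) * math.cos(y)

def tangent_plane(x, y, x0, y0):
    """Linear approximation L(x, y) built at the base point (x0, y0)."""
    return f(x0, y0) + f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0)

def relative_error(h, x0=0.3, y0=1.1):
    """|f - L| divided by the distance from (x0, y0), stepping by (h, h)."""
    x, y = x0 + h, y0 + h
    distance = math.hypot(x - x0, y - y0)
    return abs(f(x, y) - tangent_plane(x, y, x0, y0)) / distance

for h in (1e-1, 1e-2, 1e-3):
    print(f"h={h:g}  |f - L| / distance = {relative_error(h):.2e}")
```

The ratio is nonzero for every finite step, so the tangent plane is only an approximation, but it shrinks as the step shrinks. That leftover is what the error terms account for.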

But how do we turn f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) into
\Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y

Or said maybe a bit more clearly, turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

in a more linear fashion than just saying it should work :confused:
 
You need to think and write a little bit more precisely with respect to what equations are approximations and what equations are exact. The equation f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0) is in general false even for (x, y) close to (x_0, y_0); you need to write the error terms represented in your other formulation by \epsilon_1(x, y) \Delta x + \epsilon_2(x, y) \Delta y.

It will also help you to learn about the interpretation of the derivative as a linear transformation -- this makes thinking about approximations and the chain rule much simpler.

If U \subset \mathbb{R}^m is an open set and f: U \to \mathbb{R}^n is a function, we say f is differentiable at x_0 \in U if there exists a linear transformation T: \mathbb{R}^m \to \mathbb{R}^n which approximates f well near x_0. That is, if for x in some neighborhood V of x_0, we find that f(x) is almost equal to f(x_0) + T(x - x_0). How is "almost equal" defined? The approximation must be the best possible linear approximation, i.e., not leave any linear part behind: we must have \lim_{x\to x_0} \frac{\|f(x) - f(x_0) - T(x - x_0)\|}{\|x - x_0\|} = 0, or in the Landau little-o notation, f(x) = f(x_0) + T(x - x_0) + o(x - x_0) as x \to x_0. This should be a rephrasing in different notation of the things you already know using \epsilon_1(x, y) \Delta x and the like. We then say T is the derivative of f at x_0, Df(x_0) = T.

The nice thing about this formulation is that the chain rule reduces to a statement that the derivative of the composition is the composition of the derivatives: D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0); and the root of its proof is the fact that a linear transformation can't turn something that is o(x - x_0), something that dies faster than linearly as x \to x_0, into something that isn't.
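
The composition statement can be checked numerically. In the sketch below the maps f and g are hypothetical examples picked only for illustration, with Jacobians worked out by hand. It multiplies the Jacobian matrices and compares the product with a finite-difference Jacobian of g ∘ f.

```python
import math

# Hypothetical maps f, g: R^2 -> R^2, chosen only for illustration.
def f(p):
    x, y = p
    return (x * y, x + y * y)

def g(q):
    u, v = q
    return (math.sin(u) + v, u * v)

def Df(p):
    """Jacobian matrix of f at p, computed by hand."""
    x, y = p
    return [[y, x], [1.0, 2.0 * y]]

def Dg(q):
    """Jacobian matrix of g at q, computed by hand."""
    u, v = q
    return [[math.cos(u), 1.0], [v, u]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numerical_jacobian(func, p, h=1e-6):
    """Central-difference Jacobian of func at p (rows are outputs)."""
    cols = []
    for j in range(2):
        plus = list(p)
        plus[j] += h
        minus = list(p)
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(2)])
    # cols[j] holds the derivatives with respect to input j; transpose it.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

p0 = (0.5, -1.2)
chain = matmul(Dg(f(p0)), Df(p0))          # Dg(f(x0)) composed with Df(x0)
direct = numerical_jacobian(lambda p: g(f(p)), p0)
max_gap = max(abs(chain[i][j] - direct[i][j])
              for i in range(2) for j in range(2))
print(f"largest entry-wise gap: {max_gap:.2e}")
```

Note that Dg is evaluated at f(p0), not at p0, which matches the formula D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0).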

Take a look at the early part of Spivak, Calculus on manifolds, for a good and accessible treatment of the derivative along these lines.
 
The equation

f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing, I'm getting this from Stewart's
calculus & honestly if you read my post you'll see that I understand the distinction between
the function (which requires error terms) and the tangent plane. The book made me double
guess myself & I posted that part of my post just to get clarification that I wasn't wrong,
so thanks!

I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it. Note: I derived the single variable expression at the top of my first post &
I'm afraid the idea is to just copy that in different dimensions, is that all there is to it or is
there a clearer way?
 
sponsoredwalk said:
The equation

f(x, y) - f(x_0, y_0) = \frac{\partial f}{\partial x}(x - x_0) + \frac{\partial f}{\partial y}(y - y_0)

is not an equation of the function, I'm aware of that. I derived this equation in my post to
explicitly be the equation of the tangent plane to the point f(x₀,y₀) with f(x,y) being another
point on the tangent plane. I agree the notation is confusing

The reason I called this out specifically is that you can't do that. The equation you wrote is not just "not an equation of the function"; it is false. You cannot use f to denote a function in one line and the function of its tangent plane in the next line; that leads you to the incorrect conclusions you make below. You say you understand the distinction between the function and the tangent plane, but your notation -- the math you write, which is the only means I have to assess what you understand -- doesn't convey that.

sponsoredwalk said:
I have absolutely no problem with learning the more advanced linear transformation
expression of the derivative & honestly want to do it properly but I've just planned to
finish the Stewart multivariable book quickly to ensure I'm aware of all formulations of
this topic.

It's entirely possible that your confusion is due to Stewart being confusing and terrible; I haven't read Stewart, but most elementary calculus books I've read are, in fact, confusing and terrible.

sponsoredwalk said:
If you've found no problem with this elementary chain rule proof in-and-of itself then
if you understand what I mean by turning:

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into:

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy

using a clear explanation that doesn't rely on me saying "well because they are in different
dimensions we can just use the same single-variable derivation in each dimension" I would
appreciate it.

The explanation is "you can't, because it's wrong". The problem is that in the first set of equations f is used to denote the function of the tangent plane -- thus the absence of error terms -- and in the second set f (or \Delta z = f(x + \Delta x, y + \Delta y) - f(x, y)) is used to denote the original function -- thus the presence of error terms. There is no correct algebraic transformation you can do to turn one into the other, because they're not equations of the same thing.
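
For reference, one common way to obtain the error-term formula starts from the function itself rather than from the tangent plane. The following is a sketch under an added assumption that is not stated in the thread: f_x and f_y exist near (x, y) and are continuous at (x, y). The intermediate points u and v are names introduced here for the mean value theorem.

```latex
\begin{align*}
\Delta z &= f(x+\Delta x,\, y+\Delta y) - f(x, y) \\
         &= \bigl[ f(x+\Delta x,\, y+\Delta y) - f(x,\, y+\Delta y) \bigr]
          + \bigl[ f(x,\, y+\Delta y) - f(x, y) \bigr] \\
         % mean value theorem in each variable separately:
         % u lies between x and x+\Delta x, v lies between y and y+\Delta y
         &= f_x(u,\, y+\Delta y)\,\Delta x + f_y(x,\, v)\,\Delta y \\
         &= f_x(x,y)\,\Delta x + f_y(x,y)\,\Delta y
          + \epsilon_1\,\Delta x + \epsilon_2\,\Delta y,
\end{align*}
where
\begin{align*}
\epsilon_1 &= f_x(u,\, y+\Delta y) - f_x(x, y), &
\epsilon_2 &= f_y(x,\, v) - f_y(x, y).
\end{align*}
```

Under the continuity assumption, both \epsilon_1 and \epsilon_2 tend to 0 as (\Delta x, \Delta y) \to (0, 0). In this sketch each error term depends on both increments, not on x or y alone.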
 
sponsoredwalk said:
I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
In the single variable chain rule proof the way I know it is that you take the derivative:

f'(x) \ = \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x}

and manipulate it as follows:

f'(x) \ - \ \lim_{ \Delta x \to \infty} \frac{ \Delta y}{ \Delta x} \ = \ 0

No one else has commented on this, so I will. This is a relatively minor point, but you repeated it in a couple of places. The limits should be taken as \Delta x and \Delta t approach zero, not infinity.
sponsoredwalk said:
\frac{dz}{dt} \ = \ \lim_{ \Delta t \to \infty} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to \infty} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt}
 
Ah you're totally right! I must admit the similarity between the equations had me
thinking one could be derived from the other & looking at the notation Stewart uses
I see exactly where I got that idea from, the whole mistake I'm making is to even
bother with the tangent plane here. That's really helpful, I guess when dealing with
this it is a case of conceptually understanding that the derivatives in both directions
require error terms & that you can go on from there. Thanks :cool:

As for the LaTeX \lim_{x \to \infty}, I have a program on firefox that
generates latex readymade but it generates the limits pre-set to go to infinity & I just
didn't change them by mistake, I also notice my LaTeX error terms accidentally both have
ε₁(x)Δx + ε₂(x)Δy instead of ε₁(x)Δx + ε₂(y)Δy from all that copying & pasting :-p
 