Proof of Multivariable Chain Rule in higher dimensions

Summary
The discussion centers on proving the multivariable chain rule for functions F: R^m → R^n and G: R^p → R^m, specifically that (F ∘ G)'(x) = F'(G(x)) G'(x). The initial approach involves extending the single-variable chain rule to higher dimensions using the mean value theorem and analyzing the behavior of the functions as h approaches zero. Participants engage in clarifying the notation and structure of the proof, emphasizing the need for summation over the components of the functions involved. There is a focus on ensuring proper indexing and the application of partial derivatives in the context of multivariable calculus. The conversation highlights the complexity of generalizing the proof for arbitrary dimensions.
SpY]

Homework Statement



Let \textbf{F}: \textbf{R}^m \rightarrow \textbf{R}^n and \textbf{G}: \textbf{R}^p \rightarrow \textbf{R}^m

Prove that ({\textbf{F} \circ \textbf{G}})'(\textbf{x}) = {\textbf{F}}'(\textbf{G}(\textbf{x})) \, {\textbf{G}}'(\textbf{x})
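
Here I take \textbf{F}'(\textbf{x}) to mean the Jacobian matrix of partial derivatives, so the claim is that the n \times p Jacobian of \textbf{F} \circ \textbf{G} equals the product of an n \times m matrix with an m \times p matrix:

[\textbf{F}'(\textbf{x})]_{ij} = \frac {\partial f_i}{\partial x_j}(\textbf{x}), \qquad i = 1, \dots, n, \quad j = 1, \dots, m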

Homework Equations


Assume the single-variable chain rule; that is, for
f, g: \textbf{R} \rightarrow \textbf{R}

\frac {d(f \circ g)}{dt}(t) = \frac {df}{dt} \big]_{g(t)} \frac {dg}{dt}(t)
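
As a quick sanity check of this notation, take for instance f(x) = x^2 and g(t) = \sin t:

\frac {d(f \circ g)}{dt}(t) = \frac {d}{dt} \left( \sin^2 t \right) = 2 \sin t \cos t = \frac {df}{dt} \big]_{g(t)} \frac {dg}{dt}(t)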


The Attempt at a Solution


I figured I would use the single-variable result by first extending it to \textbf{R}^2, a sort of sub-proof which uses the mean value theorem:

Let f: \textbf{R}^2 \rightarrow \textbf{R} and \textbf{G}: \textbf{R} \rightarrow \textbf{R}^2

Then

f(\textbf{G}(t+h)) - f(\textbf{G}(t)) = f(G_1(t+h), G_2(t+h)) - f(G_1(t), G_2(t+h)) + f(G_1(t), G_2(t+h)) - f(G_1(t), G_2(t))
The second and third terms cancel each other, so nothing has changed; I will use this splitting below.

Then by the first mean value theorem,
\exists k_1, k_2 \in (0,h) such that

G_1 (t+h) - G_1 (t) = h{G_1}'(t+k_1)

G_2 (t+h) - G_2 (t) = h{G_2}'(t+k_2)
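
The form of the mean value theorem being used here: for a function g: \textbf{R} \rightarrow \textbf{R} that is differentiable on the relevant interval, and for h > 0,

g(t+h) - g(t) = h \, g'(t+k) \quad \text{for some } k \in (0, h)

(For h < 0 the intermediate point lies in (h, 0) instead, which does not change the argument.)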

Expanding the first two terms above by substituting for G_1(t+h):
f(G_1(t+h), G_2(t+h)) - f(G_1(t), G_2(t+h))

= f(h{G_1}'(t+k_1) + G_1(t), G_2(t+h))- f(G_1(t), G_2(t+h))

= h{G_1}'(t+k_1) \frac {\partial f}{\partial x_1} \big]_{(p_1 + G_1(t), G_2(t+h))}

Where p_1 \in (0, h{G_1}'(t+k_1))
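
This step is the mean value theorem once more, applied in the first argument only: writing (just for this step) \phi(s) = f(G_1(t) + s, \, G_2(t+h)) and a = h{G_1}'(t+k_1), and assuming a > 0 and that \frac {\partial f}{\partial x_1} exists on the relevant segment, it gives

\phi(a) - \phi(0) = a \, \phi'(p_1) = h{G_1}'(t+k_1) \, \frac {\partial f}{\partial x_1} \big]_{(p_1 + G_1(t), \, G_2(t+h))} \quad \text{for some } p_1 \in (0, a)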

Similarly, for the last two terms, substituting for G_2(t+h):
f(G_1(t), G_2(t+h)) - f(G_1(t), G_2(t))

= f(G_1(t), h{G_2}'(t+k_2) + G_2(t)) - f(G_1(t), G_2(t))

= h{G_2}'(t+k_2) \frac {\partial f}{\partial x_2} \big]_{(G_1(t), p_2 + G_2(t))}

Where p_2 \in (0, h{G_2}'(t+k_2))

Combining this all together and dividing by h:

\frac {f(\textbf{G}(t+h)) - f(\textbf{G}(t))}{h}

= {G_1}'(t+k_1) \frac {\partial f}{\partial x_1} \big]_{(p_1 + G_1(t), G_2(t+h))} + {G_2}'(t+k_2) \frac {\partial f}{\partial x_2} \big]_{(G_1(t), p_2 + G_2(t))}

Now as h \rightarrow 0, also k_1, k_2, p_1, p_2 \rightarrow 0, since they are contained in intervals that shrink with h. Assuming the partial derivatives of f and the derivatives of G_1, G_2 are continuous (so the evaluation points can be passed to the limit), the LHS becomes the derivative of the composition:

{(f \circ \textbf{G})}'(t) =\lim_{h \to 0} \frac {f(\textbf{G}(t+h)) - f(\textbf{G}(t))}{h}

= \lim_{h \to 0} \left( {G_1}'(t+k_1) \frac {\partial f}{\partial x_1} \big]_{(p_1 + G_1(t), G_2(t+h))} + {G_2}'(t+k_2) \frac {\partial f}{\partial x_2} \big]_{(G_1(t), p_2 + G_2(t))} \right)

= {G_1}'(t) \frac {\partial f}{\partial x_1} \big]_{\textbf{G}(t)} + {G_2}'(t) \frac {\partial f}{\partial x_2} \big]_{\textbf{G}(t)}

= {f}'(\textbf{G}(t)) \, {\textbf{G}}'(t)
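
Written as a matrix product, this is the 1 \times 2 Jacobian of f times the 2 \times 1 Jacobian of \textbf{G}, which is the shape of the identity I am trying to prove in general:

{f}'(\textbf{G}(t)) \, {\textbf{G}}'(t) = \begin{pmatrix} \frac {\partial f}{\partial x_1} \big]_{\textbf{G}(t)} & \frac {\partial f}{\partial x_2} \big]_{\textbf{G}(t)} \end{pmatrix} \begin{pmatrix} {G_1}'(t) \\ {G_2}'(t) \end{pmatrix}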

I've tried generalizing this for any n, but it gets rather long, so I'm not sure how to put it concisely. After that, I don't know how to take it to the general proof (any m, n) as required.

Thanks
 
Rather than going into all the limit stuff, I think there is an easier way.

Let i \in \{1, ..., n\}, j \in \{1, ..., m\}, k \in \{1, ..., p\}.

First we need to have that:

\frac {\partial} {\partial x_k} f_i(g_1(x), ..., g_m(x)) = \sum_{j=1}^m \frac {\partial} {\partial x_j} f_i(g_1(x), ..., g_m(x)) \cdot \frac {\partial} {\partial x_k} g_j(x)

Then we can apply the definitions and say:

(F \circ G)'(x)
= \left( \frac {\partial} {\partial x_k} f_i(g_1(x), ..., g_m(x)) \right)
= \left( \sum_{j=1}^m \frac {\partial} {\partial x_j} f_i(g_1(x), ..., g_m(x)) \cdot \frac {\partial} {\partial x_k} g_j(x) \right)
= \left( \frac {\partial} {\partial x_j} f_i(g_1(x), ..., g_m(x)) \right) \left( \frac {\partial} {\partial x_k} g_j(x) \right)
= F'(G(x)) G'(x)
 
Hmmm ok so let me get this straight: the i refers to elements in f, j to elements in g, and k for partial derivatives in \frac {\partial} {\partial x_k}? Where f: \textbf{R}^n \rightarrow \textbf{R} and g: \textbf{R}^m \rightarrow \textbf{R}? (just to be specific on domains here)

Then shouldn't your first line read

\sum_{i=1}^n \frac {\partial} {\partial x_k} f_i(g_1(x), ..., g_m(x)) = \sum_{j=1}^m \sum_{i=1}^n \frac {\partial} {\partial x_k} f_i(g_1(x), ..., g_m(x)) \cdot \frac {\partial} {\partial x_k} g_j(x)

Because you need to sum over the components of f, otherwise f_i is meaningless, and then you end up with a double sum on the right (over f and g).

Also, shouldn't your first partial derivative on the right be \frac {\partial} {\partial x_k} rather than with respect to x_j? Otherwise it runs up to \frac {\partial} {\partial x_m} because of the sum; or should there be a \sum_{k=1}^p somewhere?

I'm having trouble following your last line as well, because you expand the partial derivative using a sum, then just take the sum away keeping the same indices. Throughout you have the variables i, j, k without the sum in front, when you should be summing to n, m, p.

Thanks for the effort though. If a mentor or homework helper could give input it would be appreciated.
 
