Rigorously understanding chain rule for sum of functions

Avatrin · Aug 6, 2017

In my quest to understand the Euler-Lagrange equation, I've realized I have to understand the chain rule first. So, here's the issue:

We have [itex]g(\epsilon) = f(t) + \epsilon h(t)[/itex]. We have to compute [itex]\frac{\partial F(g(\epsilon))}{\partial \epsilon}[/itex]. This is supposed to be equal to [itex]\frac{\partial F(f)}{\partial f}h(t)[/itex] when [itex]\epsilon = 0[/itex]. However, this does not make any sense to me. Doing the computations and using the chain rule, I get:
$$\frac{\partial F(g(0))}{\partial \epsilon} = \lim_{\epsilon \to 0}\frac{F(g(\epsilon)) - F(g(0))}{g(\epsilon) -g(0) } \frac{g(\epsilon) -g(0)}{\epsilon} = \lim_{\epsilon \to 0}\frac{F(f(t)+\epsilon h(t)) - F(f(t))}{\epsilon h(t) } h(t) $$
On an intuitive level I can understand it. I can think of [itex]f(t)+\epsilon h(t)[/itex] as [itex]f+\Delta f[/itex] since [itex]h[/itex] can be any arbitrary function, and that allows me to use the other definition of the derivative. However, it does not seem like a very rigorous way of doing it.

How can I show that [itex]\frac{\partial F(g(0))}{\partial \epsilon} = \frac{\partial F(f)}{\partial f}h(t)[/itex] using the definition of the derivative? Or, rather, a definition of the derivative..?

andrewkirk · Aug 6, 2017

The notation is not quite right, and that can easily cause confusion. The expression ##\frac{\partial F(g(0))}{\partial\epsilon}## has two problems. First, the thing after the ##\partial## in the numerator needs to be a function, but what is there is not a function but a value, i.e. a number: ##F(g(0))##. Second, all the functions you have presented are single-variable functions, so there is no need for the partial derivative symbol ##\partial##.

The second issue is easily fixed by replacing ##\partial## by ##d##. To fix the first issue, let's first define a function that is the 'function of a function' we are interested in. We define a single-variable function ##G## such that ##G(\epsilon)=F(g(\epsilon))##. Then we want to calculate ##G'(0)##, where the prime symbol ##'## indicates differentiation.

These things are easier to write and work with if we use the function composition symbol ##\circ##: if ##\phi,\psi## are single-variable functions, then ##\phi\circ\psi## is the single-variable function that, given an input ##x##, returns the value ##\phi(\psi(x))##. With this notation, we have ##G=F\circ g##.

The chain rule tells us that ##\left(F\circ g\right)'=(F'\circ g)g'##, which means that, evaluated at ##x##, this gives ##F'(g(x))\times g'(x)##.

In the OP, it appears we want to evaluate the derivative of ##G## at ##\epsilon=0##. The chain rule tells us that this is:
$$ G'(0) = \left(F\circ g\right)'(0) = F'(g(0))g'(0) $$
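Then, differentiating the expression you gave for ##g## with respect to ##\epsilon## (with ##t## held fixed), we see that ##g'(\epsilon)=h(t)## for every ##\epsilon##. Substituting that in, and using ##g(0)=f(t)##, we get:
$$ G'(0) = F'(g(0))h(t) = F'(f(t))h(t) $$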
which appears to be the result sought.
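
For what it's worth, here is a sketch of how the same result follows directly from the limit definition, without dividing by ##g(\epsilon)-g(0)##. I am assuming that ##F## is an ordinary differentiable single-variable function and that ##t## is held fixed; neither is stated in the OP. The shorthand ##a=f(t)##, ##b=h(t)##, ##u=\epsilon b## is mine, so that ##g(\epsilon)=a+\epsilon b##.

If ##b\neq 0##, then ##u\neq 0## whenever ##\epsilon\neq 0##, and ##u\to 0## as ##\epsilon\to 0##, so:
$$ G'(0) = \lim_{\epsilon \to 0}\frac{F(a+\epsilon b)-F(a)}{\epsilon} = b\lim_{\epsilon \to 0}\frac{F(a+\epsilon b)-F(a)}{\epsilon b} = b\lim_{u \to 0}\frac{F(a+u)-F(a)}{u} = F'(a)\,b $$
If ##b=0##, then ##G(\epsilon)=F(a)## is constant, so:
$$ G'(0) = 0 = F'(a)\,b $$
In both cases ##G'(0)=F'(f(t))\,h(t)##. The separate treatment of ##b=0## is the only place where the 'multiply and divide by ##\epsilon h(t)##' argument needs repair.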

Avatrin · Aug 6, 2017

andrewkirk said:

The chain rule tells us that ##\left(F\circ g\right)'=(F'\circ g)g'##, which means that, evaluated at ##x##, this gives ##F'(g(x))\times g'(x)##.

In the OP, it appears we want to evaluate the derivative of ##G## at ##\epsilon=0##. The chain rule tells us that this is:
$$ G'(0) = \left(F\circ g\right)'(0) = F'(g(0))g'(0) $$
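Then, differentiating the expression you gave for ##g## with respect to ##\epsilon## (with ##t## held fixed), we see that ##g'(\epsilon)=h(t)## for every ##\epsilon##. Substituting that in, and using ##g(0)=f(t)##, we get:
$$ G'(0) = F'(g(0))h(t) = F'(f(t))h(t) $$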
which appears to be the result sought.

No, that is what every resource on the Euler-Lagrange equation tells me. However, since we are differentiating with respect to [itex]\epsilon[/itex] we should have that [itex]\Delta g = \Delta \epsilon h[/itex].

The issue is that none of those resources tell me why what you are writing is true. I can prove the chain rule. However, none of the proofs I know seem to apply for this particular case, and the notation used by both of us hides that. That is why I specified that I want a proof that uses the definition of the derivative.

andrewkirk · Aug 6, 2017

Avatrin said:

However, none of the proofs I know seem to apply for this particular case

Why do you think the chain rule doesn't apply? We have a composed function ##G=F\circ g## and we want to differentiate it. That's exactly what the Chain Rule is about.

Perhaps it would help if you tried to re-express what it is that you have been asked to prove. As per my previous post, the statement in the OP doesn't make sense, as the thing it purports to differentiate is not a function.

FactChecker · Aug 6, 2017

Avatrin said:

However, this does not make any sense to me. Doing the computations and using the chain rule, I get:
$$\frac{\partial F(g(0))}{\partial \epsilon} = \lim_{\epsilon \to 0}\frac{F(g(\epsilon)) - F(g(0))}{g(\epsilon) -g(0) } \frac{g(\epsilon) -g(0)}{\epsilon} = \lim_{\epsilon \to 0}\frac{F(f(t)+\epsilon h(t)) - F(f(t))}{\epsilon h(t) } h(t) $$
On an intuitive level I can understand it. I can think of [itex]f(t)+\epsilon h(t)[/itex] as [itex]f+\Delta f[/itex] since [itex]h[/itex] can be any arbitrary function, and that allows me to use the other definition of the derivative. However, it does not seem like a very rigorous way of doing it.

How can I show that [itex]\frac{\partial F(g(0))}{\partial \epsilon} = \frac{\partial F(f)}{\partial f}h(t)[/itex] using the definition of the derivative? Or, rather, a definition of the derivative..?

Do you have some other definition of the derivative that you prefer? The limit of the ratio of deltas that you used above looks like the original definition, and it is rigorous.
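
A quick numerical sanity check of that limit can be sketched as follows. Everything concrete here is a hypothetical choice for illustration, not something from the thread: ##F=\sin##, ##f(t)=t^2##, ##h(t)=e^{-t}##, and the sample point ##t=0.7##. The code compares the difference quotient in ##\epsilon## with ##F'(f(t))\,h(t)## for shrinking ##\epsilon##.

```python
import math

# Hypothetical example functions (assumptions, not from the thread).
def F(u):
    return math.sin(u)

def dF(u):
    # Exact derivative of F, used only as the reference value.
    return math.cos(u)

def f(t):
    return t * t

def h(t):
    return math.exp(-t)

def difference_quotient(t, eps):
    # (G(eps) - G(0)) / eps with G(eps) = F(f(t) + eps * h(t)), t held fixed.
    return (F(f(t) + eps * h(t)) - F(f(t))) / eps

t = 0.7
exact = dF(f(t)) * h(t)  # the claimed value F'(f(t)) * h(t)

exponents = (1, 2, 3, 4)
errors = [abs(difference_quotient(t, 10.0 ** -k) - exact) for k in exponents]

for k, err in zip(exponents, errors):
    print(f"eps = 1e-{k}: |quotient - F'(f(t)) h(t)| = {err:.3e}")
```

The printed errors should shrink roughly in proportion to ##\epsilon##, which is what one expects from a one-sided difference quotient. This is only a check at one point for one choice of functions, not a proof.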

Avatrin · Aug 8, 2017

andrewkirk said:

Perhaps it would help if you tried to re-express what it is that you have been asked to prove. As per my previous post, the statement in the OP doesn't make sense, as the thing it purports to differentiate is not a function.

FactChecker said:

Do you have some other definition of the derivative that you prefer? The limit of the ratio of deltas that you used above looks like the original definition, and it is rigorous.

So, you two together make me wonder whether my textbook is misleading me, since the derivative of a functional may be defined differently from the definition I have been using above. I am trying to differentiate a functional, since this technically falls under the calculus of variations. However, my textbook claims you only need multivariable calculus to understand the equation. Using those definitions, it does not make sense that we do this: [itex]\Delta \epsilon\, h = \Delta f[/itex] and call it a day.

Like I said in my original post, it makes sense on an intuitive level, but it just doesn't seem very rigorous unless it is defined that way. Skimming through a few articles on functional derivatives online, it does indeed seem that the functional derivative is defined in a way that computationally amounts to setting [itex]\Delta \epsilon\, h = \Delta f[/itex] (at least in this case).

andrewkirk · Aug 8, 2017

Part of the trouble is you have not defined any of your terms. If you still want help, I suggest you post definitions of the terms ##f,g,h,F,\epsilon## and explain what it is that you are trying to prove. It may also help to post an image of the page(s) that you are having trouble understanding.

Avatrin · Aug 9, 2017

andrewkirk said:

Part of the trouble is you have not defined any of your terms. If you still want help, I suggest you post definitions of the terms ##f,g,h,F,\epsilon## and explain what it is that you are trying to prove. It may also help to post an image of the page(s) that you are having trouble understanding.

The very first sentence in my OP says I am trying to understand the Euler-Lagrange equation. So:

F is a functional
f and h are continuous functions
g I did define in my original post
##\epsilon## is a variable

I am just trying to understand a step in the standard derivation of the Euler-Lagrange equation (it is the derivation I see on every website I have been through, including Wikipedia).

FactChecker · Aug 9, 2017

It may help to realize that as long as limits and division are defined, many proofs from Calc 1 are likely to be just as valid in a new application. If you have doubts, you should go step-by-step through the Calc 1 proofs and see which specific steps might be problematic in the new context. Then you can address those steps specifically.

In fact, derivatives and calculus can be used in many relatively abstract settings with the same theorems holding.
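
As an illustration, here is a sketch of how the step in question reduces to ordinary calculus in the usual Euler-Lagrange setup. The labels ##J##, ##L##, ##\Phi##, ##a##, ##b## are my own and do not come from the thread. I am also assuming that ##f## and ##h## are differentiable, that ##L## is continuously differentiable, and that differentiating under the integral sign is justified. Take the functional
$$ J[f] = \int_a^b L\big(t, f(t), f'(t)\big)\,dt $$
and define ##\Phi(\epsilon) = J[f+\epsilon h]##, which is an ordinary function of the single real variable ##\epsilon##. For each fixed ##t##, the integrand is the composition of ##L## with the map ##\epsilon \mapsto \big(t, f(t)+\epsilon h(t), f'(t)+\epsilon h'(t)\big)##, so the multivariable chain rule gives
$$ \Phi'(0) = \int_a^b \left[\frac{\partial L}{\partial f}\,h(t) + \frac{\partial L}{\partial f'}\,h'(t)\right] dt $$
with both partial derivatives evaluated at ##\big(t, f(t), f'(t)\big)##. Under these assumptions, the only derivative taken is with respect to the real number ##\epsilon##, so no separate definition of a derivative with respect to a function is needed for this step.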
