Functional derivative: chain rule

SUMMARY

The discussion centers on the complexities of applying the chain rule to functional derivatives, particularly when dealing with functionals like \(\mathcal{F}[\rho]\) and \(\rho[\sigma]\). It is established that functionals cannot be composed directly due to their distinct domains and ranges. The correct formulation of the chain rule in this context is given as \(\frac{\delta \mathcal{F}[\rho]}{\delta \sigma} = \int \frac{\delta \mathcal{F}[\rho]}{\delta \rho} \frac{\delta \rho}{\delta \sigma} \, \mathrm{d}x'\). The discussion also highlights the importance of understanding operator derivatives and their relationship to functional derivatives.

PREREQUISITES
  • Understanding of functional derivatives and their definitions.
  • Familiarity with distribution theory and inner products in functional analysis.
  • Knowledge of operator theory and the concept of linear operators.
  • Basic calculus concepts, particularly the chain rule and Taylor expansions.
NEXT STEPS
  • Study the properties of functional derivatives in the context of variational calculus.
  • Learn about the application of distributions in physics and engineering.
  • Explore operator theory, focusing on linear and non-linear operators.
  • Investigate the implications of parameterized functionals and their derivatives.
USEFUL FOR

Mathematicians, physicists, and engineers working with variational principles, functional analysis, and those seeking to deepen their understanding of calculus in infinite-dimensional spaces.

CompuChip
Hmm, I've been working with functional derivatives lately, and some things aren't particularly clear.

I took the definition Wikipedia gives, but since I know little of distribution theory I don't fully get it all (I just read the bracket thing as a function inner product :)).

Anyway, I tried to derive some basic identities like the sum and product rule, which are quite straightforward, but I got kinda stuck at the chain rule. Suppose we have a functional \mathcal F[\rho] but \rho[\sigma] is itself a functional. Then it should be true that
\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x'
but how do I go about proving this?

Thanks!
 
I don't want to sound impatient, but ... bump ... anyone?
 
The "bracket thing" is an inner product for functions (more generally, for distributions, which are formal functions defined only inside integrals and not necessarily in terms of pointwise values, e.g. the Dirac delta "function").

But there is a problem with your invocation of the chain rule. The functional \mathcal{F} maps functions to scalars. If you assume \rho is again a functional, then you can't compose it with \mathcal{F}.

In general you can't compose functionals since their domain and range are distinct types (functions vs numbers).

So you can either have a chain rule of the form:
\frac{\delta\mathcal{F}[\rho(\sigma)]}{\delta \sigma}= \frac{\delta \mathcal{F}[\rho]}{\delta \rho}\frac{d \rho}{d \sigma}

Or invoke a parameterized functional (functional valued function) which will give a much messier chain rule.

It may help (though not I think to prove the general case) to start with functionals defined by integrals, i.e.:
\mathcal{F}[\rho] \equiv \int \phi(\rho(x))dx

then you get functional differential:
\delta \mathcal{F}[\rho] = \int \frac{d \phi(\rho)}{d\rho}\delta \rho dx

Then the functional derivative (as a distribution) is:
\left\langle \frac{\delta \mathcal{F}[\rho]}{\delta\rho} , f\right\rangle=\int\frac{d \phi(\rho)}{d\rho} f dx
thence
\frac{\delta \mathcal{F}[\rho]}{\delta\rho} =\frac{d \phi(\rho)}{d\rho}= \phi' \circ \rho
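The identity \delta\mathcal{F}/\delta\rho = \phi'\circ\rho is easy to check numerically: discretize \rho on a grid, approximate the integral by a Riemann sum, and compare a finite-difference functional derivative against \phi'(\rho(x)). A minimal sketch (the grid, \phi = \sin, and \rho = e^{-x} are arbitrary illustrative choices, not from the thread):

```python
import numpy as np

# Discretize rho on a grid; F[rho] = sum(phi(rho)) * dx approximates the integral.
x = np.linspace(0.0, 1.0, 200)
dx = x[1] - x[0]
phi = np.sin                      # integrand phi
phi_prime = np.cos                # its ordinary derivative
rho = np.exp(-x)                  # sample density

def F(r):
    return np.sum(phi(r)) * dx

# Finite-difference functional derivative: perturb rho at a single grid point.
# Dividing by eps*dx turns the discrete partial derivative into the continuum
# density delta F / delta rho(x_i).
eps = 1e-6
i = 50
bump = np.zeros_like(rho); bump[i] = 1.0
fd = (F(rho + eps * bump) - F(rho)) / (eps * dx)

print(fd, phi_prime(rho[i]))      # the two values should agree closely
```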

In standard calculus you will note that the derivative of a function is the multiplier of the linear term in the variable, and the n-th derivative is the multiplier of the n-th degree term. Effectively, the first derivative times the variable gives a linear function, and the higher derivatives give higher order n-th degree terms.

In the functional case, the first functional derivative contracts with the variable (a function) to yield a linear functional. This is why you get a "distribution" instead of a function. Because the space of functions is infinite dimensional, its dual space contains more than just the linear functionals of the form:
f \mapsto \left\langle \phi , f \right\rangle
where \phi is a function. We must define a more general class of objects, distributions, which are defined by first giving a linear functional and then rewriting it in the form of the inner product above, but with \phi not necessarily meaningful as an actual function.

Note you can generalize further by considering an operator (not necessarily linear) which maps functions to functions and for which we can (with some restrictions) define an operator derivative

\frac{\Delta \Omega[\phi]}{\Delta \phi}
as a linear operator (for a given \phi).

Thence:
\frac{\Delta \Omega[\phi]}{\Delta \phi}[a \xi + b \eta] =a\frac{\Delta \Omega[\phi]}{\Delta \phi}[\xi] +b\frac{\Delta \Omega[\phi]}{\Delta \phi}[\eta]
is a function. (a and b scalars, \xi and \eta functions.)

Since you can compose operators you can then better discuss an operator chain rule:

\frac{\Delta \Omega[\Xi[\psi]]}{\Delta \psi}[f] = \frac{\Delta \Omega[\Xi[\psi]]}{\Delta \Xi}\left[\frac{\Delta \Xi}{\Delta \psi}[f]\right]

In essence one generalizes the Taylor expansion of a general operator in terms of "constant" plus linear operator plus bilinear operator ...
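The operator chain rule above can be illustrated numerically with two pointwise (hence composable) operators; \Omega[g] = g^2 and \Xi[g] = \sin g here are arbitrary illustrative choices, for which the inner operator derivative acts as multiplication by \cos\psi and the outer one as multiplication by 2\,\Xi[\psi]:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)
psi = x**2
f = np.cos(5 * x)              # the direction of variation

Omega = lambda g: g**2         # outer operator (pointwise squaring)
Xi = lambda g: np.sin(g)       # inner operator (pointwise sine)

# Left side: derivative of the composite Omega[Xi[psi]] in direction f,
# via a finite-epsilon version of the limit definition.
eps = 1e-7
lhs = (Omega(Xi(psi + eps * f)) - Omega(Xi(psi))) / eps

# Right side: outer derivative (multiplication by 2*Xi[psi]) applied to
# the inner derivative of psi in direction f (multiplication by cos(psi)).
rhs = 2 * np.sin(psi) * (np.cos(psi) * f)

print(np.max(np.abs(lhs - rhs)))   # small
```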
 
The derivative of "The Derivative"

I noticed that I didn't actually give you the definition of the operator derivative. It is defined as a linear operator such that for small variations of the function:
\Omega[f+\delta f] = \Omega[f]+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f] + O(\delta f^2)

To get more rigorous:
\Omega[f+\delta f](x) = \Omega[f](x)+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) + O(\delta f^2)
where the order of the square of the variation is defined by taking the maximum magnitude of \delta f(x)\cdot \delta f(y) for all values of x and y.

You can then define the derivative explicitly by:
\frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) = \lim_{\epsilon\to 0} \frac{\Omega[f + \epsilon\delta f](x)-\Omega[f](x)}{\epsilon}
where \delta f is any of a restricted class of functions e.g. test functions or bounded smooth functions with compact support or something similar.
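This limit definition can be checked numerically for a simple nonlinear pointwise operator; \Omega[g] = g^2 is an arbitrary illustrative choice, for which the operator derivative in direction \delta f is 2f\,\delta f:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)
f = np.sin(2 * np.pi * x)
df = np.cos(3 * x)                 # the variation delta f (any smooth function)

def Omega(g):
    return g**2                    # pointwise squaring operator

# Operator derivative via the limit definition, with a small finite epsilon:
eps = 1e-7
deriv = (Omega(f + eps * df) - Omega(f)) / eps

# Analytically, Delta Omega[f]/Delta f [df] = 2 f df.
print(np.max(np.abs(deriv - 2 * f * df)))   # small
```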

Here is a clarifying example.

The derivative of a function is the action of an operator \mathbf{D} on that function.

We can thus take the operator derivative of the differential operator:

\mathbf{D}'[f] = \frac{\Delta \mathbf{D}[f]}{\Delta f}

Since we are taking an operator derivative, \mathbf{D}'[f] will be a linear operator. But since the derivative is itself a linear operator, its derivative will be constant (with regard to f).

In short \mathbf{D}'[f_1] = \mathbf{D}'[f_2].

So the "value" of the "derivative of the derivative" is:

\mathbf{D}'[f][g]= \lim_{\epsilon\to 0}\frac{\mathbf{D}[f+\epsilon g]-\mathbf{D}[f]}{\epsilon} =\lim_{\epsilon\to 0} \frac{\mathbf{D}[f] +\epsilon\mathbf{D}[g]-\mathbf{D}[f]}{\epsilon} = \mathbf{D}[g]

So the "derivative of the derivative" is "the derivative" in the same sense that the derivative of f(x)=cx is c viewed as a linear multiplier.
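This can be verified on a discretization of \mathbf{D} (periodic central differences here, an illustrative choice): the finite-difference operator derivative of D at any f, applied to g, reproduces D[g] up to floating-point roundoff, precisely because D is linear:

```python
import numpy as np

# Discrete derivative operator D: central differences on a periodic grid.
n = 128
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]

def D(g):
    return (np.roll(g, -1) - np.roll(g, 1)) / (2 * dx)

f = np.sin(x)
g = np.cos(2 * x)

# D'[f][g] via the limit definition (finite epsilon); since D is linear,
# (D[f + eps*g] - D[f]) / eps equals D[g] exactly, up to roundoff.
eps = 1e-6
Dprime_f_g = (D(f + eps * g) - D(f)) / eps

print(np.max(np.abs(Dprime_f_g - D(g))))    # near machine precision
```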

It may help more to think of the operator derivative as acting like a Green's function:
The derivative being linear can be expressed as an integral:
\mathbf{D}[f](x) = \int \delta'(x-y)f(y)\,dy
where \delta'(x-y) is the formal derivative of the Dirac delta function (the limit of the derivatives of normalized Gaussians as the width goes to zero). The derivative is then the linear operator defined in "component form" by the two-variable kernel D(x,y) = \delta'(x-y). Think of the variables as acting analogously to the indices of vectors or matrices.
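In a discretization, the kernel D(x,y) literally becomes a matrix whose indices play the role of x and y, and applying the operator is a matrix-vector product. A sketch (periodic central differences again, an illustrative choice):

```python
import numpy as np

# "Component form" of the derivative operator: a matrix D_ij standing in for
# the kernel delta'(x_i - x_j), so that (D f)_i = sum_j D_ij f_j.
n = 100
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]

# Central-difference matrix: +1/(2dx) on the superdiagonal, -1/(2dx) on the
# subdiagonal (with periodic wrap-around).
Dmat = (np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)) / (2 * dx)

f = np.sin(x)
print(np.max(np.abs(Dmat @ f - np.cos(x))))   # O(dx^2) discretization error
```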

You can think of the higher order functional and operator derivatives as generalized Taylor expansions:

For functionals:
\mathcal{F}[\phi] = F^{(0)}+ \int F^{(1)}(x)\phi(x)dx + \iint F^{(2)}(x,y)\phi(x)\phi(y) dx dy + \iiint F^{(3)}(x,y,z)\phi(x)\phi(y)\phi(z)dx dy dz + \cdots

For operators:
\Omega[\phi](x) = \Omega^{(0)}(x) + \int \Omega^{(1)}(x,y)\phi(y)dy + \iint \Omega^{(2)}(x,y,z)\phi(y)\phi(z) dy dz + \cdots
where these "multi-variable functions" \Omega^{(k)} are rather multi-distributions since they only appear in integrals.
 
Wow, thanks! Now there's some reply :)

Obviously you are right and functionals can't be composed with functionals. I guess I was a little confused by the notation in my (physics) notes. I read and re-read it, and I think the idea is the following: We have a functional \Omega[\sigma] which we want to differentiate w.r.t. \rho. The implicit assumption is that there is some physical connection between \rho and \sigma (as in: they are physical quantities which can -- in principle -- be calculated given one of them). I guess that makes \sigma a map of functions (plug in a \rho and you get a function \sigma[\rho] -- here are the square brackets that I passed over too quickly when reading and which caused the confusion). Actually, in the notes, on the left hand side there is a functional of \rho, on the right hand side there is a functional of \sigma, and "... where it is understood that \sigma(r) = \sigma[\rho](r) corresponds to that [...] potential that gives rise to a [...] density profile \rho"

So now let me try to think it through mathematically. We have a functional \Omega[\sigma], where \sigma(x) = F[\rho] + f(x) is a (given) function depending on x, plus a number which depends on a function :) So basically, \Omega is a functional of \rho, because giving a \rho determines my \sigma and lets me calculate the corresponding \Omega. Now I want to calculate \frac{\delta \Omega}{\delta \rho(r)}, but I want to express it in terms of the derivative w.r.t. \sigma. For example, suppose that \Omega[\sigma] = \int \phi( \sigma(x) ) \, dx. Now I'm quite sure that \frac{\delta \sigma}{\delta \rho(x')} = \frac{\delta F[\rho]}{\delta \rho(x')} and \frac{\delta \Omega}{\delta \sigma(r)} = \frac{\partial \phi(\sigma)}{\partial \sigma}, but I don't really see how to combine them.

I'm going to read your posts again, and see whether it's really necessary for me to talk about operators. They will anyway help me to understand the whole matter a bit better :) Thanks again!
 
Where I see you quote:
"... where it is understood that \sigma(\tau) = \sigma[\rho](\tau)..."
would seem to say that \sigma is in fact an operator, in the sense that it takes as its argument \rho and the result acts as a function (evaluated at \tau).

But you can always view these operators on functions as variable-dependent functionals. Given that the operator \sigma maps the function \rho to the function \sigma[\rho], we can look upon the value of that function, \sigma[\rho](\tau), as the value of a (\tau-dependent) functional \sigma_\tau[\rho].

With that in mind, you can use the functional derivative to define the operator derivative, use the same notation, and in this context define a chain rule.
 
jambaugh said:
...

But there is a problem with your invocation of the chain rule. The functional \mathcal{F} maps functions to scalars. If you assume \rho is again a functional then you can't compose it with \mathcal{F}

...
Ok, this is why F[\rho[\sigma]] is undefined in principle. But how does one have to interpret the functional composite F[\rho[\sigma]] in order to arrive at the following chain rule?
\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x'
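At least the identity checks out numerically for a concrete pointwise dependence. Taking \rho[\sigma](x) = \sigma(x)^2 and \mathcal F[\rho] = \int \rho(x)^2\,dx (both arbitrary illustrative choices), we have \mathcal F[\rho[\sigma]] = \int \sigma(x)^4\,dx, so the chain rule predicts \delta\mathcal F/\delta\sigma(x) = 4\sigma(x)^3:

```python
import numpy as np

# Numerical check of the chain rule for a concrete choice:
#   rho[sigma](x) = sigma(x)**2   (a pointwise operator sigma -> rho)
#   F[rho] = int rho(x)**2 dx, so F[rho[sigma]] = int sigma(x)**4 dx,
# and analytically delta F / delta sigma(x) = 4 sigma(x)**3.
x = np.linspace(0.0, 1.0, 300)
dx = x[1] - x[0]
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * x)

def F_of_sigma(s):
    rho = s**2
    return np.sum(rho**2) * dx

# Finite-difference functional derivative at grid point i (bump one point,
# divide by eps*dx to get the continuum density).
eps = 1e-6
i = 120
bump = np.zeros_like(sigma); bump[i] = 1.0
fd = (F_of_sigma(sigma + eps * bump) - F_of_sigma(sigma)) / (eps * dx)

print(fd, 4 * sigma[i]**3)   # should agree
```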
 
*push*
 
