Functional derivative: chain rule

CompuChip
Hmm, I've been working with functional derivatives lately, and some things aren't particularly clear.

I took the definition Wikipedia gives, but since I know little of distribution theory I don't fully get it all (I just read the bracket thing as a function inner product :)).

Anyway, I tried to derive some basic identities like the sum and product rule, which are quite straightforward, but I got kinda stuck at the chain rule. Suppose we have a functional \mathcal F[\rho] but \rho[\sigma] is itself a functional. Then it should be true that
\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x'
but how do I go about proving this?

Thanks!
 
I don't want to sound impatient, but ... bump ... anyone?
 
The "bracket thing" is an inner product for functions (more generally for distributions, which are formal "functions" defined only inside integrals and not necessarily in terms of pointwise values, e.g. the Dirac delta "function").

But there is a problem with your invocation of the chain rule: the functional \mathcal{F} maps functions to scalars. If you assume \rho is again a functional then you can't compose it with \mathcal{F}; the types don't match.

In general you can't compose functionals since their domain and range are distinct types (functions vs numbers).

So you can either have a chain rule of the form:
\frac{\delta\mathcal{F}[\rho(\sigma)]}{\delta \sigma}= \frac{\delta \mathcal{F}[\rho]}{\delta \rho}\frac{d \rho}{d \sigma}

Or invoke a parameterized functional (a functional-valued function), which will give a much messier chain rule.
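That said, if \rho is read as a pointwise *operator* acting on \sigma rather than a functional, the integral chain rule from the opening post does make sense and can be checked numerically on a grid. Here is a minimal sketch; the choices F[\rho] = \int \rho^2 dx and \rho[\sigma](x) = \sigma(x)^3 are my own illustrative picks (so the \delta\rho/\delta\sigma kernel is a delta function and the x' integral collapses):

```python
import numpy as np

# Numerical sanity check of the integral chain rule from the opening post,
# reading rho[sigma] as a pointwise *operator* sigma -> sigma^3 rather than a
# functional.  F[rho] = int rho^2 dx and rho = sigma^3 are illustrative only.
N = 200
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]

def F(rho):
    return np.sum(rho**2) * dx          # F[rho] = int rho(x)^2 dx

def rho_of(sigma):
    return sigma**3                     # rho[sigma](x) = sigma(x)^3

sigma = np.sin(np.pi * x) + 1.5

# Left-hand side: delta F[rho[sigma]] / delta sigma(x_i) by central differences,
# with the 1/dx converting a partial derivative into a functional derivative.
eps = 1e-5
lhs = np.empty(N)
for i in range(N):
    sp, sm = sigma.copy(), sigma.copy()
    sp[i] += eps
    sm[i] -= eps
    lhs[i] = (F(rho_of(sp)) - F(rho_of(sm))) / (2 * eps * dx)

# Right-hand side: delta F / delta rho(x') = 2 rho(x') and
# delta rho(x') / delta sigma(x) = 3 sigma(x)^2 delta(x - x'), so the x'
# integral collapses and the chain rule predicts 6 sigma(x)^5.
rhs = 6 * sigma**5

print(np.max(np.abs(lhs - rhs)))   # should be tiny
```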

It may help (though not I think to prove the general case) to start with functionals defined by integrals, i.e.:
\mathcal{F}[\rho] \equiv \int \phi(\rho(x))dx

then you get functional differential:
\delta \mathcal{F}[\rho] = \int \frac{d \phi(\rho)}{d\rho}\delta \rho dx

Then the functional derivative (as a distribution) is:
\left\langle \frac{\delta \mathcal{F}[\rho]}{\delta\rho} , f\right\rangle=\int\frac{d \phi(\rho)}{d\rho} f dx
thence
\frac{\delta \mathcal{F}[\rho]}{\delta\rho} =\frac{d \phi(\rho)}{d\rho}= \phi' \circ \rho
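This last identity is easy to verify numerically: discretize the functional on a grid and compare finite differences of \mathcal{F} against \phi'(\rho(x)). A sketch, with the arbitrary choice \phi = \sin:

```python
import numpy as np

# Finite-difference check that delta F / delta rho = phi'(rho(x)) for
# F[rho] = int phi(rho(x)) dx, here with the arbitrary choice phi = sin.
N = 100
x = np.linspace(0.0, 2.0, N)
dx = x[1] - x[0]

def F(rho):
    return np.sum(np.sin(rho)) * dx

rho = np.exp(-x)   # arbitrary smooth test function

# Perturb one grid value at a time; dividing by dx turns the partial
# derivative with respect to rho(x_i) into a functional derivative.
eps = 1e-6
fd = np.array([
    (F(rho + eps * np.eye(N)[i]) - F(rho - eps * np.eye(N)[i])) / (2 * eps * dx)
    for i in range(N)
])

print(np.max(np.abs(fd - np.cos(rho))))   # phi' = cos; should be tiny
```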

In standard calculus you will note that the derivative of a function is the multiplier of the linear term in the variable. The n-th derivative is the multiplier of the n-th degree term. Effectively, the first derivative times the variable gives a linear function, and the higher derivatives give the higher-degree, n-th order terms.

In the functional case, the first functional derivative contracts with the variable (a function) to yield a linear functional. This is why you get a "distribution" instead of a function: because the space of functions is infinite dimensional, its dual space consists of more than the linear functionals of the form:
f \mapsto \left\langle \phi , f \right\rangle
where \phi is a function. We must define a more general class of objects, distributions, which are defined by first giving a linear functional and then rewriting it in the form of the inner product above, but with \phi not necessarily meaningful as an actual function.

Note you can generalize further by considering an operator (not necessarily linear) which maps functions to functions and for which we can (with some restrictions) define an operator derivative

\frac{\Delta \Omega[\phi]}{\Delta \phi}
as a linear operator (for a given \phi).

Thence:
\frac{\Delta \Omega[\phi]}{\Delta \phi}[a \xi + b \eta] =a\frac{\Delta \Omega[\phi]}{\Delta \phi}[\xi] +b\frac{\Delta \Omega[\phi]}{\Delta \phi}[\eta]
is a function. (a and b scalars, \xi and \eta functions.)

Since you can compose operators you can then better discuss an operator chain rule:

\frac{\Delta \Omega[\Xi[\psi]]}{\Delta \psi}[f] = \frac{\Delta \Omega[\Xi[\psi]]}{\Delta \Xi}\left[\frac{\Delta \Xi}{\Delta \psi}[f]\right]

In essence one generalizes the Taylor expansion of a general operator in terms of "constant" plus linear operator plus bilinear operator ...
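The operator chain rule displayed above can be tested numerically with Gateaux (directional) derivatives. The pointwise operators \Omega[\phi] = \phi^2 and \Xi[\psi] = \psi^3 below are hypothetical choices, picked so everything is easy to evaluate by hand:

```python
import numpy as np

# Numeric check of the operator chain rule, using Gateaux (directional)
# derivatives.  Omega[phi] = phi^2 and Xi[psi] = psi^3 are hypothetical,
# easy-to-differentiate pointwise operators.
N = 50
x = np.linspace(0.0, 1.0, N)
psi = np.cos(x) + 2.0       # arbitrary base function
f = np.sin(3 * x)           # arbitrary direction of variation

def gateaux(Op, g, h, eps=1e-6):
    # Directional derivative of the operator Op at g in the direction h.
    return (Op(g + eps * h) - Op(g - eps * h)) / (2 * eps)

Omega = lambda phi: phi**2
Xi = lambda p: p**3
composite = lambda p: Omega(Xi(p))

# Left: derivative of the composite Omega[Xi[psi]] in direction f.
lhs = gateaux(composite, psi, f)
# Right: outer derivative evaluated on the inner derivative, per the chain rule.
rhs = gateaux(Omega, Xi(psi), gateaux(Xi, psi, f))

print(np.max(np.abs(lhs - rhs)))   # should be tiny
```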
 
The derivative of "The Derivative"

I noticed that I didn't actually give you the definition of the operator derivative. It is defined as a linear operator such that for small variations of the function:
\Omega[f+\delta f] = \Omega[f]+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f] + O(\delta f^2)

To get more rigorous:
\Omega[f+\delta f](x) = \Omega[f](x)+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) + O(\delta f^2)
where the order of the square of the variation is defined by taking the maximum magnitude of \delta f(x)\cdot \delta f(y) for all values of x and y.

You can then define the derivative explicitly by:
\frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) = \lim_{\epsilon\to 0} \frac{\Omega[f + \epsilon\delta f](x)-\Omega[f](x)}{\epsilon}
where \delta f is any of a restricted class of functions e.g. test functions or bounded smooth functions with compact support or something similar.

Here is a clarifying example.

The derivative of a function is the action of an operator \mathbf{D} on that function.

We can thus take the operator derivative of the differential operator:

\mathbf{D}'[f] = \frac{\Delta \mathbf{D}[f]}{\Delta f}

Since we are taking an operator derivative, \mathbf{D}'[f] will be a linear operator. And since \mathbf{D} is itself a linear operator, its derivative will be constant (with regard to f).

In short, \mathbf{D}'[f_1] = \mathbf{D}'[f_2] for any f_1, f_2.

So the "value" of the "derivative of the derivative" is:

\mathbf{D}'[f][g]= \lim_{\epsilon\to 0}\frac{\mathbf{D}[f+\epsilon g]-\mathbf{D}[f]}{\epsilon} =\lim_{\epsilon\to 0} \frac{\mathbf{D}[f] +\epsilon\mathbf{D}[g]-\mathbf{D}[f]}{\epsilon} = \mathbf{D}[g]

So the "derivative of the derivative" is "the derivative" in the same sense that the derivative of f(x)=cx is c viewed as a linear multiplier.
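This can be illustrated numerically: approximate \mathbf{D} by np.gradient on a grid. Since the operator is linear, the limit definition returns \mathbf{D}[g] no matter which base point f we expand at (the particular f and g below are arbitrary):

```python
import numpy as np

# Numerical illustration: approximate the differentiation operator D by
# np.gradient on a grid.  Because D is linear, the limit definition of the
# operator derivative returns D[g] regardless of the base point f.
N = 100
x = np.linspace(0.0, 2 * np.pi, N)

def D(f):
    return np.gradient(f, x)   # central differences inside, one-sided at edges

f = np.exp(np.sin(x))   # arbitrary base point
g = np.cos(2 * x)       # arbitrary direction

eps = 1e-6
D_prime_f_g = (D(f + eps * g) - D(f)) / eps   # finite-eps version of the limit

print(np.max(np.abs(D_prime_f_g - D(g))))   # agrees with D[g] up to rounding
```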

It may help more to think of the operator derivative as acting like a Greens function:
The derivative being linear can be expressed as an integral:
\mathbf{D}[f](x) = \int \delta'(x-y)f(y)dy
where \delta'(x-y) is the formal derivative of the Dirac delta function (the limit of derivatives of normalized Gaussian functions as the width goes to zero). The derivative then is the linear operator defined in "component form" by the two-variable function D(x,y) = \delta'(x-y). Think of the variables as acting analogously to the indices of vectors or matrices.
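To make the \delta'(x-y) kernel concrete, one can replace the formal \delta' by the derivative of a narrow normalized Gaussian and check that the integral reproduces f'. The width s, the grid, and the evaluation point below are all arbitrary choices of mine:

```python
import numpy as np

# Make the delta'(x - y) kernel concrete: replace it by the derivative of a
# narrow normalized Gaussian and check that the integral reproduces f'(x0).
N = 2000
L = 10.0
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
s = 0.05   # Gaussian width; delta' is the limit s -> 0

def dgauss(u):
    # Derivative of the normalized Gaussian (1/(s sqrt(2 pi))) exp(-u^2/(2 s^2)).
    return -u / (s**3 * np.sqrt(2 * np.pi)) * np.exp(-u**2 / (2 * s**2))

f = np.sin(x)
i0 = N // 3
x0 = x[i0]   # evaluation point, well away from the grid edges

# D[f](x0) = int delta'(x0 - y) f(y) dy, approximated by a Riemann sum.
approx = np.sum(dgauss(x0 - x) * f) * dx

print(approx, np.cos(x0))   # approx should be close to f'(x0) = cos(x0)
```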

You can think of the higher order functional and operator derivatives as generalized Taylor expansions:

For functionals:
\mathcal{F}[\phi] = F^{(0)}+ \int F^{(1)}(x)\phi(x)dx + \iint F^{(2)}(x,y)\phi(x)\phi(y) dx dy + \iiint F^{(3)}(x,y,z)\phi(x)\phi(y)\phi(z)dx dy dz + \cdots

For operators:
\Omega[\phi](x) = \Omega^{(0)}(x) + \int \Omega^{(1)}(x,y)\phi(y)dy + \iint \Omega^{(2)}(x,y,z)\phi(y)\phi(z) dy dz + \cdots
where these "multi-variable functions" \Omega^{(k)} are rather multi-distributions since they only appear in integrals.
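As a small check of the bilinear term, take a functional that is *exactly* quadratic, so all other terms of the expansion vanish: F[\phi] = \iint K(x,y)\phi(x)\phi(y)dxdy with a smooth symmetric kernel K (an arbitrary choice of mine). A mixed second finite difference should then recover F^{(2)}(x,y) + F^{(2)}(y,x) = 2K(x,y):

```python
import numpy as np

# Check of the bilinear term in the functional Taylor expansion, for the
# exactly quadratic functional F[phi] = iint K(x,y) phi(x) phi(y) dx dy
# with an arbitrary smooth symmetric kernel K.
N = 40
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]
K = np.exp(-np.subtract.outer(x, x)**2)   # smooth, symmetric, hypothetical

def F(phi):
    return phi @ K @ phi * dx * dx

phi = np.sin(2 * np.pi * x)   # base point (irrelevant: F is quadratic)
i, j = 5, 17                  # two grid points, chosen arbitrarily
e_i, e_j = np.eye(N)[i], np.eye(N)[j]

# Mixed second finite difference, divided by dx^2 to convert partial
# derivatives into functional derivatives; exact here since F is quadratic.
eps = 1e-3
second = (F(phi + eps*e_i + eps*e_j) - F(phi + eps*e_i - eps*e_j)
          - F(phi - eps*e_i + eps*e_j) + F(phi - eps*e_i - eps*e_j)) / (4 * eps**2 * dx * dx)

print(second, 2 * K[i, j])   # should agree closely
```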
 
Wow, thanks! Now there's some reply :)

Obviously you are right and functionals can't be composed with functionals. I guess I was a little confused by the notation in my (physics) notes. I read and re-read them, and I think the idea is the following: we have a functional \Omega[\sigma] which we want to differentiate w.r.t. \rho. The implicit assumption is that there is some physical connection between \rho and \sigma (as in: they are physical quantities which can -- in principle -- be calculated given one of them). I guess that makes \sigma a map of functions (plug in a \rho and you get a function \sigma[\rho] -- here are the square brackets that I read past too quickly, which caused the confusion). Actually, in the notes on the left-hand side there is a functional of \rho, on the right-hand side there is a functional of \sigma, and "... where it is understood that \sigma(r) = \sigma[\rho](r) corresponds to that [...] potential that gives rise to a [...] density profile \rho"

So now let me try to think it through mathematically. We have a functional \Omega[\sigma], where \sigma(x) = F[\rho] + f(x) is a number which depends on a function, plus a (given) function of x :) So basically \Omega is a functional of \rho, because giving a \rho determines my \sigma and lets me calculate the corresponding \Omega. Now I want to calculate \frac{\delta \Omega}{\delta \rho(r)} but I want to express it in terms of the derivative w.r.t. \sigma. For example, suppose that \Omega[\sigma] = \int \phi( \sigma(x) ) \, dx. Now I'm quite sure that \frac{\delta \sigma}{\delta \rho(x')} = \frac{\delta F[\rho]}{\delta \rho(x')} and \frac{\delta \Omega}{\delta \sigma(r)} = \frac{\partial \phi(\sigma)}{\partial \sigma} but I don't really see how to combine them.
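For what it's worth, here is a quick numerical check of exactly this setup, with illustrative choices F[\rho] = \int \rho^2 \, dx, f(x) = 0.3x and \phi = \sin. Since \delta\sigma(x)/\delta\rho(x') = 2\rho(x') for every x, the chain rule predicts \frac{\delta\Omega}{\delta\rho(x')} = \left(\int \cos\sigma(x)\,dx\right) 2\rho(x'):

```python
import numpy as np

# Numerical check of the setup above with illustrative choices:
#   F[rho] = int rho^2 dx,  f(x) = 0.3 x,  Omega[sigma] = int sin(sigma(x)) dx,
#   sigma(x) = F[rho] + f(x).
# Since delta sigma(x) / delta rho(x') = 2 rho(x') for every x, the chain rule
# predicts delta Omega / delta rho(x') = (int cos(sigma(x)) dx) * 2 rho(x').
N = 100
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]
f_ext = 0.3 * x   # the given external function f(x)

def F(rho):
    return np.sum(rho**2) * dx

def Omega(rho):
    sigma = F(rho) + f_ext
    return np.sum(np.sin(sigma)) * dx

rho = np.cos(2 * x)   # arbitrary density profile

# Finite-difference functional derivative delta Omega / delta rho(x_i).
eps = 1e-5
fd = np.array([
    (Omega(rho + eps * np.eye(N)[i]) - Omega(rho - eps * np.eye(N)[i]))
    / (2 * eps * dx)
    for i in range(N)
])

sigma = F(rho) + f_ext
prediction = np.sum(np.cos(sigma)) * dx * (2 * rho)

print(np.max(np.abs(fd - prediction)))   # should be tiny
```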

I'm going to read your posts again, and see whether it's really necessary for me to talk about operators. They will anyway help me to understand the whole matter a bit better :) Thanks again!
 
Where I see you quote:
"... where it is understood that \sigma(\tau) = \sigma[\rho](\tau)..."
would seem to say that \sigma is in fact an operator in the sense that it takes as an argument \rho and the result is acting as a function (evaluated at \tau.)

But you can always view these operators on functions as variable-dependent functionals. Given that the operator \sigma maps the function \rho to the function \sigma[\rho], we can look upon the value of that function, \sigma[\rho](\tau), as the value of a (\tau-dependent) functional \sigma_\tau[\rho].

With that in mind you can use the functional derivative to define the operator derivative and use the same notation and in this context define a chain rule.
 
jambaugh said:
...

But there is a problem with your invocation of the chain rule. The functional \mathcal{F} maps functions to scalars. If you assume \rho is again a functional then you can't compose it with \mathcal{F}

...
Ok, this is why F[\rho[\sigma]] is undefined in principle. But how does one have to interpret the functional composite F[\rho[\sigma]] in order to arrive at the following chain rule?
\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x'
 
*push*
 