
Functional derivative: chain rule

  1. Jun 18, 2007 #1


    Science Advisor
    Homework Helper

    Hmm, I've been working with functional derivatives lately, and some things aren't particularly clear.

    I took the definition Wikipedia gives, but since I know little of distribution theory I don't fully get it all (I just read the bracket thing as a function inner product :)).

Anyway, I tried to derive some basic identities like the sum and product rule, which are quite straightforward, but I got kinda stuck at the chain rule. Suppose we have a functional [tex]\mathcal F[\rho][/tex], where [tex]\rho[\sigma][/tex] is itself a functional. Then it should be true that
    [tex]\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x' [/tex]
    but how do I go about proving this?
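That identity can at least be sanity-checked numerically for concrete choices (all names and the particular [itex]\mathcal F[/itex] and [itex]\rho[\sigma][/itex] below are my own toy examples, reading [itex]\rho[\sigma][/itex] as a pointwise map of functions): discretize everything on a grid, approximate [itex]\delta \mathcal F/\delta\sigma(x_i)[/itex] by perturbing the i-th grid value, and compare the direct derivative of the composite against the chain-rule expression.

```python
import numpy as np

# Grid and a test function sigma
x = np.linspace(0.0, 1.0, 200)
dx = x[1] - x[0]
sigma = np.sin(2 * np.pi * x)

# Toy choices (mine, not from the thread):
#   F[rho]        = integral of rho(x)^2 dx
#   rho[sigma](x) = sigma(x)^2
F = lambda rho: np.sum(rho**2) * dx
rho_of = lambda s: s**2

def func_deriv(G, f, eps=1e-6):
    """Finite-difference functional derivative deltaG/delta f(x_i) on the grid.

    Perturbing one grid value by eps corresponds to delta f = eps at x_i,
    so deltaG ~ (deltaG/delta f)(x_i) * eps * dx; hence the division by eps*dx."""
    base = G(f)
    d = np.empty_like(f)
    for i in range(len(f)):
        fp = f.copy()
        fp[i] += eps
        d[i] = (G(fp) - base) / (eps * dx)
    return d

# Left-hand side: derivative of the composite F[rho[sigma]] w.r.t. sigma
lhs = func_deriv(lambda s: F(rho_of(s)), sigma)

# Right-hand side via the chain rule: delta rho(x')/delta sigma(x) = 2 sigma(x) delta(x - x'),
# so the x' integral collapses to (delta F/delta rho)(x) * 2 sigma(x)
rhs = func_deriv(F, rho_of(sigma)) * 2 * sigma

print(np.max(np.abs(lhs - rhs)))  # small (finite-difference error only)
```

Both sides also agree with the analytic answer [itex]4\sigma(x)^3[/itex], since the composite here is just [itex]\int \sigma^4 dx[/itex].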

    Last edited: Jun 18, 2007
  3. Jun 24, 2007 #2


    Science Advisor
    Homework Helper

    I don't want to sound impatient, but ... bump ... anyone?
  4. Jun 24, 2007 #3


    Science Advisor
    Gold Member

The "bracket thing" is an inner product for functions (more generally for distributions, which are formal functions defined only inside integrals and not necessarily in terms of pointwise values, as e.g. the Dirac delta "function").

But there is a problem with your invocation of the chain rule. The functional [itex]\mathcal{F}[/itex] maps functions to scalars. If you assume [itex]\rho[/itex] is again a functional then you can't compose it with [itex]\mathcal{F}[/itex].

    In general you can't compose functionals since their domain and range are distinct types (functions vs numbers).

    So you can either have a chain rule of the form:
    [tex] \frac{\delta\mathcal{F}[\rho(\sigma)]}{\delta \sigma}= \frac{\delta \mathcal{F}[\rho]}{\delta \rho}\frac{d \rho}{d \sigma}[/tex]

    Or invoke a parameterized functional (functional valued function) which will give a much messier chain rule.

    It may help (though not I think to prove the general case) to start with functionals defined by integrals, i.e.:
    [tex] \mathcal{F}[\rho] \equiv \int \phi(\rho(x))dx[/tex]

    then you get functional differential:
    [tex] \delta \mathcal{F}[\rho] = \int \frac{d \phi(\rho)}{d\rho}\delta \rho dx[/tex]

    Then the functional derivative (as a distribution) is:
    [tex]\left\langle \frac{\delta \mathcal{F}[\rho]}{\delta\rho} , f\right\rangle=\int\frac{d \phi(\rho)}{d\rho} f dx[/tex]
    [tex]\frac{\delta \mathcal{F}[\rho]}{\delta\rho} =\frac{d \phi(\rho)}{d\rho}= \phi' \circ \rho[/tex]
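A concrete instance may help (my own example, not from the post above): taking [tex]\phi(u) = u^3[/tex] gives
[tex]\mathcal{F}[\rho] = \int \rho(x)^3 \, dx, \qquad \frac{\delta \mathcal{F}[\rho]}{\delta \rho} = 3\rho^2 = \phi' \circ \rho,[/tex]
i.e. the functional derivative is just the ordinary derivative of the integrand, evaluated pointwise at [itex]\rho(x)[/itex].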

In standard calculus you will note that the derivative of a function is the multiplier for the linear term in the variable. The n-th derivative is the multiplier for the n-th degree term. Effectively then the first derivative times the variable gives a linear function, and the higher derivatives times powers of the variable give the higher n-th degree terms.

In the functional case, the first functional derivative contracts with the variable (a function) to yield a linear functional. This is why you get a "distribution" instead of a function. Because the space of functions is infinite dimensional, its dual space consists of more than the linear functionals of the form:
    [tex] f \mapsto \left\langle \phi , f \right\rangle[/tex]
where [itex]\phi[/itex] is a function. We must define a more general class of objects, distributions, which are defined by first giving a linear functional and then formally rewriting it in the inner-product form above, with the understanding that the [itex]\phi[/itex] there need not be meaningful as an actual function.

    Note you can generalize further by considering an operator (not necessarily linear) which maps functions to functions and for which we can (with some restrictions) define an operator derivative

    [tex] \frac{\Delta \Omega[\phi]}{\Delta \phi}[/tex]
as a linear operator (for a given [itex]\phi[/itex]).

    [tex]\frac{\Delta \Omega[\phi]}{\Delta \phi}[a \xi + b \eta] =a\frac{\Delta \Omega[\phi]}{\Delta \phi}[\xi] +b\frac{\Delta \Omega[\phi]}{\Delta \phi}[\eta][/tex]
is a function ([itex]a[/itex] and [itex]b[/itex] scalars, [itex]\xi[/itex] and [itex]\eta[/itex] functions).

    Since you can compose operators you can then better discuss an operator chain rule:

    [tex]\frac{\Delta \Omega[\Xi[\psi]]}{\Delta \psi}[f] = \frac{\Delta \Omega[\Xi[\psi]]}{\Delta \Xi}\left[\frac{\Delta \Xi}{\Delta \psi}[f]\right][/tex]

    In essence one generalizes the Taylor expansion of a general operator in terms of "constant" plus linear operator plus bilinear operator ...
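That operator chain rule can be sanity-checked in a discretized setting (the pointwise operators below are my own toy choices, with functions represented as arrays of grid samples):

```python
import numpy as np

# Toy pointwise operators on grid-sampled functions (my own choices):
#   Omega[phi](x) = phi(x)^2,  Xi[psi](x) = sin(psi(x))
Omega = lambda phi: phi**2
Xi = lambda psi: np.sin(psi)

def op_deriv(Op, phi, f, eps=1e-6):
    """Directional operator derivative (Delta Op[phi]/Delta phi)[f] by finite differences."""
    return (Op(phi + eps * f) - Op(phi)) / eps

x = np.linspace(0.0, 1.0, 100)
psi = np.cos(3 * x)
f = np.exp(-x)

# Left: derivative of the composite Omega[Xi[psi]] in the direction f
lhs = op_deriv(lambda p: Omega(Xi(p)), psi, f)

# Right: chain rule -- the outer derivative applied to the inner derivative's output
rhs = op_deriv(Omega, Xi(psi), op_deriv(Xi, psi, f))

print(np.max(np.abs(lhs - rhs)))  # small (finite-difference error only)
```

Analytically both sides equal [itex]2\sin(\psi)\cos(\psi)\,f[/itex] here, matching the ordinary pointwise chain rule.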
    Last edited: Jun 24, 2007
  5. Jun 24, 2007 #4


    Science Advisor
    Gold Member

    The derivative of "The Derivative"

    I noticed that I didn't actually give you the definition of the operator derivative. It is defined as a linear operator such that for small variations of the function:
    [tex]\Omega[f+\delta f] = \Omega[f]+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f] + O(\delta f^2)[/tex]

    To get more rigorous:
    [tex]\Omega[f+\delta f](x) = \Omega[f](x)+ \frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) + O(\delta f^2)[/tex]
    where the order of the square of the variation is defined by taking the maximum magnitude of [itex]\delta f(x)\cdot \delta f(y)[/itex] for all values of x and y.

    You can then define the derivative explicitly by:
    [tex]\frac{\Delta\Omega[f]}{\Delta f}[\delta f](x) = \lim_{\epsilon\to 0}
    \frac{\Omega[f + \epsilon\delta f](x)-\Omega[f](x)}{\epsilon}[/tex]
    where [itex]\delta f[/itex] is any of a restricted class of functions e.g. test functions or bounded smooth functions with compact support or something similar.

    Here is a clarifying example.

    The derivative of a function is the action of an operator [itex]\mathbf{D}[/itex] on that function.

    We can thus take the operator derivative of the differential operator:

    [tex]\mathbf{D}'[f] = \frac{\Delta \mathbf{D}[f]}{\Delta f}[/tex]

Since we are taking an operator derivative, [itex]\mathbf{D}'[f][/itex] will be a linear operator. But since the derivative is itself a linear operator, its derivative will be constant (with regard to f).

    In short [itex]\mathbf{D}'[f_1] = \mathbf{D}'[f_2][/itex].

    So the "value" of the "derivative of the derivative" is:

    [tex] \mathbf{D}'[f][g]= \lim_{\epsilon\to 0}\frac{\mathbf{D}[f+\epsilon g]-\mathbf{D}[f]}{\epsilon}
    =\lim_{\epsilon\to 0} \frac{\mathbf{D}[f] +\epsilon\mathbf{D}[g]-\mathbf{D}[f]}{\epsilon} = \mathbf{D}[g][/tex]

    So the "derivative of the derivative" is "the derivative" in the same sense that the derivative of f(x)=cx is c viewed as a linear multiplier.
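The [itex]\mathbf{D}'[f][g] = \mathbf{D}[g][/itex] computation can also be checked with the explicit limit definition above, using a discrete stand-in for the derivative operator (the grid and test functions are my own toy choices):

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 400)
D = lambda f: np.gradient(f, x)  # discrete stand-in for the derivative operator

def op_deriv(Op, f, g, eps=1e-6):
    """Finite-epsilon version of the limit definition of the operator derivative."""
    return (Op(f + eps * g) - Op(f)) / eps

f = np.sin(x)
g = np.cos(2.0 * x)

# For a linear operator, the derivative is the operator itself: D'[f][g] = D[g]
print(np.max(np.abs(op_deriv(D, f, g) - D(g))))  # ~ 0 up to rounding
```

Because [itex]\mathbf{D}[/itex] is linear, the epsilon cancels exactly and the result is independent of f, as claimed.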

It may help more to think of the operator derivative as acting like a Green's function:
    The derivative being linear can be expressed as an integral:
    [tex] \mathbf{D}[f](x) = \int \delta'(x-y)f(y)dy[/tex]
where [itex] \delta'(x-y)[/itex] is the formal derivative of the Dirac delta function (the limit of derivatives of normalized Gaussians as the variance goes to zero).

    The derivative then is the linear operator defined in "component form" by the two valued function [itex] D(x,y) = \delta'(x-y)[/itex]. Think of the variables as acting analogous to the indices of vectors or matrices.

    You can think of the higher order functional and operator derivatives as generalized Taylor expansions:

    For functionals:
    [tex] \mathcal{F}[\phi] = F^{(0)}+ \int F^{(1)}(x)\phi(x)dx + \iint F^{(2)}(x,y)\phi(x)\phi(y) dx dy + \iiint F^{(3)}(x,y,z)\phi(x)\phi(y)\phi(z)dx dy dz + \cdots[/tex]

    For operators:
    [tex] \Omega[\phi](x) = \Omega^{(0)}(x) + \int \Omega^{(1)}(x,y)\phi(y)dy + \iint \Omega^{(2)}(x,y,z)\phi(y)\phi(z) dy dz + \cdots[/tex]
    where these "multi-variable functions" [itex]\Omega^{(k)}[/itex] are rather multi-distributions since they only appear in integrals.
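As a small worked check of the expansion coefficients (my own example): for a purely quadratic functional the first functional derivative recovers the symmetrized kernel,
[tex]\mathcal{F}[\phi] = \iint F^{(2)}(x,y)\phi(x)\phi(y)\,dx\,dy \quad\Longrightarrow\quad \frac{\delta \mathcal{F}[\phi]}{\delta \phi(x)} = \int \left( F^{(2)}(x,y) + F^{(2)}(y,x) \right)\phi(y)\,dy,[/tex]
in direct analogy with [itex]\nabla(\mathbf{x}^T A \mathbf{x}) = (A + A^T)\mathbf{x}[/itex] for matrices, with the integration variables playing the role of indices.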
    Last edited: Jun 24, 2007
  6. Jun 25, 2007 #5


    Science Advisor
    Homework Helper

    Wow, thanks! Now there's some reply :)

Obviously you are right and functionals can't be composed with functionals. I guess I was a little confused by the notation in my (physics) notes. I read and re-read it, and I think the idea is the following: we have a functional [tex]\Omega[\sigma][/tex] which we want to differentiate w.r.t. [tex]\rho[/tex]. The implicit assumption is that there is some physical connection between [tex]\rho[/tex] and [tex]\sigma[/tex] (as in: they are physical quantities which can -- in principle -- be calculated given one of them). I guess that makes [tex]\sigma[/tex] a map of functions (plug in a [tex]\rho[/tex] and you get a function [tex]\sigma[\rho][/tex] -- there are the square brackets that I passed over too quickly in reading, which caused the confusion). Actually, in the notes there is a functional of [tex]\rho[/tex] on the left-hand side and a functional of [tex]\sigma[/tex] on the right-hand side, "... where it is understood that [tex]\sigma(r) = \sigma[\rho](r)[/tex] corresponds to that [...] potential that gives rise to a [...] density profile [tex]\rho[/tex]"

So now let me try to think it through mathematically. We have a functional [tex]\Omega[\sigma][/tex], where [tex]\sigma(x) = F[\rho] + f(x)[/tex] is a (given) function of x plus a number which depends on a function :) So basically [tex]\Omega[/tex] is a functional of [tex]\rho[/tex], because giving a [tex]\rho[/tex] determines my [tex]\sigma[/tex] and lets me calculate the corresponding [tex]\Omega[/tex]. Now I want to calculate [tex]\frac{\delta \Omega}{\delta \rho(r)}[/tex], but expressed in terms of the derivative w.r.t. [tex]\sigma[/tex]. For example, suppose that [tex]\Omega[\sigma] = \int \phi( \sigma(x) ) \, dx[/tex]. Now I'm quite sure that [tex]\frac{\delta \sigma}{\delta \rho(x') } = \frac{\delta F[\rho]}{\delta \rho(x')}[/tex] and [tex]\frac{\delta \Omega}{\delta \sigma(r)} = \frac{\partial \phi(\sigma)}{\partial \sigma}[/tex], but I don't really see how to combine them.

    I'm going to read your posts again, and see whether it's really necessary for me to talk about operators. They will anyway help me to understand the whole matter a bit better :) Thanks again!
  7. Jun 25, 2007 #6


    Science Advisor
    Gold Member

    Where I see you quote:
    "... where it is understood that [tex]\sigma(\tau) = \sigma[\rho](\tau)[/tex]..."
would seem to say that [tex]\sigma[/tex] is in fact an operator, in the sense that it takes [tex]\rho[/tex] as an argument and the result acts as a function (evaluated at [tex]\tau[/tex]).

But you can always view these operators on functions as variable-dependent functionals. Given that the operator [itex]\sigma[/itex] maps the function [itex]\rho[/itex] to the function [itex]\sigma[\rho][/itex], we can look upon the value of that function [itex]\sigma[\rho](\tau)[/itex] as the value of a ([itex]\tau[/itex]-dependent) functional [itex]\sigma_\tau[\rho][/itex].

    With that in mind you can use the functional derivative to define the operator derivative and use the same notation and in this context define a chain rule.
  8. Jul 1, 2011 #7
OK, so this is why [itex]\mathcal F[\rho[\sigma]][/itex] is in principle undefined. But how does one have to interpret the composite [itex]\mathcal F[\rho[\sigma]][/itex] in order to arrive at the following chain rule?
    [tex]\frac{\delta \mathcal F[\rho]}{\delta \sigma(x)} = \int \frac{\delta \mathcal F[\rho]}{\delta \rho(x')} \frac{\delta \rho(x')}{\delta \sigma(x)} \, \mathrm{d}x' [/tex]