Calculus of Variations on Kullback-Leibler Divergence

The discussion revolves around minimizing the Kullback-Leibler (KL) divergence using variational calculus, specifically showing that setting the distribution q equal to p achieves the minimum. The original poster has difficulty understanding the calculus of variations, particularly the relevance of the Euler-Lagrange equations and functional derivatives. Key advice includes accounting for the constraints on q as a probability distribution: it must integrate to one and remain nonnegative. Additional resources are suggested for incorporating these constraints into the variational approach. Overall, the focus is on finding the optimal q that minimizes the KL divergence for a fixed p.
Master1022
TL;DR
How to use calculus of variations on the KL-divergence
Hi,

This isn't a homework question, but a side task given in a machine learning class I am taking.

Question: Using variational calculus, prove that one can minimize the KL-divergence by choosing ##q## to be equal to ##p##, given a fixed ##p##.

Attempt:

Unfortunately, I have never studied calculus of variations (it was suggested that we teach ourselves). I have been trying to watch some videos online, but I mainly see references to the Euler-Lagrange equations, which I don't think are of much relevance here (please correct me if I am wrong), and not much explanation of functional derivatives.
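A short aside on that point: the Euler-Lagrange equation does apply here, it just degenerates. For a functional ##\int F(x, q, q')\,dx## it reads
$$\frac{\partial F}{\partial q} - \frac{d}{dx}\frac{\partial F}{\partial q'} = 0,$$
and the KL integrand ##F = p(x)\log\frac{p(x)}{q(x)}## contains no ##q'##. The second term therefore drops out and the condition becomes the pointwise equation ##\partial F/\partial q = 0##, i.e. an ordinary derivative with respect to the value ##q(x)## at each ##x##.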

Nonetheless, I think this shouldn't be too hard, but I am struggling to understand how to use the tools.

If we start with the definition of the KL-divergence, we get:
$$\text{KL}[p\,\|\,q] = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx = I$$
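To make the definition concrete, here is a minimal sketch that replaces the integral by a sum over a discrete set of outcomes (an assumption; the thread works with densities). The helper name `kl_divergence` and the example numbers are hypothetical. It uses the usual convention that terms with ##p_i = 0## contribute nothing, and assumes ##q_i > 0## wherever ##p_i > 0##.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL[p||q] = sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 are skipped (0 log 0 = 0 convention).
    Assumes q_i > 0 wherever p_i > 0 (hypothetical helper, not from the thread).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical example: two distributions on three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))  # a small positive number
```

Choosing ##q = p## gives exactly zero, while any other ##q## here gives a positive value, which is the behaviour the variational argument is meant to prove in general.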

Could anyone help me get started? I am not sure how to proceed after writing down ## \frac{\delta I}{\delta q} ##.

Thanks in advance
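Working out the functional derivative asked about above: perturb ##q \to q + \varepsilon\eta## for an arbitrary test function ##\eta## (a symbol introduced here) and differentiate at ##\varepsilon = 0##:
$$\left.\frac{d}{d\varepsilon} I[q+\varepsilon\eta]\right|_{\varepsilon=0} = -\int \frac{p(x)}{q(x)}\,\eta(x)\,dx \quad\Longrightarrow\quad \frac{\delta I}{\delta q(x)} = -\frac{p(x)}{q(x)}.$$
This never vanishes where ##p(x) > 0##, so the unconstrained problem has no stationary point: making ##q## larger everywhere keeps lowering ##I##. The normalization ##\int q(x)\,dx = 1## is what rules this out, which is why the constraint has to be built into the variation.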
 
Euler-Lagrange is what you want, but you also have to worry about the conditions on q that come from it being a probability distribution, namely that its integral is 1 and that it is always nonnegative. I think the integral constraint is the important part.

http://liberzon.csl.illinois.edu/teaching/cvoc/node38.html

This page has some notes on how to add constraints to the Euler-Lagrange equations.
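A sketch of how the integral constraint enters, using the Lagrange-multiplier approach referred to above (the symbols ##J## and ##\lambda## are introduced here). Add a multiplier for ##\int q\,dx = 1## and set the functional derivative to zero:
$$J[q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx + \lambda\left(\int q(x)\,dx - 1\right)$$
$$\frac{\delta J}{\delta q(x)} = -\frac{p(x)}{q(x)} + \lambda = 0 \quad\Longrightarrow\quad q(x) = \frac{p(x)}{\lambda}$$
$$\int q(x)\,dx = \frac{1}{\lambda}\int p(x)\,dx = \frac{1}{\lambda} = 1 \quad\Longrightarrow\quad \lambda = 1,\quad q = p$$
The solution is automatically nonnegative, so the nonnegativity constraint is inactive. To check that this is a minimum, the second variation in a direction ##\eta## is
$$\left.\frac{d^2}{d\varepsilon^2} J[q+\varepsilon\eta]\right|_{\varepsilon=0} = \int \frac{p(x)}{q(x)^2}\,\eta(x)^2\,dx \;\ge\; 0.$$
More strongly, ##-\log## is convex, so ##I[q]## is convex in ##q## and the stationary point is the global minimum over normalized ##q##, where the divergence is ##\int p\log 1\,dx = 0##.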
 
Maybe you should start with a fixed p and then try to find the optimal q that maximizes or minimizes the KL divergence.
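A quick numerical sanity check of that suggestion, under assumptions not in the thread: discretize to three outcomes, keep ##q## a valid distribution by writing it as a softmax of unconstrained logits ##z## (a reparametrization swapped in for the Lagrange multiplier; it enforces both normalization and positivity), and run plain gradient descent. With ##q = \mathrm{softmax}(z)##, the gradient of ##\text{KL}[p\,\|\,q]## with respect to ##z## is simply ##q - p##. The function names, step size, step count and example ##p## are hypothetical choices.

```python
import numpy as np

def softmax(z):
    """Map unconstrained logits z to a probability vector (positive, sums to 1)."""
    e = np.exp(z - np.max(z))  # shift by max for numerical stability
    return e / e.sum()

def kl_divergence(p, q):
    """Discrete KL[p||q]; assumes every p_i > 0 and q_i > 0."""
    return float(np.sum(p * np.log(p / q)))

def fit_q(p, steps=2000, lr=0.5):
    """Minimize KL[p||softmax(z)] over z by plain gradient descent.

    KL[p||softmax(z)] = sum p log p - sum p z + log sum exp z  (using sum p = 1),
    so its gradient with respect to z is softmax(z) - p.
    """
    z = np.zeros_like(p)        # start from the uniform distribution
    for _ in range(steps):
        q = softmax(z)
        z -= lr * (q - p)       # gradient descent step
    return softmax(z)

p = np.array([0.5, 0.3, 0.2])   # fixed target distribution (hypothetical numbers)
q_opt = fit_q(p)
print(q_opt)                    # numerically close to p
print(kl_divergence(p, q_opt))  # numerically close to 0
```

The optimizer lands on ##q \approx p## with divergence near zero, consistent with the variational result. This only checks one discrete example; it is not a proof.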
 