Calculus of Variations on Kullback-Leibler Divergence

Discussion Overview

The discussion revolves around the application of variational calculus to minimize the Kullback-Leibler (KL) divergence by selecting a probability distribution ##q## that is equal to a fixed distribution ##p##. Participants explore the mathematical foundations and constraints involved in this process, particularly focusing on the calculus of variations and the properties of probability distributions.

Discussion Character

  • Exploratory, Technical explanation, Mathematical reasoning

Main Points Raised

  • One participant expresses uncertainty about the relevance of Euler-Lagrange equations and functional derivatives in the context of minimizing KL-divergence.
  • Another participant emphasizes the importance of considering the constraints on ##q##, specifically that it must be a probability distribution, which includes the conditions of non-negativity and normalization (integral equals 1).
  • Some participants suggest starting with a fixed ##p## and then finding the optimal ##q## to either maximize or minimize the KL-divergence.
  • A link to additional resources on incorporating constraints into the Euler-Lagrange equations is provided for further exploration.

Areas of Agreement / Disagreement

Participants generally agree on the need to consider the constraints of probability distributions when applying variational calculus, but there is no consensus on the specific approach to take or the relevance of certain mathematical tools.

Contextual Notes

Participants acknowledge the need to address the integral constraint and non-negativity conditions of ##q##, but the discussion does not resolve how these constraints should be incorporated into the variational calculus framework.

Master1022
TL;DR
How to use calculus of variations on KL-divergence
Hi,

This isn't a homework question, but a side task given in a machine learning class I am taking.

Question: Using variational calculus, prove that one can minimize the KL-divergence by choosing ##q## to be equal to ##p##, given a fixed ##p##.

Attempt:

Unfortunately, I have never seen calculus of variations (it was suggested that we teach ourselves). I have been trying to watch some videos online, but I mainly just see references to Euler-Lagrange equations, which I don't think are of much relevance here (please correct me if I am wrong), and not much explanation of functional derivatives.

Nonetheless, I think this shouldn't be too hard, but am struggling to understand how to use the tools.

If we start with the definition of the KL-divergence we get:
$$\text{KL}[p\,||\,q] = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx = I$$

Would it be possible for anyone to help me get started on the path? I am not really sure how to proceed after I write down ##\frac{\delta I}{\delta q}##.

Thanks in advance
 
Euler-Lagrange is what you want, but you also have to worry about the constraints on ##q## that come from it being a probability distribution, namely that its integral is 1 and it is always nonnegative. I think the integral constraint is the important part.

http://liberzon.csl.illinois.edu/teaching/cvoc/node38.html

has some notes on how to add constraints to the Euler-Lagrange equations.
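As a sketch of the multiplier approach described in those notes (enforcing only the normalization constraint, and assuming ##q(x) > 0## wherever ##p(x) > 0##), one can form the augmented functional

$$J[q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx + \lambda\left(\int q(x)\,dx - 1\right),$$

set its pointwise derivative with respect to ##q(x)## to zero,

$$\frac{\delta J}{\delta q(x)} = -\frac{p(x)}{q(x)} + \lambda = 0 \quad\Rightarrow\quad q(x) = \frac{p(x)}{\lambda},$$

and then use ##\int q(x)\,dx = 1## together with ##\int p(x)\,dx = 1## to conclude ##\lambda = 1##, hence ##q = p##. The nonnegativity constraint is inactive at this solution (##q = p \ge 0## automatically), so it need not be enforced explicitly here.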
 
Maybe you should start with a fixed ##p##, and then try to find the optimal ##q## that maximizes or minimizes the KL divergence.
 
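A quick numerical sanity check of the claim under discussion: a minimal sketch using the discrete form of the KL divergence (the distributions `p` and `q` below are illustrative, not from the thread), showing that ##\text{KL}[p||q]## is zero at ##q = p## and positive elsewhere (Gibbs' inequality).

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence: sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute 0 by the usual convention.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.2, 0.5, 0.3]

# Choosing q equal to p gives zero divergence: every term is p_i * log(1) = 0.
print(kl_divergence(p, p))  # → 0.0

# Any other normalized q gives a strictly positive divergence.
q = [0.3, 0.4, 0.3]
print(kl_divergence(p, q))
```

This only checks particular points rather than proving the variational result, but it is a cheap way to confirm the direction of the claim before working through the calculus.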