Calculus of Variations on Kullback-Leibler Divergence

SUMMARY

The discussion focuses on minimizing Kullback-Leibler (KL) divergence using variational calculus, specifically by setting the distribution q equal to p under fixed conditions. The user seeks guidance on applying the Euler-Lagrange equations and functional derivatives to achieve this minimization. Key insights include the necessity of adhering to the constraints of probability distributions, such as normalization and non-negativity. Resources provided include links to relevant notes and discussions on incorporating constraints into the Euler-Lagrange framework.

PREREQUISITES
  • Understanding of Kullback-Leibler divergence
  • Familiarity with variational calculus
  • Knowledge of Euler-Lagrange equations
  • Concept of functional derivatives
NEXT STEPS
  • Study the application of Euler-Lagrange equations in optimization problems
  • Learn about constraints in variational calculus
  • Explore functional derivatives in the context of probability distributions
  • Investigate methods for minimizing KL divergence in machine learning
USEFUL FOR

Machine learning practitioners, data scientists, and researchers interested in optimization techniques related to probability distributions and KL divergence.

Master1022
TL;DR
How to use calculus of variations on KL-divergence
Hi,

This isn't a homework question, but a side task given in a machine learning class I am taking.

Question: Using variational calculus, prove that one can minimize the KL-divergence by choosing ##q## to be equal to ##p##, given a fixed ##p##.

Attempt:

Unfortunately, I have never seen calculus of variations before (it was suggested that we teach ourselves). I have been trying to watch some videos online, but I mostly just find references to the Euler-Lagrange equations, which I don't think are of much relevance here (please correct me if I am wrong), and not much explanation of functional derivatives.

Nonetheless, I think this shouldn't be too hard, but am struggling to understand how to use the tools.

If we start with the definition of the KL-divergence we get:
$$\text{KL}[p \,\|\, q] = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx = I$$

Would it be possible for anyone to help me get started on the path? I am not really sure how to proceed after I write down ##\frac{\delta I}{\delta q}##.

Thanks in advance
 
Euler-Lagrange is what you want, but you also have to worry about the constraints on ##q## that come from it being a probability distribution, namely that its integral is 1 and that it is nonnegative everywhere. I think the integral constraint is the important part.

http://liberzon.csl.illinois.edu/teaching/cvoc/node38.html

has some notes on how to add constraints to the Euler-Lagrange equations.
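For what it's worth, here is a sketch of how the constrained calculation can go (my own working under the normalization constraint, so check it against the notes):

```latex
% Add the normalization constraint with a Lagrange multiplier \lambda:
% J[q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx + \lambda\left(\int q(x)\,dx - 1\right)
%
% Setting the functional derivative with respect to q(x) to zero:
% \frac{\delta J}{\delta q(x)} = -\frac{p(x)}{q(x)} + \lambda = 0
% \quad\Longrightarrow\quad q(x) = \frac{p(x)}{\lambda}.
%
% Imposing the constraint \int q(x)\,dx = 1 and using \int p(x)\,dx = 1
% gives \lambda = 1, hence q(x) = p(x). Non-negativity of q then holds
% automatically, since p(x) \ge 0.
```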
 
Maybe you should start with a fixed ##p##, and then try to find the ##q## that minimizes (or maximizes) the KL divergence.
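As a quick numerical sanity check (not from the thread, just a minimal sketch for discrete distributions), you can verify that the KL divergence is zero at ##q = p## and positive otherwise, which is Gibbs' inequality:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence sum_i p_i * log(p_i / q_i).

    Assumes p and q are strictly positive and each sums to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

print(kl_divergence(p, p))  # 0.0: the divergence vanishes at q = p
print(kl_divergence(p, q))  # strictly positive for any q != p
```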
 
