Maximizing a Lagrangian Subject to Probability Constraints


Homework Help Overview

The discussion revolves around maximizing a Lagrangian over probabilities, specifically one whose leading term is the entropy -\sum_{x,y} P(x,y) \log P(x,y), subject to constraints on the sums over y. Participants work out how to differentiate the Lagrangian with respect to an individual probability P(x,y) while respecting the constraints imposed by the problem.

Discussion Character

  • Exploratory, Assumption checking, Mathematical reasoning

Approaches and Questions Raised

  • Participants discuss differentiating the Lagrangian with respect to P(x',y') and ask how to handle the constraint term. They also consider different forms of the Lagrangian and the implications of these forms for the solution.

Discussion Status

Some participants have provided alternative formulations of the Lagrangian and discussed the implications of these changes. There is an ongoing exploration of the relationship between the constraints and the resulting probability distributions, with some participants expressing confusion about specific terms and their derivations.

Contextual Notes

Participants are navigating various interpretations of the problem and its constraints, including references to lecture notes that may not align with the original problem statement. There is an acknowledgment of the need to clarify the setup and assumptions underlying the problem.

Legendre

Homework Statement



L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} (P(x,y) - q(x))

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations



Given above.

The Attempt at a Solution



I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the solution is supposed to be \lambda when the second term is differentiated with respect to a particular P(x',y') but how do we obtain that?

THANKS! :)
 
Legendre said:

Homework Statement



L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} (P(x,y) - q(x))

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations



Given above.

The Attempt at a Solution



I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the solution is supposed to be \lambda when the second term is differentiated with respect to a particular P(x',y') but how do we obtain that?

THANKS! :)

Do you want to solve \max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = q(x) for all x? That is very different from what you wrote; the correct Lagrangian would be L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)]. Remember: in this problem the variables are the P(x,y); the x and y themselves are just summation indices. (It would probably be a lot clearer if we wrote \sum_{i,j} P(i,j) \log P(i,j), etc.) So, for any fixed (x,y), dL/dP(x,y) = -\log P(x,y) - 1 + \lambda_{x}.

RGV
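To make the differentiation step explicit (a worked line using the Lagrangian above, with (x',y') a fixed index pair): every term in the constraint sum is constant with respect to P(x',y') except the single term with x = x' and y = y', so

\frac{\partial}{\partial P(x',y')} \sum_{x} \lambda_{x} \left[ \sum_{y} P(x,y) - q(x) \right] = \lambda_{x'},

because \partial P(x,y) / \partial P(x',y') equals 1 when (x,y) = (x',y') and 0 otherwise.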
 
Ray Vickson said:
Do you want to solve \max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = q(x) for all x? That is very different from what you wrote; the correct Lagrangian would be L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)]. Remember: in this problem the variables are the P(x,y); the x and y themselves are just summation indices. (It would probably be a lot clearer if we wrote \sum_{i,j} P(i,j) \log P(i,j), etc.) So, for any fixed (x,y), dL/dP(x,y) = -\log P(x,y) - 1 + \lambda_{x}.

RGV

Yes thanks!

There is something else I don't get...

Page 10 of these lecture notes: http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

says we should get the solution P(x,y) = P(x) P(y). I have no idea how to get this.

Furthermore, they seem to have dL/dP(x,y) = \log P(x,y) + 1 + \sum_{x} \lambda_{x} P_x(x), which I think makes sense, because the optimal P(x,y) should take the constraints into account?

The notes are written for integrals, but I am assuming the same argument works for summations?
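For reference, the continuous version used in the notes has the same structure (a sketch of the correspondence, assuming densities are well behaved so the functional derivative exists):

L = -\iint p(x,y) \log p(x,y) \, dx \, dy + \int \lambda(x) \left[ \int p(x,y) \, dy - q(x) \right] dx,

and setting the functional derivative \delta L / \delta p(x,y) = -\log p(x,y) - 1 + \lambda(x) to zero gives the same stationarity condition as in the discrete case, so the discrete argument carries over to summations essentially unchanged.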
 
The formula P(x,y) = f(x) f(y) (not P(x) P(y)) only works if f(x) = q(x) [assuming \sum_{y} f(y) = 1].
 
The results on page 10 are for a different problem than the one you posed in this forum. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N = number of y-values [assuming a discrete problem]. This follows from the Lagrange condition -\log P(x,y) - 1 + \lambda_x = 0 for all (x,y): solving for P gives P(x,y) = \exp(\lambda_x - 1), which does not depend on y, so the distribution is uniform over y. The problem on page 10 of your link is:
\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y. In this case the Lagrange method gives P(x,y) = r(x) s(y) for all x and y.
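A quick numerical sanity check of the P(x,y) = q(x)/N claim (a minimal sketch; the marginal q, the grid sizes, and the tolerance are arbitrary illustrative choices), using a generic constrained optimizer instead of the Lagrange condition:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
nx, ny = 3, 4
q = rng.random(nx)
q /= q.sum()                               # an arbitrary marginal q(x)

def neg_entropy(p_flat):
    p = p_flat.reshape(nx, ny)
    return np.sum(p * np.log(p))           # maximizing entropy = minimizing sum p log p

# one equality constraint per x: the row sum must equal q(x)
cons = [{"type": "eq", "fun": lambda p, i=i: p.reshape(nx, ny)[i].sum() - q[i]}
        for i in range(nx)]
bounds = [(1e-9, 1.0)] * (nx * ny)         # keep probabilities positive for the log
p0 = np.full(nx * ny, 1.0 / (nx * ny))     # uniform starting point

res = minimize(neg_entropy, p0, method="SLSQP", bounds=bounds, constraints=cons)
p_opt = res.x.reshape(nx, ny)
print(np.allclose(p_opt, q[:, None] / ny, atol=1e-5))   # True: P(x,y) = q(x)/N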
 
Ray Vickson said:
The results on page 10 are for a different problem than the one you posed in this forum. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N = number of y-values [assuming a discrete problem]. This follows from the Lagrange condition -\log P(x,y) - 1 + \lambda_x = 0 for all (x,y): solving for P gives P(x,y) = \exp(\lambda_x - 1), which does not depend on y, so the distribution is uniform over y. The problem on page 10 of your link is:
\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y. In this case the Lagrange method gives P(x,y) = r(x) s(y) for all x and y.

No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}?

Since the constraint terms look like:

\lambda_{x_i} \sum_{y} [ P(x_i,y) - r(x_i) ] in the Lagrangian,

shouldn't d/dP(x,y) of this be just \lambda_x?

Thanks Ray, you're of great help! :)
 
Legendre said:
No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}?

Since the constraint terms look like:

\lambda_{x_i} \sum_{y} [ P(x_i,y) - r(x_i) ] in the Lagrangian,

shouldn't d/dP(x,y) of this be just \lambda_x?

Thanks Ray, you're of great help! :)

L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))
\displaystyle \frac{dL}{dP(x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0
Thus, P(x,y) = \exp(-1+\lambda_x + \mu_y), so r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y), which means that \exp(\lambda_x) = c_1 r(x), where c_1 = \exp(1) / \sum_{y} \exp(\mu_y) is a constant, and similarly, the equation \sum_{x} P(x,y) = s(y) leads to \exp(\mu_y) = c_2 s(y) for a constant c_2. Altogether we get P(x,y) = c_1 c_2 \exp(-1) r(x) s(y) = c r(x) s(y), where c is a constant. Normalizing gives c = 1, since \sum_{x,y} P(x,y) = c \sum_{x} r(x) \sum_{y} s(y) = c must equal 1.

RGV
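And a small check that the product form really is the entropy maximizer under both marginal constraints (a minimal sketch; r, s, and the perturbation pattern are made up for illustration): perturb P = r(x) s(y) in a direction that preserves both marginals and confirm the entropy drops.

import numpy as np

rng = np.random.default_rng(1)
nx, ny = 3, 4
r = rng.random(nx); r /= r.sum()   # marginal r(x)
s = rng.random(ny); s /= s.sum()   # marginal s(y)

P = np.outer(r, s)                 # candidate maximizer P(x,y) = r(x) s(y)

def H(p):
    return -np.sum(p * np.log(p))  # Shannon entropy

# A perturbation that leaves BOTH marginals unchanged: +eps at (0,0) and (1,1),
# -eps at (0,1) and (1,0); every row sum and column sum is untouched.
delta = np.zeros((nx, ny))
delta[0, 0] = delta[1, 1] = 1.0
delta[0, 1] = delta[1, 0] = -1.0

eps = 0.5 * P.min()                # small enough to keep all entries positive
Q = P + eps * delta
assert np.allclose(Q.sum(axis=1), r) and np.allclose(Q.sum(axis=0), s)
print(H(P) > H(Q))                 # True: moving away from r(x)s(y) lowers entropy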
 
Ray Vickson said:
L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))
\displaystyle \frac{dL}{dP(x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0
Thus, P(x,y) = \exp(-1+\lambda_x + \mu_y), so r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y), which means that \exp(\lambda_x) = c_1 r(x), where c_1 = \exp(1) / \sum_{y} \exp(\mu_y) is a constant, and similarly, the equation \sum_{x} P(x,y) = s(y) leads to \exp(\mu_y) = c_2 s(y) for a constant c_2. Altogether we get P(x,y) = c_1 c_2 \exp(-1) r(x) s(y) = c r(x) s(y), where c is a constant. Normalizing gives c = 1, since \sum_{x,y} P(x,y) = c \sum_{x} r(x) \sum_{y} s(y) = c must equal 1.

RGV

Nice! Thanks a lot for the help!

Problem resolved. :D
 
