Maximizing the Lagrangian with Constraints

SUMMARY

The discussion focuses on maximizing the entropy term in the Lagrangian L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} (P(x,y) - q(x)), i.e. maximizing the first term with respect to P(x,y) under the constraint encoded by the second term. Participants clarify that differentiating the corrected Lagrangian with respect to a single P(x',y') leads to the condition \log P(x,y) + 1 + \lambda_x = 0, indicating a distribution that is uniform over y. For the related problem in which both marginals are constrained, the solution is P(x,y) = c\, r(x)\, s(y), where c is a normalization constant determined from the Lagrange multipliers \lambda_x and \mu_y (normalization gives c = 1).

PREREQUISITES
  • Understanding of Lagrangian mechanics and optimization
  • Familiarity with probability distributions and constraints
  • Knowledge of calculus, particularly differentiation
  • Basic concepts of entropy in statistical mechanics
NEXT STEPS
  • Study the method of Lagrange multipliers in optimization problems
  • Learn about entropy and its applications in probability theory
  • Explore the derivation of joint probability distributions in statistics
  • Investigate the implications of uniform distributions in optimization contexts
USEFUL FOR

Mathematicians, statisticians, data scientists, and anyone involved in optimization problems, particularly in the fields of economics, machine learning, and statistical mechanics.

Legendre

Homework Statement



L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} \big(P(x,y) - q(x)\big)

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations



Given above.

The Attempt at a Solution



I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the result is supposed to be \lambda when the second term is differentiated with respect to a particular P(x',y'), but how do we obtain that?

THANKS! :)
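
One way to see it: the second term is linear in each P(x,y), so differentiating it with respect to one particular entry just leaves its coefficient \lambda. Here is a quick finite-difference check on a small made-up table (all numbers below are arbitrary and purely illustrative; Python with NumPy assumed):

Code:
import numpy as np

lam = 0.7                                  # arbitrary multiplier value
P = np.array([[0.10, 0.25, 0.15],          # made-up joint table P(x,y)
              [0.20, 0.05, 0.25]])
q = np.array([0.5, 0.5])                   # made-up q(x); enters only as a constant

def second_term(P):
    # lambda * sum_y (P(x', y) - q(x')) for the fixed x' = 0, as in the posted L
    return lam * np.sum(P[0, :] - q[0])

# Finite-difference derivative with respect to the single entry P(x'=0, y'=2)
eps = 1e-7
P_plus = P.copy()
P_plus[0, 2] += eps
print((second_term(P_plus) - second_term(P)) / eps)   # prints ~0.7, i.e. lambda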
 
Legendre said:
L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} \big(P(x,y) - q(x)\big)

... But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

Do you want to solve \max -\sum_{x,y} P(x,y) \log P(x,y) (equivalently, \min \sum_{x,y} P(x,y) \log P(x,y)), subject to \sum_{y} P(x,y) = q(x) for all x? That is very different from what you wrote; in the minimization form, the correct Lagrangian would be

L = \sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} \Big[ \sum_{y} P(x,y) - q(x) \Big].

Remember: in this problem the variables are the P(x,y); the x and y themselves are just summation indices. (It would probably be a lot clearer written as \sum_{i,j} P(i,j) \log P(i,j), etc.) So, for any fixed (x,y),

\frac{dL}{dP(x,y)} = \log P(x,y) + 1 + \lambda_{x}.

RGV
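
A symbolic sanity check of that derivative, on a tiny 2x2 version of the corrected Lagrangian written as a minimization of \sum P \log P as above (SymPy assumed; the symbols below are just placeholders):

Code:
import sympy as sp

# P[i][j] stands for P(x_i, y_j); two x-values, two y-values.
P11, P12, P21, P22 = sp.symbols('P11 P12 P21 P22', positive=True)
lam1, lam2, q1, q2 = sp.symbols('lambda1 lambda2 q1 q2')

# L = sum_{x,y} P log P + sum_x lambda_x (sum_y P(x,y) - q(x))
L = (P11*sp.log(P11) + P12*sp.log(P12) + P21*sp.log(P21) + P22*sp.log(P22)
     + lam1*(P11 + P12 - q1)
     + lam2*(P21 + P22 - q2))

# Differentiate with respect to one fixed entry, say P(x_1, y_2): every other
# term of both sums is a constant, so only log(P12) + 1 and lambda_1 survive.
print(sp.diff(L, P12))    # -> log(P12) + lambda1 + 1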
 
Ray Vickson said:
So, for any fixed (x,y), \frac{dL}{dP(x,y)} = \log P(x,y) + 1 + \lambda_{x}.

Yes thanks!

There is something else I don't get...

Page 10 of these lecture notes: http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

It says we should get the solution P(x,y) = P(x) P(y). I have no idea how to get this.

Furthermore, they seem to have dL/dP(x,y) = \log P(x,y) + 1 + \sum \lambda_{x} P_x(x), which I think makes sense because the optimal P(x,y) should take the constraints into account?

The notes are written for integrals, but I am assuming the same works for summations?
 
The formula P(x,y) = f(x)\,g(y) (not P(x)\,P(y)) only works if f(x) = q(x), assuming \sum_{y} g(y) = 1.
 
The results on page 10 are for a different problem than the one you posed here. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N is the number of y-values (assuming a discrete problem). This follows from the Lagrange condition \log P(x,y) + 1 + \lambda_x = 0 for all (x,y): the condition makes \log P(x,y) = -1 - \lambda_x depend on x only, so the distribution is uniform over y. The problem on page 10 of your link is

\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

In this case the Lagrange method gives P(x,y) = r(x)\, s(y) for all x and y.
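
The P(x,y) = q(x)/N claim is easy to check numerically, e.g. with SciPy's SLSQP solver on a small made-up instance (the marginal q below is arbitrary; this is only a sanity check, not part of the derivation):

Code:
import numpy as np
from scipy.optimize import minimize

nx, ny = 3, 4
q = np.array([0.2, 0.5, 0.3])          # made-up marginal q(x), sums to 1

def objective(p):                      # minimize sum p log p  ==  maximize entropy
    return np.sum(p * np.log(p))

# Only the row-sum constraints: sum_y P(x,y) = q(x) for every x
cons = [{'type': 'eq', 'fun': lambda p: p.reshape(nx, ny).sum(axis=1) - q}]

p0 = np.full(nx * ny, 1.0 / (nx * ny))
res = minimize(objective, p0, method='SLSQP', constraints=cons,
               bounds=[(1e-9, 1.0)] * (nx * ny))

P = res.x.reshape(nx, ny)
print(np.abs(P - q[:, None] / ny).max())   # ~0: each row is uniform, P(x,y) = q(x)/N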
 
Ray Vickson said:
The results on page 10 are for a different problem than the one you posed here. ... In this case the Lagrange method gives P(x,y) = r(x)\, s(y) for all x and y.

No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}?

Since the constraints look like

\lambda_{x_i} \sum_{y} \big[ P(x_i,y) - r(x_i) \big]

in the Lagrangian, shouldn't d/dP(x,y) of this be just \lambda_x?

Thanks Ray, you're of great help! :)
 
Legendre said:
Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}? ... Shouldn't d/dP(x,y) of this be just \lambda_x?

L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))
\displaystyle \frac{dL}{dP(x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0

Thus P(x,y) = \exp(-1 + \lambda_x + \mu_y). The constraint \sum_{y} P(x,y) = r(x) then gives r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y), which means that \exp(\lambda_x) = c_1 r(x), where c_1 = \exp(1) / \sum_{y} \exp(\mu_y) is a constant. Similarly, the equation \sum_{x} P(x,y) = s(y) leads to \exp(\mu_y) = c_2 s(y) for a constant c_2. Altogether we get P(x,y) = c_1 c_2 \exp(-1)\, r(x)\, s(y) = c\, r(x)\, s(y), where c is a constant. Normalizing gives c = 1.
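
A small numerical check of this conclusion (NumPy assumed; the marginals r and s below are made up): the product P(x,y) = r(x) s(y) satisfies both constraint sets, and multipliers of the form \lambda_x = \log r(x) + a, \mu_y = \log s(y) + b with a + b = 1 make the stationarity condition hold exactly.

Code:
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 3, 4
r = rng.random(nx); r /= r.sum()       # made-up marginal r(x)
s = rng.random(ny); s /= s.sum()       # made-up marginal s(y)

# Candidate maximizer from the derivation: P(x,y) = r(x) * s(y), i.e. c = 1.
P = np.outer(r, s)

# Multipliers: exp(lambda_x) proportional to r(x), exp(mu_y) proportional to s(y);
# the constants only need to satisfy lambda_x + mu_y = 1 + log r(x) + log s(y).
lam = np.log(r) + 0.5
mu = np.log(s) + 0.5

residual = -1.0 - np.log(P) + lam[:, None] + mu[None, :]   # dL/dP(x,y)
print(np.abs(residual).max())                   # ~0: stationarity holds
print(np.abs(P.sum(axis=1) - r).max(),          # row constraints sum_y P = r(x)
      np.abs(P.sum(axis=0) - s).max())          # column constraints sum_x P = s(y)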

RGV
 
Ray Vickson said:
Altogether we get P(x,y) = c_1 c_2 \exp(-1)\, r(x)\, s(y) = c\, r(x)\, s(y), where c is a constant. Normalizing gives c = 1.

Nice! Thanks a lot for the help!

Problem resolved. :D
 
