Maximizing the Lagrangian with Constraints: A Guide to Solving Problems

In summary, the problem is to maximize -\sum_{x,y} P(x,y) \log P(x,y) subject to the constraints \sum_{y} P(x,y) = r(x) for all x and \sum_{x} P(x,y) = s(y) for all y. The Lagrangian is -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y)). Setting the derivative of the Lagrangian with respect to each P(x,y) to zero gives -\log P(x,y) - 1 + \lambda_x + \mu_y = 0, and solving this together with the constraints yields the product form P(x,y) = r(x) s(y).
  • #1
Legendre

Homework Statement



L = [itex]-\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} (P(x,y) - q(x))[/itex]

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations



Given above.

The Attempt at a Solution



I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the solution is supposed to be [tex]\lambda[/tex] when the second term is differentiated with respect to a particular P(x',y') but how do we obtain that?

THANKS! :)
 
  • #2
Do you want to solve [itex] \max -\sum_{x,y} P(x,y) \log P(x,y)[/itex], subject to [itex] \sum_{y} P(x,y) = q(x) [/itex] for all x? That is very different from what you wrote; the correct Lagrangian would be [itex]-\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)] [/itex]. Remember: in this problem the variables are the P(x,y); the x and y themselves are just summation indices. (That would probably be a lot clearer if we wrote [itex] \sum_{i,j} P(i,j) \log P(i,j)[/itex], etc.) So, for any fixed (x,y), [itex] dL/dP(x,y) = -\log P(x,y) - 1 + \lambda_{x}. [/itex]
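The key point above, that each P(x,y) appears exactly once in the entropy sum and once in its own row constraint, can be checked numerically. This is a sketch with made-up values (the multipliers, q(x), and the P table below are assumptions for illustration, not from the thread), using the maximization form L = -\sum_{x,y} P log P + \sum_x \lambda_x (\sum_y P(x,y) - q(x)):

```python
import math

# A numerical sketch (assumed values): a particular entry P(x0,y0) appears
# once in the entropy sum and once in the x0 row constraint, so
#   dL/dP(x0,y0) = -log P(x0,y0) - 1 + lambda_{x0}.
lam = [0.4, 0.9]                         # assumed multipliers lambda_x
q = [0.5, 0.5]                           # assumed q(x)
P = [[0.1, 0.2, 0.2], [0.2, 0.2, 0.1]]   # a small 2x3 table of P(x, y)

def L(P):
    ent = -sum(p * math.log(p) for row in P for p in row)
    con = sum(lam[x] * (sum(P[x]) - q[x]) for x in range(len(q)))
    return ent + con

# central finite difference with respect to the single entry P(x0, y0)
x0, y0, h = 1, 2, 1e-6
P[x0][y0] += h; up = L(P)
P[x0][y0] -= 2 * h; down = L(P)
P[x0][y0] += h  # restore
deriv = (up - down) / (2 * h)
expected = -math.log(0.1) - 1.0 + lam[1]
print(abs(deriv - expected) < 1e-5)  # True
```

All the other terms of the double sum are constants with respect to P(x0,y0), which is why they drop out of the derivative.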

RGV
 
  • #3
Ray Vickson said:
Do you want to solve [itex] \max \sum_{x,y} P(x,y) \log P(x,y)[/itex], subject to [itex] \sum_{y} P(x,y) = q(x) [/itex] for all x? That is very different from what you wrote; the correct Lagrangian would be [itex]\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)] [/itex]. Remember: in this problem the variables are P(x,y); the x and y themselves are just summation indices. (Probably that would be a lot clearer if we wrote [itex] \sum_{i,j} P(i,j) \log P(i,j)[/itex], etc.) So: for any fixed (x,y), [itex] dL/dP(x,y) = log P(x,y) + 1 + \lambda_{x}. [/itex]

RGV

Yes thanks!

There is something else I don't get...

Page 10 of these lecture notes: http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

says we should get the solution P(x,y) = P(x) P(y). I have no idea how to get this.

Furthermore, they seem to have [tex] dL/dP(x,y) = \log P(x,y) + 1 + \Sigma \lambda_{x} P_x(x) [/tex], which I think makes sense because the optimal P(x,y) should take the constraints into account?

The notes are for integrals, but I am assuming it works for summations?
 
  • #4
The formula P(x,y) = f(x)*f(y) (not P(x)*P(y)) only works if f(x) = q(x) [assuming sum_{y} f(y) = 1].
 
  • #5
The results on page 10 are for a different problem than the one you posed in this forum. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N = the number of y-values [assuming a discrete problem]. This follows from the Lagrange condition -log P(x,y) - 1 + lambda_x = 0 for all (x,y): the solution P(x,y) = exp(lambda_x - 1) does not depend on y, so the distribution is uniform over y. The problem on page 10 of your link is:
max -sum_{x,y} P(x,y) log P(x,y), subject to sum_{y} P(x,y) = r(x) for all x, and sum_{x} P(x,y) = s(y) for all y. In this case the Lagrange method gives P(x,y) = r(x)*s(y) for all x and y.
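The uniform-over-y claim for the row-constrained problem can be sanity-checked numerically. This is a sketch with an assumed marginal r(x) and an assumed number of y-values (neither is from the thread): any perturbation that keeps the row sums fixed should strictly lower the entropy of P(x,y) = r(x)/N.

```python
import math, random

# A numerical sketch (assumed r and N): with only the row constraints
# sum_y P(x,y) = r(x), the claimed maximizer is uniform over y,
# P(x,y) = r(x)/N. Check that feasible perturbations lower the entropy.
r = [0.3, 0.7]     # assumed row marginals (sum to 1)
N = 4              # assumed number of y-values

def entropy(P):
    return -sum(p * math.log(p) for row in P for p in row)

P_star = [[rx / N] * N for rx in r]   # candidate maximizer: uniform over y

random.seed(0)
best = entropy(P_star)
for _ in range(200):
    # move mass eps between two y-cells of one row, which preserves
    # the row sum r(x) and hence stays feasible
    P = [row[:] for row in P_star]
    x = random.randrange(len(r))
    y1, y2 = random.sample(range(N), 2)
    eps = 0.01
    P[x][y1] += eps
    P[x][y2] -= eps
    assert entropy(P) < best  # every feasible move decreases entropy
print("uniform-over-y is a local maximum")
```

This only probes feasible directions at random; the strict concavity of -p log p is what guarantees the uniform row is the global maximizer.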
 
  • #6
No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

max [itex]-\sum_{x,y}[/itex] P(x,y) log P(x,y), subject to [itex]\sum_{y}[/itex] P(x,y) = r(x) for all x, and [itex]\sum_{x}[/itex] P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = -1 - log P(x,y) + [itex]\sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}[/itex]?

Since the constraints look like

[itex]\lambda_{x_i} \sum_y [ P(x_i,y) - r(x_i) ][/itex] in the Lagrangian,

shouldn't d/dP(x,y) of this be just [itex]\lambda_x[/itex]?

Thanks Ray, you're of great help! :)
 
  • #7
[tex]L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))[/tex]
[tex] \displaystyle \frac{dL}{dP(x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0[/tex]
Thus [itex] P(x,y) = \exp(-1+\lambda_x + \mu_y)[/itex], so [itex]r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y)[/itex], which means that [itex]\exp(\lambda_x) = c_1 r(x)[/itex], where [itex] c_1 = \exp(1) / \sum_{y} \exp(\mu_y) [/itex] is a constant. Similarly, the equation [itex] \sum_{x} P(x,y) = s(y) [/itex] leads to [itex] \exp(\mu_y) = c_2 s(y) [/itex] for a constant c_2. Altogether we get [itex] P(x,y) = c_1 c_2 \exp(-1) r(x) s(y) = c\, r(x) s(y)[/itex], where c is a constant; since the r(x) and s(y) each sum to 1, normalizing gives c = 1.
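The product-form answer can be checked numerically. This is a sketch with assumed example marginals (not from the thread): P(x,y) = r(x) s(y) satisfies both marginal constraints, and a perturbation that preserves both marginals has strictly lower entropy.

```python
import math

# A numerical sketch (assumed marginals r and s): the product form
# P(x,y) = r(x) s(y) satisfies both marginal constraints, and a
# marginal-preserving perturbation lowers the entropy.
r = [0.2, 0.8]
s = [0.5, 0.3, 0.2]
P = [[rx * sy for sy in s] for rx in r]

# both marginal constraints hold by construction
row_sums = [sum(row) for row in P]
col_sums = [sum(P[x][y] for x in range(len(r))) for y in range(len(s))]

def entropy(P):
    return -sum(p * math.log(p) for row in P for p in row)

# a 2x2 "rectangle" perturbation (+eps, -eps on opposite corners)
# preserves all row and column sums, so Q is still feasible
eps = 0.01
Q = [row[:] for row in P]
Q[0][0] += eps; Q[0][1] -= eps
Q[1][0] -= eps; Q[1][1] += eps
print(entropy(P) > entropy(Q))  # True: the product form has higher entropy
```

This reflects the identity H(X,Y) <= H(X) + H(Y), with equality exactly when X and Y are independent, which is what the Lagrange derivation recovers.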

RGV
 
  • #8
Nice! Thanks a lot for the help!

Problem resolved. :D
 

1. What is the purpose of using Lagrange multipliers in optimization problems?

The purpose of using Lagrange multipliers is to find the maximum or minimum value of a function subject to one or more constraints. It allows us to incorporate these constraints into the optimization problem and find the optimal solution.

2. How do Lagrange multipliers work?

Lagrange multipliers work by introducing a new variable, called the Lagrange multiplier, to the objective function and the constraints. This variable helps to incorporate the constraints into the optimization problem, which can then be solved using the method of partial derivatives.
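As a minimal worked illustration of this mechanism (an assumed textbook-style example, not from the thread): maximize f(x,y) = x*y subject to x + y = 1. The Lagrangian is L = x*y + lam*(x + y - 1); dL/dx = y + lam = 0 and dL/dy = x + lam = 0 force x = y, and the constraint then gives x = y = 1/2. A brute-force search along the constraint agrees:

```python
# A sketch (assumed example): maximize f(x,y) = x*y subject to x + y = 1.
# The stationarity conditions give x = y = 1/2, f = 1/4; we confirm by
# scanning points on the constraint line.
best_x, best_f = None, float("-inf")
n = 100000
for i in range(n + 1):
    x = i / n
    y = 1 - x              # stay on the constraint x + y = 1
    if x * y > best_f:
        best_x, best_f = x, x * y
print(best_x, best_f)  # 0.5 0.25
```

The multiplier itself (lam = -1/2 here) measures how fast the optimal value changes as the constraint level moves away from 1.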

3. Can Lagrange multipliers be used for both constrained and unconstrained optimization problems?

Lagrange multipliers are specifically a tool for constrained problems. For an unconstrained problem there are no constraint terms to add, and the method reduces to the ordinary condition that the gradient of the objective be zero.

4. What are the advantages of using Lagrange multipliers?

One advantage of using Lagrange multipliers is that it allows us to solve optimization problems with multiple constraints. It also provides a systematic way to incorporate constraints into the objective function, making the solution process more efficient.

5. Are there any limitations to using Lagrange multipliers?

One limitation of using Lagrange multipliers is that it can only be applied to problems with differentiable objective and constraint functions. It may also be computationally expensive for problems with a large number of constraints.
