Maximizing the Lagrangian with Constraints to Solving Problems

Legendre · Jul 5, 2011

Homework Statement

L = - [itex]\Sigma[/itex] _x,y (P(x,y) log P(x,y)) + [tex]\lambda[/tex] [itex]\Sigma[/itex]_y (P(x,y) - q(x))

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations

Given above.

The Attempt at a Solution

I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the solution is supposed to be [tex]\lambda[/tex] when the second term is differentiated with respect to a particular P(x',y') but how do we obtain that?

THANKS! :)

Ray Vickson · Jul 7, 2011

Legendre said:

Homework Statement

L = - [itex]\Sigma[/itex] _x,y (P(x,y) log P(x,y)) + [tex]\lambda[/tex] [itex]\Sigma[/itex]_y (P(x,y) - q(x))

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations

Given above.

The Attempt at a Solution

I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the solution is supposed to be [tex]\lambda[/tex] when the second term is differentiated with respect to a particular P(x',y') but how do we obtain that?

THANKS! :)

Do you want to solve [itex] \max \sum_{x,y} P(x,y) \log P(x,y)[/itex], subject to [itex] \sum_{y} P(x,y) = q(x) [/itex] for all x? That is very different from what you wrote; the correct Lagrangian would be [itex]\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)] [/itex]. Remember: in this problem the variables are P(x,y); the x and y themselves are just summation indices. (Probably that would be a lot clearer if we wrote [itex] \sum_{i,j} P(i,j) \log P(i,j)[/itex], etc.) So: for any fixed (x,y), [itex] dL/dP(x,y) = \log P(x,y) + 1 + \lambda_{x}. [/itex]

RGV

Legendre · Jul 7, 2011

Ray Vickson said:

Do you want to solve [itex] \max \sum_{x,y} P(x,y) \log P(x,y)[/itex], subject to [itex] \sum_{y} P(x,y) = q(x) [/itex] for all x? That is very different from what you wrote; the correct Lagrangian would be [itex]\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} [\sum_{y} P(x,y) - q(x)] [/itex]. Remember: in this problem the variables are P(x,y); the x and y themselves are just summation indices. (Probably that would be a lot clearer if we wrote [itex] \sum_{i,j} P(i,j) \log P(i,j)[/itex], etc.) So: for any fixed (x,y), [itex] dL/dP(x,y) = log P(x,y) + 1 + \lambda_{x}. [/itex]

RGV

Yes thanks!

There is something else I don't get...

Page 10 of this lecture notes: http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

Says we should get the solution as P(x,y) = P(x) P(y). I have no idea how to get this.

Furthermore, they seem to have [tex] dL/dP(x,y) = \log P(x,y) + 1 + \Sigma \lambda_{x} P_x(x) [/tex]. , which I think makes sense because the optimal P(x,y) should take into account the constraints?

The notes are for integrals but I am assuming it works for summations?

Ray Vickson · Jul 7, 2011

The formula P(x,y) = f(x)*f(y) (not P(x)*P(y) ) only works if f(x) = q(x) [assuming sum_{y} f(y) = 1]

Ray Vickson · Jul 7, 2011

The results on page 10 are for a different probl than the one you posed in this Forum. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N = number of y-values [assuming a discrete problem]. This follows from the Lagrange condition
log P(x,y) + 1 + lambda_x = 0 for all (x,y). That is, the distribution is uniform over y. The problem on page 10 of your link is:
max -sum_{x,y} P(x,y) log P(x,y), subject to sum_{y} P(x,y) = r(x) for all x, and sum_{x} P(x,y) = s(y) for all y. In this case the Lagrange method gives P(x,y) = r(x)*s(y) for all x and y.

Legendre · Jul 7, 2011

Ray Vickson said:

The results on page 10 are for a different probl than the one you posed in this Forum. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N = number of y-values [assuming a discrete problem]. This follows from the Lagrange condition
log P(x,y) + 1 + lambda_x = 0 for all (x,y). That is, the distribution is uniform over y. The problem on page 10 of your link is:
max -sum_{x,y} P(x,y) log P(x,y), subject to sum_{y} P(x,y) = r(x) for all x, and sum_{x} P(x,y) = s(y) for all y. In this case the Lagrange method gives P(x,y) = r(x)*s(y) for all x and y.

No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

max - [itex]\Sigma[/itex]_x,y P(x,y) log P(x,y), subject to [itex]\Sigma[/itex]_y P(x,y) = r(x) for all x, and [itex]\Sigma[/itex]_x P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = 1 + log P(x,y) + 1 + [itex]\Sigma[/itex]_i[itex]\lambda[/itex] {x_i} +[itex]\Sigma[/itex]_i[itex]\lambda[/itex] {y_i}?

Since the constraint look like:

[itex]\lambda[/itex]_{x_i}[itex]\Sigma[/itex]_y [ P(x_i,y) - r(x_i) ] in the lagrangian.

Shouldn't d / dP(x,y) of this be just [itex]\lambda[/itex]_x?

Thanks Ray, you're of great help! :)

Ray Vickson · Jul 7, 2011

Legendre said:

No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

max - [itex]\Sigma[/itex]_x,y P(x,y) log P(x,y), subject to [itex]\Sigma[/itex]_y P(x,y) = r(x) for all x, and [itex]\Sigma[/itex]_x P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = 1 + log P(x,y) + 1 + [itex]\Sigma[/itex]_i[itex]\lambda[/itex] {x_i} +[itex]\Sigma[/itex]_i[itex]\lambda[/itex] {y_i}?

Since the constraint look like:

[itex]\lambda[/itex]_{x_i}[itex]\Sigma[/itex]_y [ P(x_i,y) - r(x_i) ] in the lagrangian.

Shouldn't d / dP(x,y) of this be just [itex]\lambda[/itex]_x?

Thanks Ray, you're of great help! :)

[tex]L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))[/tex]
[tex] \displaystyle \frac{dL}{dP{x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0[/tex]
Thus, [itex] P(x,y) = \exp(-1+\lambda_x + \mu_y)[/itex], so [itex]r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y)[/itex], which means that [itex]\exp(\lambda_x) = c_1 r(x)[/itex], where [itex] c_1 = \exp(1) / \sum_{y} \exp(\mu_y) [/itex] is a constant, and similarly, the equation [itex] \sum_{x} P(x,y) = s(y) [/itex] leads to [itex] \exp(\mu_y) = c_2 s(y) [/itex] for a constant c_2. Altogether we get [itex] P(x,y) = c_1 c_2 \exp(-1) r(x) s(y) = c r(x) s(y)[/itex], where c is a constant. Normalizing gives c = 1.

RGV

Legendre · Jul 8, 2011

Ray Vickson said:

[tex]L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))[/tex]
[tex] \displaystyle \frac{dL}{dP{x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0[/tex]
Thus, [itex] P(x,y) = \exp(-1+\lambda_x + \mu_y)[/itex], so [itex]r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y)[/itex], which means that [itex]\exp(\lambda_x) = c_1 r(x)[/itex], where [itex] c_1 = \exp(1) / \sum_{y} \exp(\mu_y) [/itex] is a constant, and similarly, the equation [itex] \sum_{x} P(x,y) = s(y) [/itex] leads to [itex] \exp(\mu_y) = c_2 s(y) [/itex] for a constant c_2. Altogether we get [itex] P(x,y) = c_1 c_2 \exp(-1) r(x) s(y) = c r(x) s(y)[/itex], where c is a constant. Normalizing gives c = 1.

RGV

Nice! Thanks a lot for the help!

Problem resolved. :D

Maximizing the Lagrangian with Constraints to Solving Problems

Homework Statement

Homework Equations

The Attempt at a Solution

Homework Statement

Homework Equations

The Attempt at a Solution

FAQ: Maximizing the Lagrangian with Constraints to Solving Problems

1. What is the purpose of using Lagrange multipliers in optimization problems?

2. How do Lagrange multipliers work?

3. Can Lagrange multipliers be used for both constrained and unconstrained optimization problems?

4. What are the advantages of using Lagrange multipliers?

5. Are there any limitations to using Lagrange multipliers?

Similar threads

Hot Threads

Recent Insights