Maximizing the Lagrangian with Constraints

SUMMARY

The discussion focuses on maximizing the entropy term in the Lagrangian L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} (P(x,y) - q(x)), i.e. maximizing the first term with respect to P(x,y) under the constraint encoded by the second term. Participants clarify that differentiating the corrected Lagrangian with respect to a single P(x',y') leads to the condition \log P(x,y) + 1 + \lambda_x = 0, indicating a distribution that is uniform over y. For the related problem in which both marginals are constrained, the solution is P(x,y) = c\, r(x)\, s(y), where c is a normalization constant determined from the Lagrange multipliers \lambda_x and \mu_y (normalization gives c = 1).

PREREQUISITES
  • Understanding of Lagrangian mechanics and optimization
  • Familiarity with probability distributions and constraints
  • Knowledge of calculus, particularly differentiation
  • Basic concepts of entropy in statistical mechanics
NEXT STEPS
  • Study the method of Lagrange multipliers in optimization problems
  • Learn about entropy and its applications in probability theory
  • Explore the derivation of joint probability distributions in statistics
  • Investigate the implications of uniform distributions in optimization contexts
USEFUL FOR

Mathematicians, statisticians, data scientists, and anyone involved in optimization problems, particularly in the fields of economics, machine learning, and statistical mechanics.

Legendre

Homework Statement



L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} \big(P(x,y) - q(x)\big)

This is the Lagrangian. I need to maximize the first term in the sum with respect to P(x,y), subject to the constraint in the second term.

The first term is a sum over all possible values of x,y.

The second term is a sum over all possible values of ONLY y.

Homework Equations



Given above.

The Attempt at a Solution



I know I have to differentiate L with respect to P(x',y') where x',y' are fixed values.

The idea is that the first term consists of a sum of P(x,y) over all possible x,y. So, we just need to differentiate L with respect to P(x',y') while keeping all other P(x,y) constant.

But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

I know the result is supposed to be \lambda when the second term is differentiated with respect to a particular P(x',y'), but how do we obtain that?

THANKS! :)
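
One way to see it: the second term is linear in each P(x,y), so differentiating it with respect to one particular entry just leaves its coefficient \lambda. Here is a quick finite-difference check on a small made-up table (all numbers below are arbitrary and purely illustrative; Python with NumPy assumed):

Code:
import numpy as np

lam = 0.7                                  # arbitrary multiplier value
P = np.array([[0.10, 0.25, 0.15],          # made-up joint table P(x,y)
              [0.20, 0.05, 0.25]])
q = np.array([0.5, 0.5])                   # made-up q(x); enters only as a constant

def second_term(P):
    # lambda * sum_y (P(x', y) - q(x')) for the fixed x' = 0, as in the posted L
    return lam * np.sum(P[0, :] - q[0])

# Finite-difference derivative with respect to the single entry P(x'=0, y'=2)
eps = 1e-7
P_plus = P.copy()
P_plus[0, 2] += eps
print((second_term(P_plus) - second_term(P)) / eps)   # prints ~0.7, i.e. lambda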
 
Legendre said:
L = -\sum_{x,y} P(x,y) \log P(x,y) + \lambda \sum_{y} \big(P(x,y) - q(x)\big)

... But how do I differentiate the second term with respect to P(x',y')? It is a sum over all possible y values.

Do you want to solve \max -\sum_{x,y} P(x,y) \log P(x,y) (equivalently, \min \sum_{x,y} P(x,y) \log P(x,y)), subject to \sum_{y} P(x,y) = q(x) for all x? That is very different from what you wrote; in the minimization form, the correct Lagrangian would be

L = \sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_{x} \Big[ \sum_{y} P(x,y) - q(x) \Big].

Remember: in this problem the variables are the P(x,y); the x and y themselves are just summation indices. (It would probably be a lot clearer written as \sum_{i,j} P(i,j) \log P(i,j), etc.) So, for any fixed (x,y),

\frac{dL}{dP(x,y)} = \log P(x,y) + 1 + \lambda_{x}.

RGV
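
A symbolic sanity check of that derivative, on a tiny 2x2 version of the corrected Lagrangian written as a minimization of \sum P \log P as above (SymPy assumed; the symbols below are just placeholders):

Code:
import sympy as sp

# P[i][j] stands for P(x_i, y_j); two x-values, two y-values.
P11, P12, P21, P22 = sp.symbols('P11 P12 P21 P22', positive=True)
lam1, lam2, q1, q2 = sp.symbols('lambda1 lambda2 q1 q2')

# L = sum_{x,y} P log P + sum_x lambda_x (sum_y P(x,y) - q(x))
L = (P11*sp.log(P11) + P12*sp.log(P12) + P21*sp.log(P21) + P22*sp.log(P22)
     + lam1*(P11 + P12 - q1)
     + lam2*(P21 + P22 - q2))

# Differentiate with respect to one fixed entry, say P(x_1, y_2): every other
# term of both sums is a constant, so only log(P12) + 1 and lambda_1 survive.
print(sp.diff(L, P12))    # -> log(P12) + lambda1 + 1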
 
Ray Vickson said:
So, for any fixed (x,y), \frac{dL}{dP(x,y)} = \log P(x,y) + 1 + \lambda_{x}.

Yes thanks!

There is something else I don't get...

Page 10 of these lecture notes: http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

It says we should get the solution P(x,y) = P(x) P(y). I have no idea how to get this.

Furthermore, they seem to have dL/dP(x,y) = \log P(x,y) + 1 + \sum \lambda_{x} P_x(x), which I think makes sense because the optimal P(x,y) should take the constraints into account?

The notes are written for integrals, but I am assuming the same works for summations?
 
The formula P(x,y) = f(x)\,g(y) (not P(x)\,P(y)) only works if f(x) = q(x), assuming \sum_{y} g(y) = 1.
 
The results on page 10 are for a different problem than the one you posed here. Your posed problem has the solution P(x,y) = q(x)/N for all y, where N is the number of y-values (assuming a discrete problem). This follows from the Lagrange condition \log P(x,y) + 1 + \lambda_x = 0 for all (x,y): the condition makes \log P(x,y) = -1 - \lambda_x depend on x only, so the distribution is uniform over y. The problem on page 10 of your link is

\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

In this case the Lagrange method gives P(x,y) = r(x)\, s(y) for all x and y.
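
The P(x,y) = q(x)/N claim is easy to check numerically, e.g. with SciPy's SLSQP solver on a small made-up instance (the marginal q below is arbitrary; this is only a sanity check, not part of the derivation):

Code:
import numpy as np
from scipy.optimize import minimize

nx, ny = 3, 4
q = np.array([0.2, 0.5, 0.3])          # made-up marginal q(x), sums to 1

def objective(p):                      # minimize sum p log p  ==  maximize entropy
    return np.sum(p * np.log(p))

# Only the row-sum constraints: sum_y P(x,y) = q(x) for every x
cons = [{'type': 'eq', 'fun': lambda p: p.reshape(nx, ny).sum(axis=1) - q}]

p0 = np.full(nx * ny, 1.0 / (nx * ny))
res = minimize(objective, p0, method='SLSQP', constraints=cons,
               bounds=[(1e-9, 1.0)] * (nx * ny))

P = res.x.reshape(nx, ny)
print(np.abs(P - q[:, None] / ny).max())   # ~0: each row is uniform, P(x,y) = q(x)/N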
 
Ray Vickson said:
The results on page 10 are for a different problem than the one you posed here. ... In this case the Lagrange method gives P(x,y) = r(x)\, s(y) for all x and y.

No, I meant to ask you about the problem on page 10. Your original reply solved my original problem.

To make it more precise:

Ignore everything that I have asked so far. The new problem is...

\max -\sum_{x,y} P(x,y) \log P(x,y), subject to \sum_{y} P(x,y) = r(x) for all x, and \sum_{x} P(x,y) = s(y) for all y.

Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}?

Since the constraints look like

\lambda_{x_i} \sum_{y} \big[ P(x_i,y) - r(x_i) \big]

in the Lagrangian, shouldn't d/dP(x,y) of this be just \lambda_x?

Thanks Ray, you're of great help! :)
 
Legendre said:
Why isn't dL/dP(x,y) = \log P(x,y) + 1 + \sum_i \lambda_{x_i} + \sum_i \lambda_{y_i}? ... Shouldn't d/dP(x,y) of this be just \lambda_x?

L = -\sum_{x,y} P(x,y) \log P(x,y) + \sum_{x} \lambda_x (\sum_{y} P(x,y) - r(x)) + \sum_{y} \mu_y (\sum_{x} P(x,y) - s(y))
\displaystyle \frac{dL}{dP(x,y)} = -1 - \log P(x,y) + \lambda_x + \mu_y = 0

Thus P(x,y) = \exp(-1 + \lambda_x + \mu_y). The constraint \sum_{y} P(x,y) = r(x) then gives r(x) = \exp(-1 + \lambda_x) \sum_{y} \exp(\mu_y), which means that \exp(\lambda_x) = c_1 r(x), where c_1 = \exp(1) / \sum_{y} \exp(\mu_y) is a constant. Similarly, the equation \sum_{x} P(x,y) = s(y) leads to \exp(\mu_y) = c_2 s(y) for a constant c_2. Altogether we get P(x,y) = c_1 c_2 \exp(-1)\, r(x)\, s(y) = c\, r(x)\, s(y), where c is a constant. Normalizing gives c = 1.
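
A small numerical check of this conclusion (NumPy assumed; the marginals r and s below are made up): the product P(x,y) = r(x) s(y) satisfies both constraint sets, and multipliers of the form \lambda_x = \log r(x) + a, \mu_y = \log s(y) + b with a + b = 1 make the stationarity condition hold exactly.

Code:
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 3, 4
r = rng.random(nx); r /= r.sum()       # made-up marginal r(x)
s = rng.random(ny); s /= s.sum()       # made-up marginal s(y)

# Candidate maximizer from the derivation: P(x,y) = r(x) * s(y), i.e. c = 1.
P = np.outer(r, s)

# Multipliers: exp(lambda_x) proportional to r(x), exp(mu_y) proportional to s(y);
# the constants only need to satisfy lambda_x + mu_y = 1 + log r(x) + log s(y).
lam = np.log(r) + 0.5
mu = np.log(s) + 0.5

residual = -1.0 - np.log(P) + lam[:, None] + mu[None, :]   # dL/dP(x,y)
print(np.abs(residual).max())                   # ~0: stationarity holds
print(np.abs(P.sum(axis=1) - r).max(),          # row constraints sum_y P = r(x)
      np.abs(P.sum(axis=0) - s).max())          # column constraints sum_x P = s(y)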

RGV
 
Ray Vickson said:
Altogether we get P(x,y) = c_1 c_2 \exp(-1)\, r(x)\, s(y) = c\, r(x)\, s(y), where c is a constant. Normalizing gives c = 1.

Nice! Thanks a lot for the help!

Problem resolved. :D
 
