Constrained Optimization with the KKT Approach


Discussion Overview

The discussion revolves around the KKT (Karush-Kuhn-Tucker) approach to constrained optimization, particularly in the context of minimizing a function subject to equality and inequality constraints. Participants explore the implications of the KKT conditions, the formulation of the generalized Lagrangian, and the relationship between unconstrained and constrained minimization.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes the formulation of the generalized Lagrangian and its role in constrained optimization, questioning how minimizing the maximum Lagrangian relates to finding a minimum within the constraint set.
  • Another participant emphasizes the need for a methodical search to ensure the minimum is found over the constrained set, expressing confusion about how unconstrained minimization could yield a minimum within the constraints.
  • Some participants propose that maximizing the multipliers for constraints forces the minimization to remain within the feasible region defined by the constraints.
  • One participant presents a specific example to illustrate their confusion regarding the application of the KKT approach, questioning the correct ordering of minimization and maximization steps.
  • Another participant suggests that the ordering of operations in the KKT approach may be misunderstood, advocating for a specific sequence of minimization and maximization to clarify the process.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and agreement regarding the KKT approach and its application. Some agree on the necessity of maximizing the multipliers to ensure constraint adherence, while others remain confused about the relationship between constrained and unconstrained minimization, indicating that the discussion is unresolved.

Contextual Notes

Participants highlight the importance of understanding the constraints and the implications of the KKT conditions, but there are unresolved questions about the mathematical steps and the correct application of the KKT approach in practice.

SilverSoldier
I'm reading the book Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, and currently reading this chapter on numerical methods--specifically, the section on constrained optimization.

The book states the following.

Suppose we wish to minimize a function ##f\left(\textbf{x}\right)## over some set $$S=\left\{\textbf{x}\,|\,\forall\,i,g^{(i)}\left(\textbf{x}\right)=0,\,\forall\,j,h^{(j)}\left(\textbf{x}\right)\leq0\right\},$$ where ##g^{(i)}## and ##h^{(j)}## are functions called the equality constraints and inequality constraints respectively; i.e., we wish to find ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)##.

To do this, we start by defining a function ##L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## called the generalized Lagrangian as follows; $$L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=f\left(\textbf{x}\right)+\sum_i\lambda_i\cdot g^{(i)}\left(\textbf{x}\right)+\sum_j\alpha_j\cdot h^{(j)}\left(\textbf{x}\right).$$ Then, $$\min_{\textbf{x}\in S}f\left(\textbf{x}\right)=\min_{\textbf{x}}\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right).$$ The way I understand this is that we start by picking some random values for ##\textbf{x}## and ##\boldsymbol{\lambda}##, and then find the best non-negative ##\boldsymbol{\alpha}## for which ##L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## is maximum.

Say the ##\textbf{x}## we picked randomly happened to lie in ##S##. Then we should end up at the result that ##\boldsymbol{\alpha}=\textbf{0}##.
It remains now to maximize ##L\left(\textbf{x}, \boldsymbol{\lambda}, \textbf{0}\right)## over ##\boldsymbol{\lambda}##, and then minimize the resulting function over ##\textbf{x}##. Again, because the ##\textbf{x}## we have picked randomly happens to lie in ##S##, we have ##L\left(\textbf{x}, \boldsymbol{\lambda}, \textbf{0}\right)=f\left(\textbf{x}\right)## for every ##\boldsymbol{\lambda}##, since ##\forall\,i, g^{(i)}\left(\textbf{x}\right)=0##.

Therefore, in such a case where the ##\textbf{x}## we pick randomly just so happens to lie in ##S##, we see that $$\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=f\left(\textbf{x}\right).$$ What I do not understand is how minimizing ##\displaystyle\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=f\left(\textbf{x}\right)## over ##\textbf{x}## now will keep ##\textbf{x}## within ##S##, and find a minimum in this constrained region.

If $$\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=f\left(\textbf{x}\right),$$ mustn't we have $$\min_{\textbf{x}}\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=\min_{\textbf{x}}f\left(\textbf{x}\right),$$ which is different from ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)##?
 
SilverSoldier said:
It remains now to ..... minimize the resulting function over ##\textbf{x}##. .....

SilverSoldier said:
mustn't we have $$\min_{\textbf{x}}\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=\min_{\textbf{x}}f\left(\textbf{x}\right),$$ which is different from ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)##?
Yes. You already said that you need to minimize over ##\textbf{x}##. If ##\textbf{x}## was picked randomly, then you would have to repeat this process enough to feel confident that you have found a minimum over ##\textbf{x}##, or do a methodical search for the minimum.
 
FactChecker said:
Yes. You already said that you need to minimize over ##\textbf{x}##. If ##\textbf{x}## was picked randomly, then you would have to repeat this process enough to feel confident that you have found a minimum over ##\textbf{x}##, or do a methodical search for the minimum.
But it is necessary to find the minimum over values in ##S## only. How does an unconstrained minimization of ##f## over all values of ##\textbf{x}## result in its minimum over values in ##S##? :confused:
 
SilverSoldier said:
But it is necessary to find the minimum over values in ##S## only. How does an unconstrained minimization of ##f## over all values of ##\textbf{x}## result in its minimum over values in ##S##? :confused:
Oh, I am sorry. I missed the point. By maximizing over ##\lambda_x## and ##\alpha_x \ge 0## for any given ##x##, you are finding the multipliers that provide the best penalty for violating the constraints of ##g## and ##h##. So that forces the minimization over ##x## to stay within ##S##.

UPDATE: After thinking more about this, I do not feel I am knowledgeable enough to give authoritative help here. I will leave this to others.
 
I think FactChecker has it. If you pick a value of ##x## for which ##g^{(i)}(x)>0##, you can pick ##\lambda_i\gg0## and ##L## is very positive. Similarly, if ##g^{(i)}(x)<0## you can pick ##\lambda_i\ll0##. So ##g^{(i)}(x)=0## when you minimize ##L##.

But for the inequality constraints you only get ##h^{(j)}(x)\leq0##, since otherwise we could make ##\alpha_j\gg0##.
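In other words, for each fixed ##\textbf{x}## (this is the standard way to see the argument above, not a quote from the book): $$\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=\begin{cases}f\left(\textbf{x}\right)&\text{if }\textbf{x}\in S,\\+\infty&\text{if }\textbf{x}\notin S,\end{cases}$$ so every point outside ##S## is assigned the value ##+\infty## and can never be the minimizer; minimizing over all ##\textbf{x}## therefore returns ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)##.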
 
I'm sorry I still don't understand it and I am quite confused. Shouldn't we have ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)=\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}\min_{\textbf{x}}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## or ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)=\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}\max_{\boldsymbol{\lambda}}\min_{\textbf{x}}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## instead?

Say we wanted to find the minimum of ##\dfrac{x^2+y^2}{5}## subject to the constraints ##x^2+y-2\leq0## and ##10x-y-22=0##. The actual minimum given these constraints is ##1.6##, achieved at ##x=2## and ##y=-2##. To find this minimum using the KKT approach, we first define the generalized Lagrangian, as follows; $$\dfrac{x^2+y^2}{5}+\lambda\left(10x-y-22\right)+\alpha\left(x^2+y-2\right).$$ Say we start by picking ##\left(1.8, -4\right)## randomly for ##\left(x,y\right)##. Notice that ##\left(1.8, -4\right)## satisfies both constraints, and so ##\displaystyle\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)=f\left(1.8, -4\right)=3.85##. What does it now mean to minimize this? Does this mean we now vary ##x## and ##y##, looking for values that minimize ##\dfrac{x^2+y^2}{5}##?

Can you also please describe how this approach may be implemented in code?
 
I think you're thinking about the ordering wrong. The first thing is the min over ##x##. So you can imagine having some function ##K(x)## that we are trying to minimize, where ##K(x)=\displaystyle\max_{\lambda}\max_{\alpha\geq0}L(x,\lambda,\alpha)##. So the ordering in the OP says: for each ##x##, pick the worst choice of ##\lambda## and ##\alpha## to compute ##K(x)##. Then find the ##x## that makes this as small as possible.
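A minimal numeric sketch of this ##K(x)## idea, using the objective and constraints from the example earlier in the thread. The name `K_approx` and the finite multiplier `M` are my own choices here: the true inner maximum is ##+\infty## off the constraint set, so a large finite ##M## only stands in for it.

```python
# Sketch: approximate K(x) = max over lambda, max over alpha >= 0 of
# L(x, lambda, alpha), using a large finite multiplier M in place of the
# unbounded maximization. Example from the thread:
#   f(x, y) = (x^2 + y^2) / 5
#   g(x, y) = 10x - y - 22   (equality constraint, g = 0)
#   h(x, y) = x^2 + y - 2    (inequality constraint, h <= 0)

def f(x, y):
    return (x**2 + y**2) / 5

def g(x, y):
    return 10 * x - y - 22

def h(x, y):
    return x**2 + y - 2

def K_approx(x, y, M=1e6):
    # Worst-case lambda is +M when g > 0 and -M when g < 0, i.e. M * |g|;
    # worst-case alpha is M when h > 0 and 0 when h <= 0, i.e. M * max(0, h).
    return f(x, y) + M * abs(g(x, y)) + M * max(0.0, h(x, y))

# On the feasible point (1.8, -4) the penalty terms vanish and K = f:
print(K_approx(1.8, -4))   # ≈ 3.848
# Off the constraint set the approximate inner max blows up:
print(K_approx(0.0, 0.0))  # huge (≈ 2.2e7 with this M)
```

Minimizing `K_approx` over ##(x, y)## then stays (approximately) inside the feasible set, which is exactly the "worst choice of ##\lambda## and ##\alpha##" picture above.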
 
SilverSoldier said:
I'm sorry I still don't understand it and I am quite confused. Shouldn't we have ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)=\max_{\boldsymbol{\lambda}}\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}\min_{\textbf{x}}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## or ##\displaystyle\min_{\textbf{x}\in S}f\left(\textbf{x}\right)=\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}\geq0}\max_{\boldsymbol{\lambda}}\min_{\textbf{x}}L\left(\textbf{x}, \boldsymbol{\lambda}, \boldsymbol{\alpha}\right)## instead?
If the first thing you do is minimize wrt ##x## then you are already too late to try to keep ##x## within ##S##. The first thing you must do is to maximize wrt ##\lambda## and ##\alpha## to force ##x## to stay within ##S##.
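Regarding the earlier request for code: here is a crude sketch of finding the minimum in that example. It is a hypothetical grid search, not anything from the book; the equality constraint is used to eliminate ##y##, and the inequality constraint then restricts ##x## to an interval.

```python
# Sketch: minimize f(x, y) = (x^2 + y^2) / 5 subject to
# 10x - y - 22 = 0 and x^2 + y - 2 <= 0.
# Substituting y = 10x - 22 (from the equality constraint) turns the
# inequality into x^2 + 10x - 24 <= 0, i.e. -12 <= x <= 2, so a 1-D
# grid search over that interval suffices for this small example.

def objective(x):
    y = 10 * x - 22
    return (x**2 + y**2) / 5

n = 14000  # grid resolution (arbitrary)
xs = [-12 + 14 * i / n for i in range(n + 1)]  # endpoints included
best_x = min(xs, key=objective)
best_y = 10 * best_x - 22
print(best_x, best_y, objective(best_x))  # → 2.0 -2.0 1.6
```

This only confirms the value ##f(2,-2)=1.6## computed earlier in the thread; in practice the min-max formulation is handled by constrained solvers rather than grid search.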
 
