Constrained Optimization via Lagrange Multipliers

Kreizhn · Aug 5, 2010

Hi,

I'm trying to do a constrained optimization problem. I shall omit the details as I don't think they're important to my issue. Let f:\mathbb R^n \to \mathbb R and c:\mathbb R^n \to \mathbb R^+\cup\{0\} be differentiable functions, where \mathbb R^+ = \left\{ x \in \mathbb R : x> 0 \right\}. The problem I want to solve is
\min_{\vec x\in\mathbb R^n} f(\vec x), \quad \text{ subject to } c(\vec x) = 0

Now all constrained optimization procedures that I know of use Lagrange multipliers. In this case, the Lagrangian would be
L(\vec x,\lambda) = f(\vec x) - \lambda c(\vec x)
and the key to this theory is that at the optimal point \vec x^* the gradients are parallel, so that
\nabla f(\vec x^*) = \lambda \nabla c(\vec x^*)
However, since every \vec x that satisfies c(\vec x)=0 is a minimum of c, we have \nabla c(\vec x) =\vec 0 for all feasible points \vec x. Hence the gradient of the constraint vanishes at every feasible point and the Lagrange multiplier method breaks down.

Is there a way to fix this? Can the constraint be changed so that its gradient is non-zero? Should I switch to a non-Lagrange-multiplier method? If such methods exist, what are they called? Any help would be appreciated.

HallsofIvy · Aug 6, 2010

Kreizhn said:

Hi,

I'm trying to do a constrained optimization problem. I shall omit the details as I don't think they're important to my issue. Let f:\mathbb R^n \to \mathbb R and c:\mathbb R^n \to \mathbb R^+\cup\{0\} be differentiable functions, where \mathbb R^+ = \left\{ x \in \mathbb R : x> 0 \right\}. The problem I want to solve is
\min_{\vec x\in\mathbb R^n} f(\vec x), \quad \text{ subject to } c(\vec x) = 0

Now all constrained optimization procedures that I know of use Lagrange multipliers. In this case, the Lagrangian would be
L(\vec x,\lambda) = f(\vec x) - \lambda c(\vec x)
and the key to this theory is that at the optimal point \vec x^* the gradients are parallel, so that
\nabla f(\vec x^*) = \lambda \nabla c(\vec x^*)
However, since every \vec x that satisfies c(\vec x)=0 is a minimum of c, we have \nabla c(\vec x) =\vec 0 for all feasible points \vec x.

Are you saying that, for this particular c, c(x)= 0 for all x such that x is a minimum of c(x)? What about maxima? They will also have \nabla c= 0. And, of course, if both maxima and minima of c occur when c(x)= 0, c is identically 0. That's not much of a constraint!

Kreizhn said:

Hence the gradient of the constraint vanishes at every feasible point and the Lagrange multiplier method breaks down.

Is there a way to fix this? Can the constraint be changed so that its gradient is non-zero? Should I switch to a non-Lagrange-multiplier method? If such methods exist, what are they called? Any help would be appreciated.

Well, you could, going back to Calc I methods, solve the constraint equation for one of the variables in terms of the others. That would reduce it to a problem in one less dimension with no constraint.
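
As a minimal sketch of the elimination idea, assume a hypothetical problem that is not the one in this thread: minimize f(x, y) = x^2 + y^2 subject to x + y = 1. Solving the constraint for y gives an unconstrained problem in x alone. The constraint, the search interval and the helper names below are illustrative assumptions only.

```python
import math


def f(x, y):
    """Objective of the hypothetical toy problem."""
    return x * x + y * y


def y_from_constraint(x):
    """Solve the assumed constraint x + y = 1 for y."""
    return 1.0 - x


def reduced_objective(x):
    """Objective after eliminating y; no constraint remains."""
    return f(x, y_from_constraint(x))


def golden_section_minimize(fun, lo, hi, tol=1e-8):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2.0


x_star = golden_section_minimize(reduced_objective, -5.0, 5.0)
y_star = y_from_constraint(x_star)
print(x_star, y_star, f(x_star, y_star))
```

This only works when the constraint can actually be solved for one variable in closed form, which is an assumption of the toy problem.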

Kreizhn · Aug 10, 2010

Hey. Thanks for the reply and sorry it took me so long to reply in turn.

Perhaps I should have emphasized that c(x) is differentiable. In any case, the fact that every point of the set S = c^{-1}(0) = \{ x \in \mathbb R^n : c(x) = 0 \} is a minimum of c follows from the codomain of c being \mathbb R^+ \cup \{0\}. Together with differentiability, this means that any x \in S is a minimum of c(x) and hence \nabla c(x) = \vec 0, \forall x \in S. I'm not sure why you mentioned maxima when they cannot possibly occur in the feasible set. I can further guarantee that S \neq \emptyset.
Thus any solution of
\min_{\vec x \in \mathbb R^n} f(\vec x)
with c(x) = 0 acting as an active constraint must lie in S. This is where the problem with the Lagrange multipliers comes in: since \nabla c vanishes on S, the gradients of f and c cannot be parallel in any useful sense.

Also, the constraint is highly non-linear so it is impossible to isolate any variables.

Do you know of a method other than, say, penalty functions that deals with constrained optimization without the use of Lagrange multipliers?

I don't think it will help, but here are the functions:
f(\vec x) = \sum_{i=1}^n x_i^2
c(\vec x) = 2^{m+1} - 2\Re\text{Tr}\left[X_d^\dagger X_f(x) \right]
where X_d \in \mathrm{SU}(2^m) and X_f: \mathbb R^n \to \mathrm{SU}(2^m) is defined by
X_f(x) = \prod_{i=1}^n \exp\left[ -i H_i x_i \right]
for H_i \in \mathfrak{su}(2^m), i=1,\ldots, n.
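
To make the vanishing gradient concrete, here is a small numerical sketch for a hypothetical single-qubit instance (m = 1, n = 1), assuming H_1 = \sigma_z and a target X_d = X_f(\theta) with an arbitrarily chosen \theta = 0.7. None of these choices come from the actual problem. In this toy case c(x) = 4 - 4\cos(x - \theta), so c is non-negative and its derivative is zero wherever c is zero.

```python
import cmath


def x_f(x):
    """X_f(x) = exp(-i * sigma_z * x); sigma_z is diagonal, so this is exact."""
    return [[cmath.exp(-1j * x), 0j], [0j, cmath.exp(1j * x)]]


def constraint(x, theta):
    """c(x) = 2^(m+1) - 2 Re Tr[X_d^dagger X_f(x)] for m = 1, X_d = X_f(theta)."""
    xd, xf = x_f(theta), x_f(x)
    # Tr[A^dagger B] equals the sum over entries of conj(A[j][k]) * B[j][k].
    trace = sum(xd[j][k].conjugate() * xf[j][k]
                for j in range(2) for k in range(2))
    return 4.0 - 2.0 * trace.real


def central_difference(fun, x, h=1e-5):
    """Finite-difference estimate of the derivative of fun at x."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)


theta = 0.7  # assumed target angle, for illustration only


def c(x):
    return constraint(x, theta)


value_at_target = c(theta)                               # feasible point
gradient_at_target = central_difference(c, theta)        # vanishes there
gradient_elsewhere = central_difference(c, theta + 1.0)  # nonzero off S
smallest_sampled_value = min(c(-10.0 + 0.01 * k) for k in range(2001))

print(value_at_target, gradient_at_target)
print(gradient_elsewhere, smallest_sampled_value)
```

In this toy instance the feasible set is \{\theta + 2\pi k : k \in \mathbb Z\}, a discrete set of points at each of which the derivative of c is zero.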

trambolin · Aug 10, 2010

If you have some sort of a feasibility certificate, then every x that is a minimum does not make too much of a sense. Either your function is not convex, so you have local minima, or your objective is somewhat flat, so there is no unique solution. Maybe you need more constraints to make sense of that equality constraint. Also, write the equality constraint as a pair of inequalities, c(x) \leq 0 and c(x) \geq 0, if you haven't already.

Kreizhn · Aug 10, 2010

Thanks for the reply trambolin. I'm not certain about what you mean by

...then every x that is a minimum does not make too much of a sense.

The set of feasible points are precisely those which satisfy c(x) = 0. Any such x is a minimum of c(x). However, c(x) is not the function we are now trying to minimize. We are trying to minimize f(x) given that c(x) = 0. It's like we're trying to find the best \vec x that simultaneously satisfies
\min f(\vec x), \min c(\vec x)

To my knowledge, we do not normally talk about the convexity of the objective function in the case of constrained optimization. The implication that a local min of the function is the global min is meaningless unless we are fortunate enough that the min lies within the feasible set. I'm not certain whether the uniqueness property of convexity holds in the non-linear case, but nonetheless the issue is that we are unable to find solutions because Lagrange multiplier methods break down.

Theoretically, there are no more constraints to add. This is equivalent to a fixed arc number, terminal-state time-optimal control problem. The constraint equation says precisely that the Lie-group-restricted evolution of the identity matrix, driven by the control fields H_i, arrives at a desired matrix; that is, that our initial state arrives at the desired final state. We now want to minimize the time in which this happens.

trambolin · Aug 11, 2010

If you have a feasibility certificate (say first order, Slater, or second order, etc.), you have a feasible set, right? So there are some x such that c(x) = 0. You say every such x is a minimum of c(x). What I am saying is this: why do you even bother about this? It is f(x) that should be the point of interest. You take any x from the feasible set, plug it into f(x), and that gives you a number. You then minimize over these values. Now if f is not convex, or if your set is not path connected, you run into problems such as getting stuck in a local minimum or not reaching all the feasible points in the optimization scheme. Try the YALMIP wiki for some additional technical details. What you write here as a question is the essence of optimality. Hence the Lagrange multiplier does not break down; it tells you that you have found the optimal (arguably local) point.

Kreizhn · Aug 11, 2010

What do you mean "why do you even bother with [c(x)]"? This is the essence of equality constrained optimization. The constraints are what define the feasible set; that is, the feasible set is precisely the level set of the constraint function. Furthermore, the theory of Lagrange multipliers for implementing solutions makes use of the critical fact that the solution will occur at a point at which the gradients of the constraint and the objective function are parallel.

Now here's the thing: it is from that theory of Lagrange multipliers that the Karush-Kuhn-Tucker conditions are derived, and consequently the Slater, first- and second-order optimality conditions. The fact that the level set c(x) = 0 also coincides with the minima of c(x) means that all such optimality conditions break down. In particular, consider the statement that if \mathcal L(x,\lambda) is the Lagrangian, then at the optimal point (x^*,\lambda^*) in the state/costate space,
\nabla_x \mathcal L(x^*,\lambda^*) = 0 \Rightarrow \nabla f(x^*) = \lambda^* \nabla c(x^*)
Because \nabla c(x^*) = \vec 0, this condition cannot be satisfied unless \nabla f(x^*) = \vec 0 as well. However, we know a solution exists because not only is the feasible set non-empty (which we can prove), it's also believed (heuristically) to be totally disconnected. Thus while we would be satisfied with an infimizing solution, it seems the set is at most countable within an arbitrary radius of the origin.

Furthermore, since we do not have a priori knowledge of the location of all members of the feasible set, we cannot apply combinatorial optimization techniques to the problem. Hence we rely on relaxation techniques of the feasible domain, which is why Lagrangian methods have hitherto been employed.

Kreizhn · Aug 11, 2010

If you need further evidence, note that the proof of the first-order optimality conditions (the KKT conditions) requires that the optimal solution satisfy the linear independence constraint qualification (LICQ). In this case the active constraint gradient is identically zero, and a set consisting of the zero vector alone is linearly dependent. Hence KKT and anything derived from it cannot be used to solve this problem.
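
A one-dimensional worked example may make the LICQ failure concrete. The toy problem below is an assumption chosen for illustration (n = 1, with a constraint that is a perfect square), not the problem from this thread.

```latex
% Hypothetical toy problem: f(x) = x^2, c(x) = (x - 1)^2 \ge 0.
\min_{x \in \mathbb R} x^2 \quad \text{subject to } (x-1)^2 = 0
% The feasible set is S = \{1\}, so the solution is x^* = 1 with f(x^*) = 1.
\nabla f(x^*) = 2 x^* = 2, \qquad \nabla c(x^*) = 2 (x^* - 1) = 0
% Stationarity of L(x,\lambda) = f(x) - \lambda c(x) would require
2 = \lambda \cdot 0
% which no \lambda \in \mathbb R satisfies: x^* is optimal, yet no multiplier exists.
```

In this toy case the same feasible set is also described by \tilde c(x) = x - 1, whose gradient is 1, and then \lambda = 2 works. Whether a comparable reformulation exists for the matrix constraint discussed above is not something this example settles.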

trambolin · Aug 11, 2010

Yes, but we do not seem to understand each other. Minimization over your manifold, if I am not mistaken, can be embedded in a semidefinite programming setup. Maybe I can point you to the solvers that handle this kind of problem and you can check them.

http://users.isy.liu.se/johanl/yalmip/pmwiki.php?n=Tutorials.AutomaticDualization

trambolin · Aug 11, 2010

Also, I forgot to ask whether the manifold you optimize over is linear. And have you checked the dual?

Kreizhn · Aug 12, 2010

Yes, there seem to be some issues in communicating our points across.

The ambient space of the domain is linear as it is just \mathbb R^n. The zero-level set of the constraint is not linear.

I have not checked the dual, but to my understanding maximizing the dual will not give you the solution unless a constraint qualification holds. LICQ most certainly does not hold, so that seems to be moot.
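
For illustration, the dual can be worked out by hand on a hypothetical one-dimensional problem, f(x) = x^2 with c(x) = (x-1)^2, using the sign convention L = f - \lambda c from the first post. This is a sketch of what can go wrong when the constraint gradient vanishes, not an analysis of the actual problem.

```latex
% Hypothetical toy problem: minimize x^2 subject to (x-1)^2 = 0; primal optimum f(1) = 1.
L(x,\lambda) = x^2 - \lambda (x-1)^2 = (1-\lambda) x^2 + 2 \lambda x - \lambda
% For \lambda \ge 1 the Lagrangian is unbounded below in x, so q(\lambda) = -\infty.
% For \lambda < 1 the minimizer in x and the dual function are
x(\lambda) = \frac{-\lambda}{1-\lambda}, \qquad
q(\lambda) = \inf_x L(x,\lambda) = \frac{-\lambda}{1-\lambda}
% q is strictly decreasing in \lambda, and
\sup_{\lambda} q(\lambda) = \lim_{\lambda \to -\infty} \frac{-\lambda}{1-\lambda} = 1
% The supremum equals the primal optimum, but no finite \lambda attains it.
```

So in this toy case there is no duality gap, yet the dual has no maximizer, consistent with the absence of a multiplier. Since x(\lambda) \to 1 only as \lambda \to -\infty, maximizing the dual here behaves like a penalty method with weight -\lambda.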
