# Homework Help: Implementing barrier/penalty functions to *Equality* constant

1. May 30, 2017

### maistral

Hi. I seem to have forgotten how to implement equality constraints to barrier NLPs and quadratic NLPs.

Say for example I have this problem:

Max Z = x12 + 2 x22
ST:
x12 + x22 ≤ 1
x1+ x2 ≤ 1

The unconstrained problem (quadratic penalty - correct me if I'm wrong) then becomes
Z = - x12 - 2 x22 - α [x12 + x22 - 1]2 - β[x1+ x2 - 1]2

I'm wondering how will I implement another constraint in equality format; ie.

Max Z = x12 + 2 x22
ST:
x12 + x22 ≤ 1
x1+ x2 ≤ 1
x1- x2 = 0.5 ← how do I add this constraint in penalty, or barrier form?

2. May 30, 2017

### Ray Vickson

Your penalties for inequality-constraint violation are incorrect; what you wrote is appropriate to constraints of the form $x_1^2+x_2^2 = 1$ and $x_1+x_2 = 1$. (So, you already have the correct penalty for an equality constraint!)

For inequality constraints you can either:
(1) use a barrier function such as $M/(1-x_1^2-x_2^2)$, and start from a point where the denominator is > 0. Then minimization will keep the denominator >0 at all points. The resulting problem becomes increasingly difficult as we decrease $M$ to make the barrier "tighter", and if the true solution is on the boundary of the constraint you will never reach it.
(2) Use a one-sided penalty such as $M \max(0,x_1^2+x_2^2-1)$, but this would be harder to deal with due to its inherently non-differentiable nature.
A better-behaved version is $M [\max(0,x_1^2-x_2^2-1)]^2$, which at least has continuous first derivatives (and this is often very important).

A better approach combines Lagrange multipliers with penalties; it is known as the Method of Multipliers, or Augmented Lagrangian method. For your problem one possible formulation would be
$$\max x_1^2+x_2^2 - M_1(x_1+x_2-1)^2 - M_2 (\max(0,x_1^2+x_2^2-1))^2 \\ + \lambda_1(x_1+x_2-1) + \lambda_2 (1-x_1^2-x_2^2).$$
In a "pure" penalty method (with no $\lambda$ parameters) the penalty parameters $M_1, M_2$ must get very large in order to give a good solution; at the same time, real-world algorithms on finite word-length computing machines have increasing difficulty dealing with such large-$M$ problems (eventually forcing a stop). Magically, including the Lagrange multiplier terms $\lambda_1, \lambda_2$ allow one to get an accurate (i.e., exact) solution without resorting to very large $M_1$ and $M_2$. Of course, one still needs to start with moderate values of the $M$s and perhaps increase them a few times, but at the same time the Lagrange multiplier estimates get updated as well.

See. eg., http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/nlp/nlp/node6.html
or http://web.mit.edu/dimitrib/www/Multiplier_Methods_Paper.pdf

Last edited: May 30, 2017
3. May 30, 2017

### maistral

Thanks for replying. I have a few more questions.

1. I did remember using a barrier function, I think it was the logarithmic barrier that I combined with a Newton-Raphson type of optimization (I think this was the reason why first derivatives were so important; am I correct in this as well?). IIRC, my formulation was something like:

Z = x12 + 2x22 - μ [ ln(1 - x12 - x22) + ln(x1 + x2 - 1)]

Is this correct? If that's the case, then how should I add the equality constraint? Is it correct to just add the γ⋅(x1 - x2 - 0.5)2 (the γ there is the penalty coefficient) in there, since it will be penalized anyway if it deviates from the straight line equation? I mean, is this form correct:

Z = x12 + 2x22 - μ [ ln(1 - x12 - x22) + ln(x1 + x2 - 1)] - γ⋅(x1 - x2 - 0.5)2

or is it this one:

Z = x12 + 2x22 - μ [ ln(1 - x12 - x22) + ln(x1 + x2 - 1) + (x1 - x2 - 0.5)2]

Or there is another way to implement a logarithmic barrier that would surround the straight line (but if I recall correctly this is impossible because putting a straight line inside the logarithm will cause the value to reach infinity or something)?

2. If I go implement a 'pure penalty' method as you implied, if I get you correctly it should look like:

Z = - x12 - 2 x22 - α [max(0, x12 + x22 - 1)]2 - β[max(0, x1+ x2 - 1)]2 - γ(x1 - x2 - 0.5)2

Then I would run my unconstrained optimization methods. If that's the case, how should I write this max function in an equation form - How should I write the derivatives? I'm at a total loss to this one. Unless this max thing means writing if-else statements or sorts.

It would really help me if you would write the penalty form and the barrier form of the problems, so I can re-study and learn them, this time, correctly. This topic isn't included in our coursework, I was just interested in studying them (thus the belief of having the penalty method for inequality constraints written correctly for TWO years, and as you can see I was actually incorrect).

I'm sorry for the mega-load of questions. I really, really, want to learn these methods, but everytime I try to study them it's like translating hieroglyphics using Chinese characters. I even ended up learning the method incorrectly as no one is teaching me how to do this correctly. I will have to revise my "memory notes" . I will also read the papers you sent me. Thank you very much for replying.

PS: If it matters these are what I have been reading:
https://ocw.mit.edu/courses/enginee...ing-2010/lecture-notes/MITESD_77S10_lec08.pdf
http://users.jyu.fi/~jhaka/opt/TIES483_constrained_indirect.pdf

Last edited: May 30, 2017
4. May 30, 2017

### Ray Vickson

*****************************************
No, not quite correct. It should be
$$Z =x_1^2 + 2 x_2^2 -\mu_1 \ln(1-x_1^2-x_2^2) - \mu_2 \ln(1 - x_1 - x_2),$$
where $\mu_1, \mu_2 > 0$ are penalty or barrier parameters (ultimately becoming large) Do you see why the form you wrote is incorrect?

****************************************

**************************
Yes, just append the term $-M(x_1 + x_2 - 0.5)^2$, $M > 0$. Note the minus sign, since you are maximizing and so want a constraint violation to lessen your objective.

***************************
**************************

As to the "derivative" issues: those are dealt with in the literature on penalty and/or barrier methods. For example,
http://www-personal.umich.edu/~mepelman/teaching/NLP/Handouts/NLPnotes12_89.pdf
has some formulas

************************
*******************

For practical constrained nonlinear optimization problems, many of the most effective methods deal with hybrid types of algorithms: they handle linear (inequality and equality) constraints by a version of the reduced-gradient, or generalized reduced-gradient method, while handling nonlinear constraints by an augmented Lagrangian approach using the types of penalties I showed above. Personally, I do not like logarithmic penalties (or barriers) because they do not allow for boundary solutions; IMHO a quadratic penalty is better, because boundary solutions are allowed and often even attained by well-designed algorithms that use augmented lagrangians with adjusted penalty parameters and adjusted lagrange multiplier estimates.

5. May 30, 2017

### maistral

Oh. So basically for every constraint there will be an assigned parameter.

Then if I follow the barrier methods, those parameters should reduce to 0, then if I follow penalty methods (ie. the one with the straight line equation), those should approach infinity?

Thank you very much. The cloud of confusion is dispersing.

6. May 31, 2017

### Ray Vickson

An alternative method of dealing with inequality constrains would be to use slack or surplus variables. Thus we could replace the inequality $x_1^2+x_2^2 \leq 1$ by $x_1^2 + x_2^2 + s = 1$ and require $s \geq 0$, or we could get rid of inequalities altogether by writing $x_1^2+x_2^2+ s^2 = 1$. The latter can be put into the objective with a penalty, so we get the penalty term $-M (x_1^2 + x_2^2 + s^2-1)^2$.

The fact that this rather obvious method is NOT recommended in the post-1960s literature is evidence that it has been tried many times in the past and been found not to work well. (Actually, after retiring I have not kept up with the literature, so I don't know whether the method has been tried with an augmented lagrangian approach. If not, that might make a decent Master's degree thesis topicl)

7. Jun 1, 2017

### maistral

Hi! So I researched more on the topic as I am seriously interested in this, I do hope your patience does not run out on me (lol ). I will, however, keep on trying to learn as much as I can.

So I searched more on the barrier methods and I found this paper:
http://www.sce.carleton.ca/faculty/chinneck/po/Chapter18.pdf

In this paper I see that they only used one parameter for ALL the constraints (the three inequalities and the single inequality). It 'did' work for my problem, but I have a few questions. Note that my optimization problem is:

Max Z = x12 + 2x22
ST:
x1 + x2 ≥ 1
x12 + x22 ≤ 1
x1 - x2 = 0.5

I ran the MS Excel Solver and found the optimum point x1 = 0.911438, x2 = 0.411438.

1. I used their method, which involved using an inverse function barrier and a quadratic penalty similar to what you have been telling me, except that they used a single parameter r only. My unconstrained problem is:

Z = x1 + 2x22 - u / (1 - (x12 + x22)) - u / ((x1 + x2) - (1 / √u) * (x1 - x2 - 0.5) 2

It does converge to 0.9110133 and 0.4113009. May I know, what is the implication/disadvantage of using one parameter only? Because it seems so easy to use that I'm terrified I might be doing something wrong again .

2. I tried modifying my method and used a logarithmic barrier instead. I strictly followed the (one parameter - literally changed the inverses to logarithms ONLY) but I seem to be unable to get the answer above. My formulation is this:

Z = x1 + 2x22 - u ln(1 - (x12 + x22)) - u ln((x1 + x2) - (1 / √u) * (x1 - x2 - 0.5) 2

It always converges to the point (0, 1) as if the straight line equality constraint isn't even in there. This logarithmic method converges to (0, 1) similar to the inverse barrier where I removed the equality constraint.

If I, however, try to remove the equality constraint from the logarithmic method, it converges to (0.6666, 0.3333) which isn't even an optima.

Thank you for your responses, sensei. Rest assured that your replies are not wasted, as I am studying this really hard.

Last edited: Jun 1, 2017
8. Jun 1, 2017

### maistral

Update:

I did, try to plot the logarithmic barrier equation and it does show the optimal point (0.91, 0.41). What's the problem? -_-

9. Jun 1, 2017

### maistral

Update 2: Kindly ignore the second question, I managed to fix it. Apparently I had to fix some signs as my setup for logarithmic barriers are for minimization.

10. Jun 2, 2017

### maistral

Update 3: I think I know now why changing all the inequalities to equalities + slack variables is a horrible idea.

The giant system of equations that have to be solved...

11. Jun 2, 2017

### Ray Vickson

That has nothing to do with how numerical optimization methods actually work: we do not try to solve large, horrible systems of equations "analytically". In typical unconstrained optimization algorithms we use successive direction-selecting, line-searching methods.

You need to abandon the idea that what you can do by hand for small problems will work as well on large problems. Solving constrained problems having thousands of variables and hundreds to thousands of constraints cannot be guided by hand-worked examples. And no: I am not joking. See, eg.,
https://scicomp.stackexchange.com/questions/17488/large-scale-nonlinear-optimization-problem ,
which describes a nonlinear optimization problem having up to about 1 million variables; the post is asking for advice about dealing with it. The author claims to have successfully solved cases having over 700,000 variables on an old notebook computer, using downloadable optimization packages.

Other examples of very large, real-world nonlinear optimizations can be found by entering the keywords 'large scale nonlinear programming' in Google.

Last edited: Jun 2, 2017
12. Jun 2, 2017

### maistral

Sorry. What I mean is the optima keeps on converging somewhere else (negative decision variables, in particular) with no physical meaning

13. Jun 9, 2017

### maistral

Hi, I have one last query regarding the the Augmented Lagrangian method. I do not understand the formulation for this one as the first one has no 'max' function and the other inequality has a max function. It's this one:

How do I write the unconstrained problem correctly, at least for my problem:

max Z = x1 + 2x22
ST:
x12 + x22 ≤ 1
x1 + x2 ≥ 1
x1 - x2 = 0.5

Is it:
max Z = x12 + 2x22 - M1[x1 + x2 - 1]2 - M2[1 - x12 + x22]2 - M3[x1 + x2 - 0.5]2 + λ1[x1 + x2 - 1] + λ2[1 - x12 - x22]

Then I'm supposed to optimize with respect to x1, x2, λ1 and λ2 while manipulating M1, M2, and M3 in a similar way to the barrier method.

By the way, thank you very much. At the bare minimum I am able to solve NLP's now, thanks to you. I think I'm getting the hang of it now, I just wanted to confirm my doubts regarding this ALM.

14. Jun 9, 2017

### Ray Vickson

It should be
$$\begin{array}{rcl} \max Z &=& x_1^2 + 2 x_2^2 -M_1 [\max(x_1^2+x_2^2-1,0)]^2 + \lambda_1 \max (x_1^2+x_2^2-1,0) \\ &&- M_2 [\max(1-x_1 - x_2 ,0)]^2 + \lambda_2 \max(1-x_1-x_2,0)\\ &&- M_3 (x_1 - x_2 - 1/2)^2 + \lambda_2 (x_1 - x_2 - 1/2) \end{array}$$

However, that would be a poor way of dealing with the problem. A much better strategy would be to adopt an active-set method, where we start off assuming some equalities, such as just $x_1 - x_2 = 1/2$ and no other constraints. If we can start from a point where the other two constraints are satisfied, then we can try to maximize the objective in the subspace $x_1-x_2 = 1/2$, but only until one of the other constraints would start to be violated. Numerically, you would do this by a sequence of line-searches, and would stop the procedure as soon as you encounter a boundary of one of the other two constraints. Then, you would add in the constraint whose boundary you just hit as an equality constraint (so try to stay on the boundary); at the same time you would use a Lagrange multiplier test to check if the original equality constraint should stay active or be released. (However, if we started with $x_1 - x_2 = 1/2$ we can never relax that, because it is required to remain an equality.)

Suppose, for example, we hit the boundary of the constraint $x_1^2 + x_2 ^2 \leq 1$, so are at a point $(x_{10}, x_{20})$ with $x_{10}-x_{20} = 1/2$ and $x_{10}^2 + x_{20}^2 = 1$. Now both constraints $x_1 - x_2 = 1/2$ and $x_1^2 + x_2^2 \leq 1$ are equalities, and the constraint $x_1+x_2 \geq 1$ is possibly inactive (that is, the inequality is possibly strict). In this case we would be done, because the two active constraints together determine uniquely the two variables $x_2$ and $x_2$. This example is really much too simple to show how things really would work.

Many of the most successful NLP methods adopt an active set strategy, with linear constraints dealt with directly (no penalties, just Lagrange multipliers), and with active nonlinear constraints dealt with using penalty + Lagrange multipliers (that is, a "method of multipliers"). When it is done that way we never have to deal with things like $\max(g(x),0)$, but only with 0 (if the inequality is inactive) or with $g(x)$ itself if the inequality is active.

15. Jun 9, 2017

### maistral

This is overwhelming. I'll try to digest it one step at a time, and I'll ask questions again after I understand the concepts you stated (and at least tried running the algorithm).

Thank you for being there, I strongly appreciate it. :D

16. Jun 9, 2017

### Ray Vickson

Nothing beats a good book.

17. Jun 9, 2017

### maistral

Too bad I learn by example. And the books I find are all in math hieroglyphics.

Still; I'll continue fighting.

18. Jun 9, 2017

### Ray Vickson

http://cepac.cheme.cmu.edu/pasi2011/library/biegler/PASI2011_LB_nlp.pdf

It talks about the (generalized) reduced-gradient method and how it is used in various real-world NLP algorithms---including the EXCEL Solver tool.

It tells you which of the available codes could handle problems having more than a million variables and/or constraints.

In the last third it studies penalty methods.

Enjoy and profit!

19. Jun 11, 2017

### maistral

I am trying to make this Augmented Lagrangian work at the bare minimum before I even attempt to go GRG.

So this is my problem. I tried studying this rather simple minimization problem to 'understand' how the ALM works.

min Z = x12
ST: x1 - 1 = 0

Obviously, the answer is x1 = 1. Thanks to you I can make every single barrier or penalty function work and solve the problem - As for the penalty function, I am able to solve it as:
min Z = x12 + M1 ⋅ (x1 - 1)2
where M1 is my penalty parameter.

My problem with this ALM is it does not seem to work, or I am missing something important. My equation is:
min Z = x12 + M1 ⋅ (x1 - 1)2 + λ1(x1 - 1)

If I understand the method correctly, apparently I have to optimize this equation with respect to x1 and λ1 while varying M1.

I seem to be unable to make it work. Assuming I start off with M1 = 1, my x1 is converging to zero while my Lagrange parameter is converging to a massive value approaching infinity if I attempt to minimize it.

If I use a massive value for M1 initially, then it WOULD converge to x1 = 1 but isn't doing the ALM pointless here?

Am I missing something here? Or the method fails in these kinds of problems?

Thanks!

20. Jun 11, 2017

### maistral

Ah, I found out what's wrong. I don't optimize the lambda parameter, I use an iterative algorithm to update it. Hold on.

21. Jun 11, 2017

### maistral

You have no idea how teary-eyed and satisfied I am right now.

At the bare minimum, I believe, I know how to execute the method and I have an idea how the method works, even a little bit. Thank you very much. Now I'm trying to mix the ALM with barrier functions and I'll see what happens. Thank you very much again!

22. Jun 11, 2017

### Ray Vickson

Why does the method work?

Look at solving $\min f(x)$, subject to $g(x) =0$ by a quadratic penalty method. For a fixed, finite penalty parameter $m > 0$ the unconstrained penalized problem is $\min Z_m = f(x) + m[g(x)]^2.$ In general, for finite $m$ the solution $x_m$ will not satisfy $g(x) = 0$, but instead will have $g(x_m) = r_m$, some non-zero value. The optimality condition is $\nabla_x Z_m = 0$, so $\nabla_x f(x_m) + 2 m r_m \nabla_x g(x_m) = 0.$ This is the first-order condition for the problem $\min f(x)$ subject to $g(x) = r_m$ for the nonzero value $r_m$; thus, we have the optimal solution of a "nearby" problem.

Now comes the clincher: suppose we look at the penalty method for a modified problem $\max f(x)$ subject to $g(x) = a$. The penalized problem would be $\max Z_{m,a} = f(x) + m [g(x)-a]^2$. For finite $m > 0$ the solution would not satisfy $g(x) = a$, but if we are clever and lucky it might satisfy $g(x) = 0$! In other words, we could get the solution of the $g = 0$ problem by using the penalty problem for $g=a$, if we somehow choose the right value of $a$.

Note that $f(x) + m[g(x)-a]^2 = f(x) + m[g(x)]^2 - 2ma g(x) + m a^2$. Apart from the additive constant $m a^2$, this is the augmented Lagrangian with $\lambda = 2 m a$. That means that if we can choose the right value of $\lambda$ the augmented penalty problem will get the exact solution right away. We do NOT need to make $m$ large for this to happen. In practice, one may choose to increase $m$ by some modest amount from one step to the next, while at the same time updating the value of $\lambda$. Workable algorithms along this line can be very fast and efficient.

As an example, consider your problem $\min x^2$ s.t. $x-1 = 0$. I will denote the Lagrange multiplier by $u$ instead of $\lambda$ (a common choice in the optimization literature), so the augmented penalty problem is
$$\min Z_m = x^2 + m(x-1)^2 + u(x-1)$$
For fixed $m > 0$ and $u$ the solution is
$$x = x_{m,u} = \frac{2m-u}{2m+2}$$.
This gives a new Lagrange multiplier equal to
$$u_{\text{new}} = u + 2m (x_{\text{new}} -1).$$

Fixing $m$ at $m = 1$ and starting from $u_0 = 0$ the sequence of solution values $x_n$ and $u_n$ are:
$$\begin{array}{rcc} n & x_n & u_n \\ 1 &0.500000 & -1.000000\\ 2 &0.750000 &-1.500000 \\ 3 &0.875000 &-1.750000 \\ 4 & 0.937500 & -1.875000 \\ 5 & 0.968750 &-1.937500 \\ 6 & 0.984375 & -1.968750 \\ 7 & 0.992188 &-1.984375 \\ 8 & 0.996094 & -1.992188 \\ 9 & 0.998047 & -1.996094 \\ 10 & 0.999023 & -1.998047 \\ 11 & 0.999512 & -1.999023 \\ 12 &0.999756 &-1.999512 \\ \end{array}$$

We see that the $x_n$ and $u_n$ are converging to the correct optimal values $x = 1$ and $u = -2$, and all that is happening while fixing $m > 0$ at the quite small value $m = 1$. Faster convergence would obtained by increasing $m$ as well from one step to the next.

Last edited: Jun 11, 2017
23. Jun 11, 2017

### maistral

I'll give it a shot. Thanks again for your help!