# Penalty Function

1. Jun 4, 2007

### daudaudaudau

Hi.

This is just a general question about a problem I think I'm having. If I have a function f(x1,x2,x3) and I want to find it's global minimum for x1>0,x2>0,x3>0 using unconstrained minimization, I've read that I can use a penalty function to make the function increase monotonically if any of the variables become less than 0. This way a minimum will never be found below 0.

BUT what if f(...) is decreasing at the time when (for instance) x1 goes below zero? Then I've created a valley in my function that is not supposed to be there, and the optimization routine thinks this is the minimum. At least I suspect.

Is this not a general problem with penalty functions?

Regards,
Anders

2. Jun 5, 2007

### caslav.ilic

The penalty function will generally have some sort of constant penalty multiplier, which puts relative weight on the penalty when a constraint is broken.

The unconstrained optimization algorithm is started with some value of the multiplier, and as you note, it may indeed find the false minimum, which can be checked by seeing that some constraints have remained broken. However, the hope is that this minimum is nevertheless somewhere near the true minimum, so the idea is to increase the penalty parameter after each unconstrained optimization, starting new unconstrained run from the point of previous minimum. These "external" iterations, with increasing penalty parameter in between, are then done as long as one is not happy with how much the constraints are broken.

A question may be why starting with a "low" penalty multiplier at all, not going immediatelly for a "high" one? That's because the high penalty multiplier will make most uncostrained algorithms perform poorly ("ill-conditioned Hessian" is the jargon), i.e. take too much time or run into floating-point problems. So, it's best to be close to the solution when penalty multiplier is high, and therefore the gradual increase of the multiplier.

Another problem is that too low penalty multiplier may make the optimization diverge. Hopefully, the unconstrained optimization routine should have some way of detecting and reporting divergence, so that the multiplier can be increased and the optimization restarted from the same point.

--
Chusslove Illich (Часлав Илић)

3. Jun 5, 2007

### daudaudaudau

Hi.

Thank you very much for your comments. Increasing the multiplier value really helped me a lot and it makes my function converge more often. Right now I just double the value of the mulitplier between runs. But I don't use the "false minimum" as the starting point for further iterations. If I do that it seems I'm stuck no matter what value the multiplier has. My penalties are not directly based upon the arguments to the objective function, but on a quantity that I calculate inside the function. Maybe this has something to do with it?

Also I observe that for some initial conditions I find better minima than for other, even though none of them are particularly close to the found minimum. This must just be attributed to the fact that my function has multiple minima, and depending on the initial conditions the optimization algorithm follows different paths, don't you think?

- Anders

4. Jun 6, 2007

### caslav.ilic

Well, you can start from the one same point (discard current minimum) after each increase of multiplier, but then those problems with convergence may appear. Or not, depends on what is the function like and how the penalty is constructed.

If the function has multiple minima, then indeed the usual optimizers will find the "nearest" one, not the global. I still didn't get to looking into deterministic global optimizers :)

However, I forgot to ask, what do you use as unconstrained optimizer anyway? I just assumed it's some gradient-based scheme, but... The problem of being stuck in "false minimum" so that you have to restart, may depend on this as well.

--
Chusslove Illich (Часлав Илић)

5. Jun 6, 2007

### daudaudaudau

I'm using the nonlinear least squares algorithm in Matlab (lsqnonlin) which is gradient based as far as I can see.

6. Jun 7, 2007

### caslav.ilic

Now it depends on whether you consider the problem "solved enough" :), but in general few other things could be investigated.

Try out different constraint penalizations. Don't know what exactly you are using now, but two typical would be quadratic and absolute valued, both having the constant multiplier. The quadratic is smoother which is nice for the optimization algorithm, but the absolute valued has a theoretical ability to produce exact minimum when the multiplier is high enough (quadratic is only "getting arbitrarely close").

Try out different unconstrained optimizers, I don't know what exactly Matlab has, but I'd expect more of them are available. Least-squares ones are tweaked to some properties of least squares problems, from which your problem may deviate when the penalties are added, and that may confuse the algorithm. A typical general algorithm would be something like BFGS.

Try out the compositions (optimizer, penalties, multiplier increases, starting points) on a simple problem, just to make sure they at least reasonably fit together. For example (out of Nocedal & Wright):
$$\min f(x_1, x_2) = 2 (x_1^2 + x_2^2 - 1) - x_1, \quad \textrm{subject to } x_1^2 + x_2^2 - 1 \ge 0$$
for which the solution is (1, 0).

--
Chusslove Illich (Часлав Илић)

7. Jun 8, 2007

### daudaudaudau

Hi.

It doesn't seem to make much of a difference whether I use quadric or absolute valued penalties. I did a few tests and the results were almost the same.

My teacher also suggested using different algorithms. So far I have tried LSQ and minimax. I find a minimum with LSQ and then pass it on as the initial point for the minimax algorithm. Maybe I should say that the problem I'm solving is about making a 2D function as "flat" as possible in an interval 0<x<1. Using LSQ, "flat" means fitting my function to a constant level, whereas in the minimax case "flat" is minimizing the maximum distance from my function to the constant level(which is also a variable). This may sound like a strange problem, but it has to do with analog filter design.

I will take your advice and try out the different combinations. Right now my main concern is that of picking the right initial guess. Most times I get decent minima (I think my function has a lot!) and sometimes I hit what I believe is the global minimum. Trying out different algorithms might help, I guess. But so far I'm pretty satisfied with the performance of the minimization. Increasing the penalty multiplier was a significant step for me.

-Anders