How to optimise something when multiple parameters change the output


Discussion Overview

The discussion revolves around the challenges of optimizing systems with multiple parameters in engineering contexts, particularly focusing on neural networks and antenna design. Participants explore the complexities of parameter interactions and the difficulties in achieving optimal solutions when adjusting multiple variables simultaneously.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes the inherent difficulties in optimizing multiple parameters due to the vast number of combinations and the interdependencies between parameters, using neural networks and antenna design as examples.
  • Another participant suggests that for large-scale optimization, the goal is often to find a "good enough" solution rather than the global optimum, highlighting the existence of local optima.
  • Several optimization algorithms are mentioned, including BFGS, DFP, Nelder-Mead, and simulated annealing, with a focus on their ability to handle interactions between variables more effectively than optimizing one at a time.
  • A participant expresses confusion about using heuristic algorithms, such as Genetic Algorithms, for optimizing neural network topology, noting the complexity of applying such methods recursively.
  • There is mention of differing opinions on the optimal number of layers in neural networks, with one participant citing a recommendation for three hidden layers from a PhD student, while others suggest that a single layer can suffice as a universal approximator.
  • Concerns are raised about the lack of clear procedures in literature for optimizing neural network topologies beyond general rules of thumb, leading to frustration over potentially suboptimal designs in practical projects.

Areas of Agreement / Disagreement

Participants express a range of views on optimization strategies, with no consensus on a single best approach or solution. The discussion reflects differing opinions on neural network design and the effectiveness of various optimization algorithms.

Contextual Notes

Participants note limitations in existing literature regarding specific procedures for optimizing neural network topologies and the challenges of applying theoretical knowledge to practical design problems.

Who May Find This Useful

Readers interested in optimization techniques in engineering, particularly in neural networks and antenna design, may find the discussion relevant.

CraigH
This dilemma seems to occur all the time in so many different engineering problems. It seems impossible to optimise something with multiple parameters, for three reasons:

  • There are too many possible combinations of these parameters to be able to simulate them all
  • You optimise one parameter at a time, but then you don't know if your final result is the best possible one. For example: you start with a neural network with 5 layers and 4 neurons in each layer. You sweep the number of layers, plotting the accuracy of the network against the number of layers, and find that the optimum is 3 layers. You then optimise the number of neurons in the first layer and find that the optimum is 10 neurons, then the second layer (7 is best), then the third (6 is best). However, this might not be the best overall solution: the accuracy might be higher if you instead start with 3 neurons in the first layer, which is not the single-parameter optimum, and then optimise the second and third layers, ending up with different neuron counts but a much better accuracy than the first method gave.
  • You optimise parameter 1 that governs property X, and then you optimise parameter 2 that governs property Y, but then this has changed property X, so you go back and optimise the parameter 1, but then this changes property Y.
This seems like a very fundamental problem when designing any system in any area of engineering, and I thought that there may be a standard method of approaching this problem, or at least a few known methods that do a pretty good job. If there is a solution to this dilemma, can somebody please tell me?

Additional Details

The neural network example is the problem I am currently having, but I'll give another example of this problem I have had in the past. I was trying to design a patch antenna with a resonant frequency of 2GHz. The resonant frequency is mainly dependent on the width of the patch, and the gain of the antenna is mainly dependent on the insertion depth. In CST, I performed a sweep on the width and picked the value that gave the lowest S parameter at 2GHz. I then performed a sweep on the insertion depth and picked the value that reduced the S parameter to the lowest value (I decided that -40dB was acceptable). But this then changed the frequency at which this gain happens, so I did a sweep on the width again and picked the value that gave the lowest S parameter, but now the S parameter is too high again, so I optimise the insertion depth... and so on.
 
I don't know much about neural networks, but I would guess selecting a good topology for a network is a practical problem that would be covered in books on the subject.

For "large scale" optimization problems, the objective is usually to get a solution that is "good enough" rather than try to fund the global optimum solution. There can be many "local" optimum solutions with not much to choose between them.

If you have n variables to optimize, you can consider each set of n values as a geometrical point in n-dimensional space. Visualizing how algorithms work is easy when n = 2 (e.g. draw something that looks like a contour map, and the optimum is the highest or lowest point on the map). When n > 3, drawing pictures is hard, but the math works the same way for any value of n.

In general "optimizing one variable at a time" is very inefficient for the reason you discovered: the variables interact with each other. When n = 2, this is like trying to get to the top of a mountain by only moving north/south or east/west. If the mountain is a long "ridge" running from northeast to southwest, for example, that is obviously a bad plan.

Better methods attempt to find "search directions" which are linear combinations of the variables, such that the interaction between the directions is small (and ideally zero).

There are several well-known algorithms for this. My personal favorite is BFGS (named after the four people who invented it). Another popular one is DFP (the "F" is the same person in each).

A different approach is to try to find a region that contains the optimum, and then subdivide it into smaller regions. One version of this is the Nelder-Mead algorithm.

Yet another way is simply to try points "at random", and keep track of the best solutions. Then try new points "close" to the best solutions you have found so far. One version of this is called "simulated annealing".
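All three families of methods mentioned above (quasi-Newton, simplex subdivision, and annealing) have off-the-shelf implementations in SciPy. A minimal sketch, assuming SciPy and NumPy are installed; the Rosenbrock function and start point are a standard benchmark, not something from this thread.

```python
import numpy as np
from scipy.optimize import minimize, dual_annealing

def rosen(v):
    """Rosenbrock 'banana' function: a curved ridge with minimum 0 at (1, 1)."""
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

x0 = np.array([-1.2, 1.0])

# Quasi-Newton (BFGS) and simplex (Nelder-Mead) local searches...
res_bfgs = minimize(rosen, x0, method="BFGS")
res_nm = minimize(rosen, x0, method="Nelder-Mead")
# ...and a stochastic global search over a bounded box.
res_sa = dual_annealing(rosen, bounds=[(-2.0, 2.0), (-2.0, 2.0)])

for res in (res_bfgs, res_nm, res_sa):
    print(res.x, res.fun)
```

All three follow the curved ridge that defeats one-variable-at-a-time sweeps. For noisy or simulation-based objectives, the derivative-free options (Nelder-Mead, annealing) are usually the safer choice.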

If you can define the function you want to optimize mathematically, all these algorithms are in systems like Matlab. If you need another software package like CST to find how "good" a particular design is, you can usually automate the process by making the optimization algorithm create the input file to run CST, run the model, and then extract the relevant data from the CST output file.
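As a sketch of that wrapping idea: the solver below is a stand-in Python function, since CST's actual batch interface and file formats are not given in the thread, and the patch dimensions and response are made-up numbers chosen only so the loop runs.

```python
import numpy as np
from scipy.optimize import minimize

def run_solver(width, depth):
    """Stand-in for one external field-solver run.

    In practice this function would write an input deck, launch the solver
    in batch mode (e.g. with subprocess), and parse |S11| at 2 GHz out of
    the output file.  Here a toy analytic response plays that role: the
    best match sits at width = 46, depth = 9 (made-up numbers).
    """
    return -40.0 * np.exp(-((width - 46.0)**2 / 4.0 + (depth - 9.0)**2))

def objective(params):
    width, depth = params
    return run_solver(width, depth)  # minimise S11 in dB (more negative = better match)

# Both parameters move together, so there is no width/depth ping-pong.
res = minimize(objective, x0=[44.0, 8.0], method="Nelder-Mead")
print(res.x, res.fun)
```

Because the optimizer adjusts width and insertion depth simultaneously, it sidesteps the sweep-width-then-sweep-depth cycle described in the opening post.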

Finding some course notes or a textbook on optimization is probably a better way to learn more than googling for the individual methods.
 
Thank you for your answer! After a bit of googling and asking around I have read about the algorithms you mentioned a few times. I believe they are called heuristic algorithms? I was actually told about these at the start of my project, but I assumed that I was supposed to use these as an alternative method to train the neural network. I have implemented the Genetic Algorithm to train the neural network, and I'm now trying to optimise the network topology so that it trains better. It seems strange using the Genetic Algorithm to optimise something that will be using the Genetic Algorithm to optimise something else, but I suppose it makes sense.
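A topology search with a Genetic Algorithm can be sketched in a few lines. The fitness function below is a hypothetical stand-in for "train the network and measure its accuracy" (it simply peaks at the 10-7-6 topology from the example in the opening post); the rest is a minimal GA with truncation selection, one-point crossover, and random mutation.

```python
import random

random.seed(0)

# Hypothetical fitness: stands in for training the network with a given
# hidden-layer topology and returning its accuracy.  Peaks at (10, 7, 6).
def fitness(layers):
    return -sum((n - t)**2 for n, t in zip(layers, (10, 7, 6)))

def evolve_topology(pop_size=20, generations=30, n_layers=3, max_neurons=20):
    # Each individual is a list of neuron counts, one per hidden layer.
    pop = [[random.randint(1, max_neurons) for _ in range(n_layers)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # keep the fitter half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randint(1, n_layers - 1)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < 0.3:            # occasional mutation
                i = random.randrange(n_layers)
                child[i] = random.randint(1, max_neurons)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve_topology()
print(best)
```

In the real version each fitness evaluation is itself a full GA training run of the network, which is exactly the nested "GA optimising a GA" situation described above; it is expensive but not contradictory, since the two GAs work on different search spaces.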
As for books on neural network topology, most resources I have seen suggest using only one layer, as it has been proved that a single-hidden-layer neural network is a universal approximator. This makes it easier to optimise the number of neurons, as you now only have one parameter to optimise. However, I had a meeting with a PhD student a few days ago who specialises in neural networks, and he told me to stick with 3 hidden layers. He says he guarantees 3 hidden layers is the best for the particular problem I'm working on. (which reminds me, I need to email him and ask why he said that...)
 
CraigH said:
(which reminds me, I need to email him and ask why he said that...)

One of my early (and cynical) mentors in industry explained it like this: When you start work in a new field, you don't know anything. So you ask advice from three people and you probably get three different answers, and you don't know which is right.

But after a while, you realize that most people have personal prejudices about the "best" way to do things.

Eventually, you get some prejudices of your own, and then it is obvious that people who share your prejudices are right and the others are wrong.
 
Other books do mention optimising the topology for networks with more than one layer, but I have yet to find one that gives an exact procedure for finding the optimum topology. They just mention rules of thumb and tips (such as using more neurons in the first few layers to create more features for subsequent layers to work with). That's what this question was all about: methods to find the best possible solution to a problem that can't be solved by optimising one variable at a time.

I also just wanted to know that I wasn't alone in having this problem. In class we are given projects such as "design a narrow band high gain 2GHz patch antenna", and we are taught all the science relating to the project, but never taught exactly how to go about designing it. It just frustrated me when I had to submit a design that might not have been optimal.
 
