Finding Multiple Local Minima of High-Dimensional Functions

Summary:
The discussion focuses on the challenge of finding multiple local minima in high-dimensional functions, particularly in the context of training recurrent neural networks (RNNs) for tasks like working memory. Participants explore the idea of modifying the cost function to penalize solutions close to previously found minima, while acknowledging potential issues such as symmetry in RNNs and the risk of losing other optima. Quantum annealing and simulated annealing are suggested as potential methods for exploring local minima, with a distinction made between finding all local minima versus those close to the global minimum. The conversation highlights the complexity of non-convex optimization and the ambition to systematically catalog diverse solutions generated by RNNs. Ultimately, the pursuit of identifying all local optima remains a challenging yet intriguing mathematical endeavor.
madness
TL;DR
Are there methods for exhaustively finding local optima? Could one do so iteratively by adding a cost function which avoids previously explored optima?
Hi all,

Let me give some background to my question. In computational neuroscience it's now fashionable to train recurrent neural networks (RNNs) to solve some task (working memory, for example). We do so by defining some cost function and training the model weights to minimize it (using gradient descent). However, one usually finds that there are multiple distinct solutions which solve the task well but in conceptually different ways. I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in many non-convex optimization problems).

The first idea that comes to mind is to add a term to the cost function which penalizes solutions that are close to a previous solution. One could then repeat this iteratively to find further solutions. Does anyone know if methods based on this principle exist?

Immediate problems that spring to mind are: 1) RNNs have so much symmetry that the network could find the same solution again, for example with permuted neuron indices. This could be addressed by choosing an appropriate penalty. 2) Adding these penalties to the cost function could destroy or change the other local optima we are looking for.
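To make the idea concrete, here is a minimal sketch of the kind of penalty I have in mind (the bump width `sigma`, the penalty `strength`, and the row-norm sorting in `canonical` are all arbitrary stand-ins, and the sorting only blunts part of the permutation symmetry):

```python
import numpy as np

def canonical(W_rec):
    """Crude symmetry reduction: relabel neurons so recurrent-weight row norms
    are sorted. This removes only part of the permutation symmetry."""
    order = np.argsort(np.linalg.norm(W_rec, axis=1))
    return W_rec[np.ix_(order, order)]

def penalized_loss(theta, base_loss, found, sigma=1.0, strength=1.0):
    """base_loss(theta) plus a Gaussian 'bump' centred on each previously found
    solution, discouraging re-convergence to the same point."""
    loss = base_loss(theta)
    for theta_i in found:
        loss += strength * np.exp(-np.sum((theta - theta_i) ** 2) / (2 * sigma ** 2))
    return loss
```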

Thanks for your help.
 
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions. The only way I know of to get a variety of minima is to use different initial points in several runs with one of the standard optimization algorithms.
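Roughly, I mean something like this (a generic sketch only, assuming a `loss` function of a flat parameter vector; the number of starts and their scale are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def multi_start(loss, dim, n_starts=50, scale=1.0, seed=0):
    """Run a standard local optimizer from many random initial points and
    record the local minimum found from each one."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_starts):
        x0 = scale * rng.standard_normal(dim)
        res = minimize(loss, x0, method="L-BFGS-B")
        results.append((res.fun, res.x))
    return results
```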
 
This may be a futuristic idea, but one thing you might look into is the quantum annealing process. It is my understanding that the solution of that process will be one with energy close to the minimum. There can be different solutions if several local minima have energies close to the minimum. Several runs might give you several alternative solutions with an objective value close to the minimum. I have no personal experience with this. There are quantum computers made by D-Wave that use the quantum annealing approach, and there are quantum annealing algorithms that run on conventional computers.
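For the conventional-computer route, a rough sketch using SciPy's annealing-style global optimizer, run with several seeds in the hope of landing in different low-cost basins (I have not tried this myself; the bounds and dimension are placeholders):

```python
from scipy.optimize import dual_annealing

def annealing_runs(loss, dim, n_runs=10, bound=5.0):
    """Repeat a simulated-annealing-style global search; different seeds can
    end up in different near-optimal basins."""
    solutions = []
    for seed in range(n_runs):
        res = dual_annealing(loss, bounds=[(-bound, bound)] * dim, seed=seed)
        solutions.append((res.fun, res.x))
    return solutions
```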
 
FactChecker said:
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions.

Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice? For example, what if we used the penalized cost function only to generate a new initial condition, and then searched on the original unpenalized cost function starting from that point? Or do principled methods already exist for choosing the set of initial conditions?

Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
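Something like the following two-phase loop is what I have in mind (a sketch only; the bump width, penalty strength, and jitter scale are arbitrary, and `base_loss` is a stand-in for the RNN training loss on a flattened parameter vector):

```python
import numpy as np
from scipy.optimize import minimize

def iterative_search(base_loss, dim, n_solutions=5, sigma=1.0, strength=10.0, seed=0):
    """Propose a new start on a penalized surface, then refine on the ORIGINAL
    loss so that the located minimum itself is not distorted by the penalty."""
    rng = np.random.default_rng(seed)
    found = []

    def penalized(theta):
        # Gaussian bump around each previously found solution.
        bumps = sum(np.exp(-np.sum((theta - t) ** 2) / (2 * sigma ** 2)) for t in found)
        return base_loss(theta) + strength * bumps

    x = rng.standard_normal(dim)
    for _ in range(n_solutions):
        start = minimize(penalized, x, method="L-BFGS-B").x   # phase 1: move away from known minima
        res = minimize(base_loss, start, method="L-BFGS-B")   # phase 2: descend the unpenalized loss
        found.append(res.x)
        x = start + 0.1 * rng.standard_normal(dim)            # jittered proposal for the next round
    return found
```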
 
madness said:
Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice?
I should say that I have no practical experience with this, so I will leave it for others to advise you further.
madness said:
Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.
 
madness said:
Are there methods for exhaustively finding local optima?
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.
 
FactChecker said:
I should say that I have no practical experience with this, so I will leave it for others to advise you further. One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.

It's a little tricky to define exactly what I mean by unique solutions, as I'm really looking for solutions that appear to implement different strategies in solving the task. Perhaps the notion is to find a set of solutions which each solve the problem sufficiently well (the cost function is sufficiently low at each solution) but do so in sufficiently different ways (so the minima are somehow well-separated in parameter space). Another desideratum might be that the minimum is sufficiently wide, so that the solution isn't "brittle", i.e. very sensitive to parameter values. However, exploring those brittle solutions might also be interesting.
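One crude way of operationalizing "sufficiently good, sufficiently different, not too brittle" might be a filter like the following (the loss threshold `eps`, separation `d_min`, and perturbation scale are arbitrary stand-ins):

```python
import numpy as np

def select_distinct(candidates, loss, eps=0.05, d_min=1.0, flat_scale=0.01, seed=0):
    """Keep candidates whose loss is low, that are far from already-kept
    solutions in parameter space, and whose loss stays low under small random
    perturbations (a crude flatness/brittleness check)."""
    rng = np.random.default_rng(seed)
    kept = []
    for theta in candidates:
        if loss(theta) >= eps:
            continue                      # not a good enough solution
        if any(np.linalg.norm(theta - k) < d_min for k in kept):
            continue                      # too close to a solution we already have
        perturbed = [loss(theta + flat_scale * rng.standard_normal(theta.shape))
                     for _ in range(10)]
        if np.mean(perturbed) < 2 * eps:  # loss stays low nearby, so not brittle
            kept.append(theta)
    return kept
```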
 
pbuk said:
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.

Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
 
madness said:
Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
 
pbuk said:
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
Finding all the local minima is exactly what I'm trying to do! Most non-convex optimisation approaches seem only interested in finding a global minimum rather than finding multiple local minima.

Thanks for the link. There's certainly plenty I have to learn in the field of non-convex optimisation.
 
madness said:
I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in many non-convex optimization problems).

Technically, "all the solutions that an RNN can come up with" might be a different set of solutions than "all possible solutions", so the RNN problem may have unique aspects.

The ambition to have a method of finding all possible local optima of an arbitrary cost function is a mathematical daydream, but you can find much literature on finding local optima of particular types of cost functions. Some daydreams are worth pursuing. If your interest is exploring well-known territory, you can probably spend years doing this.

In the spirit of a mathematical daydream, I'll offer this daydream. Since you take a liberal view of cost functions and are willing to consider modifying them, why not also take a liberal attitude toward the data! Imagine beginning with cost function ##F## and made-up data ##D_0## on which ##F## has 5 local optima. Apply transformations ##T_1, T_2, \dots, T_n## to ##D_0## that "move it toward" the actual data ##D## and move the local optima to new locations without adding any new local optima. The transformations ##T_i## can be viewed as operations performed by stages of a neural net. So we hopefully end up with a computation that identifies 5 local optima of the cost function when it is applied to something close to the actual data.
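In code, this daydream amounts to a continuation (homotopy) scheme: deform the data from ##D_0## toward ##D## in small steps and re-optimize each tracked optimum at every step. A sketch only, assuming a `loss(theta, data)` function and using simple linear interpolation of the data as the transformations:

```python
import numpy as np
from scipy.optimize import minimize

def track_optima(loss, optima_at_D0, D0, D, n_steps=20):
    """Slowly deform the data from D0 to D and re-optimize each previously
    tracked optimum at every step, hoping none appear or vanish along the way."""
    tracked = [np.array(t, dtype=float) for t in optima_at_D0]
    for t in np.linspace(0.0, 1.0, n_steps + 1)[1:]:
        D_t = (1.0 - t) * D0 + t * D   # one simple "transformation" T_i
        tracked = [minimize(lambda th: loss(th, D_t), th0, method="L-BFGS-B").x
                   for th0 in tracked]
    return tracked
```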

I don't know how you intend to interpret a local optimum as a different strategy for doing a task. Perhaps if you begin with "toy" data and identify a particular strategy with each local optimum, you can identify those same strategies with the transformed local optima on the transformed data.
 
