Finding Multiple Local Minima of High-Dimensional Functions

  • #1
madness
TL;DR Summary
Are there methods for exhaustively finding local optima? Could one do so iteratively by adding a penalty term to the cost function that steers the search away from previously found optima?
Hi all,

Let me give some background to my question. In computational neuroscience it's now fashionable to train recurrent neural networks (RNNs) to solve some task (working memory, for example). We do so by defining some cost function and training the model weights to minimize it (using gradient descent). However, one usually finds that there are multiple distinct solutions which solve the task well but in conceptually different ways. I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in any non-convex optimization problem.)

The first idea that comes to mind is to add a term to the cost function which penalizes solutions that are close to a previous solution. One could then repeat this iteratively to find further solutions. Does anyone know if methods based on this principle exist?

Two immediate problems spring to mind: 1) RNNs have so much symmetry that the network could rediscover the same solution, for example with permuted neuron indices; this could be addressed by choosing an appropriate penalty. 2) Adding these penalties to the cost function could destroy or shift the other local optima we are looking for.
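To make the idea concrete, here's a rough, untested sketch: the toy cost and the Gaussian bump penalty (and all the constants) are made up, with the toy function just standing in for the real RNN training loss.

Code:
import numpy as np
from scipy.optimize import minimize

def base_cost(w):
    # Stand-in for the real task loss; a toy function with many local minima.
    return np.sum((w**2 - 1.0)**2)

def penalized_cost(w, found, strength=1.0, width=0.5):
    # Gaussian "bump" around each previously found solution, pushing the
    # optimizer away from them (a deflation-style penalty).
    bumps = sum(strength * np.exp(-np.sum((w - f)**2) / (2 * width**2))
                for f in found)
    return base_cost(w) + bumps

found = []
rng = np.random.default_rng(0)
for _ in range(10):
    res = minimize(lambda w: penalized_cost(w, found), rng.normal(size=4))
    # Keep the result only if it is far from everything found so far.
    if all(np.linalg.norm(res.x - f) > 0.5 for f in found):
        found.append(res.x)
print(len(found), "candidate minima")

For RNNs the Euclidean distance here would presumably have to be replaced by something permutation-invariant to deal with problem 1.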

Thanks for your help.
 
  • #2
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions. The only way I know of to get a variety of minima is to use different initial points in several runs with one of the standard optimization algorithms.
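For what it's worth, the multi-start approach I mean is nothing more than this sketch (toy cost function, arbitrary thresholds):

Code:
import numpy as np
from scipy.optimize import minimize

def cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy non-convex function

rng = np.random.default_rng(1)
minima = []
for _ in range(50):
    res = minimize(cost, rng.normal(scale=2.0, size=3))  # random restart
    # Record only successful runs that landed somewhere new.
    if res.success and all(np.linalg.norm(res.x - m) > 1e-3 for m in minima):
        minima.append(res.x)
print(len(minima), "distinct minima found")

There is still no guarantee of exhaustiveness; you only see the minima whose basins of attraction your starting points happen to hit.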
 
  • #3
This may be a futuristic idea, but one thing you might look into is the quantum annealing process. It is my understanding that the solution of that process will be one with close to the minimal energy level. There can be different solutions if several local minima have energies close to the minimum. Several runs might give you several alternative solutions with an objective value close to the minimum. I have no personal experience with this. There are quantum computers made by D-Wave that use the quantum annealing approach, and there are quantum annealing algorithms that run on conventional computers.
 
  • #4
FactChecker said:
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions.

Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice? For example, what if we used the new penalized cost function only to generate a new initial condition? We could then search on the original unpenalized cost function starting from that point. Or do principled methods already exist for choosing the set of initial conditions?

Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
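To spell out the two-stage idea, something like this sketch (same made-up toy cost and Gaussian penalty as in my first post; nothing here is tested):

Code:
import numpy as np
from scipy.optimize import minimize

def base_cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy stand-in for the real loss

def penalized_cost(w, found, strength=1.0, width=0.5):
    bumps = sum(strength * np.exp(-np.sum((w - f)**2) / (2 * width**2))
                for f in found)
    return base_cost(w) + bumps

rng = np.random.default_rng(2)
found = [np.ones(3)]  # pretend we already found the all-ones minimum

# Stage 1: minimize the penalized cost only to propose a new starting point.
proposal = minimize(lambda w: penalized_cost(w, found), rng.normal(size=3))
# Stage 2: refine on the original cost, so the penalty cannot distort
# the minimum we actually report.
refined = minimize(base_cost, proposal.x)
print(refined.x)

That way the penalty only steers the search, and whatever we report is always a minimum of the true cost.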
 
  • #5
madness said:
Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice?
I should say that I have no practical experience with this, so I will leave it for others to advise you further.
Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.
 
  • #6
madness said:
Are there methods for exhaustively finding local optima?
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.
 
  • #7
FactChecker said:
I should say that I have no practical experience with this, so I will leave it for others to advise you further.

One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.

It's a little tricky to define exactly what I mean by unique solutions, as I'm really looking for solutions that appear to implement different strategies in solving the task. Perhaps the notion is to find a set of solutions which each solve the problem sufficiently well (the cost function is sufficiently low at each solution) but do so in sufficiently different ways (so the minima are somehow well-separated in parameter space). Another desideratum might be that the minimum is sufficiently wide, so that the solution isn't "brittle", i.e. very sensitive to parameter values. However, exploring those brittle solutions might also be interesting.
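As a very crude formalization of what I mean, I'm imagining filters along these lines (the function names and thresholds are invented, and the random-perturbation probe is only a cheap stand-in for actually examining the Hessian):

Code:
import numpy as np

def distinct_solutions(solutions, costs, cost_tol=0.01, min_sep=1.0):
    # Keep solutions that are good enough and pairwise well-separated.
    keep = []
    for w, c in zip(solutions, costs):
        if c <= cost_tol and all(np.linalg.norm(w - k) > min_sep for k in keep):
            keep.append(w)
    return keep

def sharpness(cost, w, eps=1e-2, trials=20, seed=0):
    # Crude width probe: average cost increase under small random
    # perturbations; large values suggest a narrow, "brittle" minimum.
    rng = np.random.default_rng(seed)
    rises = [cost(w + eps * rng.normal(size=w.shape)) - cost(w)
             for _ in range(trials)]
    return float(np.mean(rises)) / eps**2

Again, for RNNs plain Euclidean separation is probably the wrong notion because of the permutation symmetry.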
 
  • #8
pbuk said:
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.

Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
 
  • #9
madness said:
Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
 
  • #10
pbuk said:
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
Finding all the local minima is exactly what I'm trying to do! Most non-convex optimisation approaches seem only interested in finding a global minimum rather than finding multiple local minima.

Thanks for the link. There's certainly plenty I have to learn in the field of non-convex optimisation.
 
  • #11
madness said:
I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in many non-convex optimization problems).

Technically, "all the solutions that an RNN can come up with" might be a different set of solutions that "all possible solutions", so the RNN problem may have unique aspects.

The ambition to have a method of finding all possible local optima of an arbitrary cost function is a mathematical daydream, but you can find much literature on finding local optima of particular types of cost functions. Some daydreams are worth pursuing. If your interest is exploring well-known territory, you can probably spend years doing this.

In the spirit of a mathematical daydream, I'll offer this daydream. Since you take a liberal view of cost functions and are willing to consider modifying them, why not also take a liberal attitude toward the data! Imagine beginning with cost function ##F## and made-up data ##D_0## on which ##F## has 5 local optima. Apply transformations ##T_1, T_2,...T_n## to ##D_0## that "move it toward" the actual data ##D## and move the local optima to new locations without adding any new local optima. The transformations ##T_i## can be viewed as operations performed by stages of a neural net. So we hopefully end up with a computation that identifies 5 local optima of the cost function when it is applied to something close to the original data.
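As a toy version of that daydream, here is a continuation-style sketch: the targets are interpolated from the made-up data toward the real data in small steps, and each tracked optimum is re-optimized at every step starting from where it ended up at the previous step. Everything here (the tanh model, the least-squares cost, the data) is invented purely for illustration:

Code:
import numpy as np
from scipy.optimize import minimize

def fit_cost(w, data):
    x, y = data
    return np.sum((np.tanh(x @ w) - y)**2)

rng = np.random.default_rng(3)
x = rng.normal(size=(50, 4))
y_real = np.sin(x[:, 0])            # the "actual" targets D
y_toy = np.tanh(x @ np.ones(4))     # made-up targets D_0 with a known optimum at w = 1
tracked = [np.ones(4)]              # local optima identified on D_0

for t in np.linspace(0.0, 1.0, 11)[1:]:
    y_t = (1 - t) * y_toy + t * y_real  # the transformation T_i: blend toward D
    tracked = [minimize(lambda u: fit_cost(u, (x, y_t)), w).x for w in tracked]

print(tracked[0])  # where the original optimum ended up on the real data

Of course nothing here prevents optima from merging, disappearing, or new ones appearing along the way; that is exactly the part the daydream glosses over.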

I don't know how you intend to interpret a local optimum as a distinct strategy for doing a task. Perhaps if you begin with "toy" data and identify a particular strategy with each local optimum, you can identify those same strategies with the transformed local optima on the transformed data.
 

1. What is the significance of finding multiple local minima in high-dimensional functions?

Finding multiple local minima in high-dimensional functions is important because it allows us to identify the different possible solutions or optimal points of a complex problem. This can help us understand the behavior of the function and make more informed decisions in various fields such as engineering, economics, and data analysis.

2. How do you determine the number of local minima in a high-dimensional function?

There is no general procedure for counting the local minima of a high-dimensional function. What can be done is to test individual candidate points: a point is a local minimum when the gradient there is zero and the Hessian matrix is positive definite (all eigenvalues positive). Counting minima therefore amounts to finding all such points, which is intractable in general; in practice one collects candidates from many optimization runs and classifies each with this second-order test.
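The check at a single candidate point can be sketched as follows (toy function, analytic derivatives):

Code:
import numpy as np

def is_local_min(grad, hess, tol=1e-6):
    # Sufficient condition at a candidate point: gradient (numerically)
    # zero and Hessian positive definite (all eigenvalues positive).
    return np.linalg.norm(grad) < tol and np.all(np.linalg.eigvalsh(hess) > 0)

# Toy function f(w) = sum((w_i^2 - 1)^2), checked at w = (1, 1).
w = np.array([1.0, 1.0])
grad = 4 * w * (w**2 - 1)        # analytic gradient: zero here
hess = np.diag(12 * w**2 - 4)    # analytic Hessian: diagonal for this f
print(is_local_min(grad, hess))  # True: (1, 1) is a local minimum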

3. What are some methods for finding multiple local minima in high-dimensional functions?

There are several methods for finding multiple local minima in high-dimensional functions, including multi-start gradient descent, simulated annealing, genetic algorithms, and particle swarm optimization. These methods use different strategies to explore the function's landscape, and running them from varied starting conditions can surface multiple local minima.
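As a small illustration of the annealing route, SciPy's dual_annealing (a classical simulated-annealing variant) can be run several times with different seeds; on a toy cost with many equally good minima, different runs may land in different ones (the cost and bounds below are made up):

Code:
import numpy as np
from scipy.optimize import dual_annealing

def cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy function with many global minima

bounds = [(-3.0, 3.0)] * 3
for seed in range(3):
    res = dual_annealing(cost, bounds, seed=seed)
    print(res.x.round(3), res.fun)  # different seeds may find different minima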

4. Can finding multiple local minima in high-dimensional functions be a challenging task?

Yes, finding multiple local minima in high-dimensional functions can be a challenging task due to the complexity and dimensionality of the function. It can also be difficult to determine if a local minimum is the global minimum or just one of many local minima. Therefore, it is important to use appropriate methods and carefully analyze the results.

5. How can finding multiple local minima in high-dimensional functions benefit real-world applications?

Finding multiple local minima in high-dimensional functions can benefit real-world applications by providing a better understanding of the function's behavior and identifying multiple optimal solutions. This can lead to improved decision-making, cost reduction, and increased efficiency in various fields such as engineering, finance, and machine learning.
