Finding Multiple Local Minima of High-Dimensional Functions

  • #1
madness
TL;DR Summary
Are there methods for exhaustively finding local optima? Could one do so iteratively by adding a penalty term to the cost function that steers the search away from previously found optima?
Hi all,

Let me give some background to my question. In computational neuroscience it's now fashionable to train recurrent neural networks (RNNs) to solve some task (working memory, for example). We do so by defining some cost function and training the model weights to minimize it (using gradient descent). However, one usually finds that there are multiple distinct solutions which solve the task well but in conceptually different ways. I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in any non-convex optimization problem.)

The first idea that comes to mind is to add a term to the cost function which penalizes solutions that are close to a previous solution. One could then repeat this iteratively to find further solutions. Does anyone know if methods based on this principle exist?

Two immediate problems spring to mind: 1) RNNs have so much symmetry that the network could rediscover the same solution, for example with permuted neuron indices; this could be addressed by choosing an appropriate penalty. 2) Adding these penalties to the cost function could destroy or shift the other local optima we are looking for.
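To make the idea concrete, here's a rough, untested sketch: the toy cost and the Gaussian bump penalty (and all the constants) are made up, with the toy function just standing in for the real RNN training loss.

Code:
import numpy as np
from scipy.optimize import minimize

def base_cost(w):
    # Stand-in for the real task loss; a toy function with many local minima.
    return np.sum((w**2 - 1.0)**2)

def penalized_cost(w, found, strength=1.0, width=0.5):
    # Gaussian "bump" around each previously found solution, pushing the
    # optimizer away from them (a deflation-style penalty).
    bumps = sum(strength * np.exp(-np.sum((w - f)**2) / (2 * width**2))
                for f in found)
    return base_cost(w) + bumps

found = []
rng = np.random.default_rng(0)
for _ in range(10):
    res = minimize(lambda w: penalized_cost(w, found), rng.normal(size=4))
    # Keep the result only if it is far from everything found so far.
    if all(np.linalg.norm(res.x - f) > 0.5 for f in found):
        found.append(res.x)
print(len(found), "candidate minima")

For RNNs the Euclidean distance here would presumably have to be replaced by something permutation-invariant to deal with problem 1.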

Thanks for your help.
 
  • #2
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions. The only way I know of to get a variety of minima is to use different initial points in several runs with one of the standard optimization algorithms.
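For what it's worth, the multi-start approach I mean is nothing more than this sketch (toy cost function, arbitrary thresholds):

Code:
import numpy as np
from scipy.optimize import minimize

def cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy non-convex function

rng = np.random.default_rng(1)
minima = []
for _ in range(50):
    res = minimize(cost, rng.normal(scale=2.0, size=3))  # random restart
    # Record only successful runs that landed somewhere new.
    if res.success and all(np.linalg.norm(res.x - m) > 1e-3 for m in minima):
        minima.append(res.x)
print(len(minima), "distinct minima found")

There is still no guarantee of exhaustiveness; you only see the minima whose basins of attraction your starting points happen to hit.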
 
  • #3
This may be a futuristic idea, but one thing you might look into is the quantum annealing process. It is my understanding that the solution of that process will be one with close to the minimal energy level. There can be different solutions if several local minima have energies close to the minimum. Several runs might give you several alternative solutions with an objective value close to the minimum. I have no personal experience with this. There are quantum computers made by D-Wave that use the quantum annealing approach, and there are quantum annealing algorithms that run on conventional computers.
 
  • #4
FactChecker said:
Such a cost function could easily hide other local minima near the one already found. I am afraid that there is no truly reliable way to exhaustively find all the local minima of arbitrary functions.

Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice? For example, what if we used the new penalized cost function only to generate a new initial condition? We could then search on the original unpenalized cost function starting from that point. Or do principled methods already exist for choosing the set of initial conditions?

Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
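To spell out the two-stage idea, something like this sketch (same made-up toy cost and Gaussian penalty as in my first post; nothing here is tested):

Code:
import numpy as np
from scipy.optimize import minimize

def base_cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy stand-in for the real loss

def penalized_cost(w, found, strength=1.0, width=0.5):
    bumps = sum(strength * np.exp(-np.sum((w - f)**2) / (2 * width**2))
                for f in found)
    return base_cost(w) + bumps

rng = np.random.default_rng(2)
found = [np.ones(3)]  # pretend we already found the all-ones minimum

# Stage 1: minimize the penalized cost only to propose a new starting point.
proposal = minimize(lambda w: penalized_cost(w, found), rng.normal(size=3))
# Stage 2: refine on the original cost, so the penalty cannot distort
# the minimum we actually report.
refined = minimize(base_cost, proposal.x)
print(refined.x)

That way the penalty only steers the search, and whatever we report is always a minimum of the true cost.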
 
  • #5
madness said:
Thanks for your answer FactChecker. I had suspected this. I wonder if a similar approach might nevertheless still work reasonably well in practice?
I should say that I have no practical experience with this, so I will leave it for others to advise you further.
Your point about annealing seems interesting. Certainly simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) looks like it could be useful here.
One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.
 
  • #6
madness said:
Are there methods for exhaustively finding local optima?
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.
 
  • #7
FactChecker said:
I should say that I have no practical experience with this, so I will leave it for others to advise you further.

One thing that made me think of this is that I got the impression that you are only interested in local minima whose objective function value is close to the true minimal value. I believe that the annealing process would tend to have that property. You should probably clarify if you are interested in all local minima or only in those with a value close to the true minimum.

It's a little tricky to define exactly what I mean by unique solutions, as I'm really looking for solutions that appear to implement different strategies in solving the task. Perhaps the notion is to find a set of solutions which each solve the problem sufficiently well (the cost function is sufficiently low at each solution) but do so in sufficiently different ways (so the minima are somehow well-separated in parameter space). Another desideratum might be that the minimum is sufficiently wide, so that the solution isn't "brittle", i.e. very sensitive to parameter values. However, exploring those brittle solutions might also be interesting.
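As a very crude formalization of what I mean, I'm imagining filters along these lines (the function names and thresholds are invented, and the random-perturbation probe is only a cheap stand-in for actually examining the Hessian):

Code:
import numpy as np

def distinct_solutions(solutions, costs, cost_tol=0.01, min_sep=1.0):
    # Keep solutions that are good enough and pairwise well-separated.
    keep = []
    for w, c in zip(solutions, costs):
        if c <= cost_tol and all(np.linalg.norm(w - k) > min_sep for k in keep):
            keep.append(w)
    return keep

def sharpness(cost, w, eps=1e-2, trials=20, seed=0):
    # Crude width probe: average cost increase under small random
    # perturbations; large values suggest a narrow, "brittle" minimum.
    rng = np.random.default_rng(seed)
    rises = [cost(w + eps * rng.normal(size=w.shape)) - cost(w)
             for _ in range(trials)]
    return float(np.mean(rises)) / eps**2

Again, for RNNs plain Euclidean separation is probably the wrong notion because of the permutation symmetry.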
 
  • #8
pbuk said:
Is this the same as finding a global optimum? If so then yes, but the problem is in general hard. If not, I am not sure what you mean.

Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
 
  • #9
madness said:
Only if the loss function is convex. Hopefully my above reply to FactChecker makes that more clear.
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
 
  • #10
pbuk said:
Not really, if the loss function is convex then there is only one local minimum which is also the global minimum and this can easily be found: I don't think this is what you mean either.

If you have found all the local minima then by definition the global minimum is the least of these.

Anyway, let's not get caught up on terminology, the link I gave outlines 18 methods for global optimization and links to books and papers with hundreds more: good luck!
Finding all the local minima is exactly what I'm trying to do! Most non-convex optimisation approaches seem only interested in finding a global minimum rather than finding multiple local minima.

Thanks for the link. There's certainly plenty I have to learn in the field of non-convex optimisation.
 
  • #11
madness said:
I would like to find a way to iteratively search through and catalogue all the solutions that an RNN can come up with for a task. (Notice that this problem is not unique to RNNs - it could arise in many non-convex optimization problems).

Technically, "all the solutions that an RNN can come up with" might be a different set of solutions that "all possible solutions", so the RNN problem may have unique aspects.

The ambition to have a method of finding all possible local optima of an arbitrary cost function is a mathematical daydream, but you can find much literature on finding local optima of particular types of cost functions. Some daydreams are worth pursuing. If your interest is exploring well-known territory, you can probably spend years doing this.

In the spirit of a mathematical daydream, I'll offer this daydream. Since you take a liberal view of cost functions and are willing to consider modifying them, why not also take a liberal attitude toward the data! Imagine beginning with cost function ##F## and made-up data ##D_0## on which ##F## has 5 local optima. Apply transformations ##T_1, T_2,...T_n## to ##D_0## that "move it toward" the actual data ##D## and move the local optima to new locations without adding any new local optima. The transformations ##T_i## can be viewed as operations performed by stages of a neural net. So we hopefully end up with a computation that identifies 5 local optima of the cost function when it is applied to something close to the original data.
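As a toy version of that daydream, here is a continuation-style sketch: the targets are interpolated from the made-up data toward the real data in small steps, and each tracked optimum is re-optimized at every step starting from where it ended up at the previous step. Everything here (the tanh model, the least-squares cost, the data) is invented purely for illustration:

Code:
import numpy as np
from scipy.optimize import minimize

def fit_cost(w, data):
    x, y = data
    return np.sum((np.tanh(x @ w) - y)**2)

rng = np.random.default_rng(3)
x = rng.normal(size=(50, 4))
y_real = np.sin(x[:, 0])            # the "actual" targets D
y_toy = np.tanh(x @ np.ones(4))     # made-up targets D_0 with a known optimum at w = 1
tracked = [np.ones(4)]              # local optima identified on D_0

for t in np.linspace(0.0, 1.0, 11)[1:]:
    y_t = (1 - t) * y_toy + t * y_real  # the transformation T_i: blend toward D
    tracked = [minimize(lambda u: fit_cost(u, (x, y_t)), w).x for w in tracked]

print(tracked[0])  # where the original optimum ended up on the real data

Of course nothing here prevents optima from merging, disappearing, or new ones appearing along the way; that is exactly the part the daydream glosses over.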

I don't know how you intend to interpret a local optimum as a distinct strategy for doing a task. Perhaps if you begin with "toy" data and identify a particular strategy with each local optimum, you can identify those same strategies with the transformed local optima on the transformed data.
 

1. What is the significance of finding multiple local minima in high-dimensional functions?

Finding multiple local minima in high-dimensional functions is important because it allows us to identify the different possible solutions or optimal points of a complex problem. This can help us understand the behavior of the function and make more informed decisions in various fields such as engineering, economics, and data analysis.

2. How do you determine the number of local minima in a high-dimensional function?

There is no general procedure for counting the local minima of a high-dimensional function. What can be done is to test individual candidate points: a point is a local minimum when the gradient there is zero and the Hessian matrix is positive definite (all eigenvalues positive). Counting minima therefore amounts to finding all such points, which is intractable in general; in practice one collects candidates from many optimization runs and classifies each with this second-order test.
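The check at a single candidate point can be sketched as follows (toy function, analytic derivatives):

Code:
import numpy as np

def is_local_min(grad, hess, tol=1e-6):
    # Sufficient condition at a candidate point: gradient (numerically)
    # zero and Hessian positive definite (all eigenvalues positive).
    return np.linalg.norm(grad) < tol and np.all(np.linalg.eigvalsh(hess) > 0)

# Toy function f(w) = sum((w_i^2 - 1)^2), checked at w = (1, 1).
w = np.array([1.0, 1.0])
grad = 4 * w * (w**2 - 1)        # analytic gradient: zero here
hess = np.diag(12 * w**2 - 4)    # analytic Hessian: diagonal for this f
print(is_local_min(grad, hess))  # True: (1, 1) is a local minimum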

3. What are some methods for finding multiple local minima in high-dimensional functions?

There are several methods for finding multiple local minima in high-dimensional functions, including multi-start gradient descent, simulated annealing, genetic algorithms, and particle swarm optimization. These methods use different strategies to explore the function's landscape, and running them from varied starting conditions can surface multiple local minima.
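As a small illustration of the annealing route, SciPy's dual_annealing (a classical simulated-annealing variant) can be run several times with different seeds; on a toy cost with many equally good minima, different runs may land in different ones (the cost and bounds below are made up):

Code:
import numpy as np
from scipy.optimize import dual_annealing

def cost(w):
    return np.sum((w**2 - 1.0)**2)  # toy function with many global minima

bounds = [(-3.0, 3.0)] * 3
for seed in range(3):
    res = dual_annealing(cost, bounds, seed=seed)
    print(res.x.round(3), res.fun)  # different seeds may find different minima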

4. Can finding multiple local minima in high-dimensional functions be a challenging task?

Yes, finding multiple local minima in high-dimensional functions can be a challenging task due to the complexity and dimensionality of the function. It can also be difficult to determine if a local minimum is the global minimum or just one of many local minima. Therefore, it is important to use appropriate methods and carefully analyze the results.

5. How can finding multiple local minima in high-dimensional functions benefit real-world applications?

Finding multiple local minima in high-dimensional functions can benefit real-world applications by providing a better understanding of the function's behavior and identifying multiple optimal solutions. This can lead to improved decision-making, cost reduction, and increased efficiency in various fields such as engineering, finance, and machine learning.
