Hi everyone. I hope this is in the proper place - not sure where to put it. Anyway, I am working on an optimization problem in my research, and my partner and I have found that in order to converge quickly on a solution using a specific PSO variant (the firefly algorithm - it's awesome, look it up), we need a good distribution of initial fireflies that are "evenly" spaced. To that end, we found it would be useful to use a low-discrepancy sequence instead of a pseudorandom number generator, to prevent clumping.

Typically this is not a hard problem. However, the optimization problem we are considering has 29 dimensions, and we are looking to use as few as 60 initial fireflies, which presents a problem for most low-discrepancy sequences, like the Halton sequence and the Faure sequence. One technique, however, seems promising: Poisson disk sampling.

The first step in Poisson disk sampling is to generate a grid such that each grid cell contains at most one point of your space. This is easy enough when you're simply working with numbers, but that is not the case for us: we are working with physical quantities with different dimensions. For example, one of our axes ranges from 0 to 20 Tesla while another ranges from 0.02m to 1m. According to the documents we have perused, the side length of these grid cells should be [itex]\frac{r}{\sqrt{n}},[/itex] where r is the minimum desired distance between two points and n is the number of dimensions. However, r is not so easily defined when your axes have different units! For example, one cannot calculate the distance between the two points [itex]1.0 \hat{T} + 0.2 \hat{m}[/itex] and [itex]1.5 \hat{T} - 0.4 \hat{m},[/itex] because doing so would require adding two physical quantities with different dimensions, which breaks dimensional consistency. So how would one even consider defining this grid?
I've thought about it a lot, but I cannot think of an adequate way to define the grid used in Poisson disk sampling. The implementation I already have doesn't use a grid, which keeps it dimensionally consistent, but it is significantly slower: instead of checking whether a grid cell already holds a point, it has to compute the distance to every other point, so performance degrades rapidly as the number of dimensions increases. Thanks for reading!
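For concreteness, here is roughly what the grid-based method we would like to use looks like, as a minimal pure-Python sketch. It assumes the axes have already been normalized to the unit hypercube [0, 1]^n (which is exactly the step we are unsure about), and the function name and defaults are just illustrative; the default is 2-D for readability.

```python
import math
import random
from itertools import product

def poisson_disk(n_points, r, dim=2, max_tries=10000, seed=0):
    """Dart-throwing Poisson disk sampling in the unit hypercube [0, 1]^dim.

    A background grid with cell side r/sqrt(dim) (so the cell diagonal is r)
    holds at most one accepted point per cell; each candidate is then only
    checked against points in nearby cells, not against every point.
    """
    rng = random.Random(seed)
    cell = r / math.sqrt(dim)
    grid = {}                              # cell-index tuple -> accepted point
    points = []
    reach = math.ceil(math.sqrt(dim))      # cells within distance r of a point

    def cell_index(p):
        return tuple(int(x / cell) for x in p)

    def too_close(p):
        # The number of neighbour cells grows like (2*reach + 1)**dim,
        # which is exactly what becomes painful at dim = 29.
        centre = cell_index(p)
        for nb in product(*[range(i - reach, i + reach + 1) for i in centre]):
            q = grid.get(nb)
            if q is not None and math.dist(p, q) < r:
                return True
        return False

    tries = 0
    while len(points) < n_points and tries < max_tries:
        tries += 1
        p = [rng.random() for _ in range(dim)]
        if not too_close(p):
            grid[cell_index(p)] = p
            points.append(p)
    return points
```

Note that even with the grid, the neighbourhood search blows up with dimension, so the grid trick buys less at n = 29 than it does in 2-D or 3-D.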
It's 30 years since I did non-linear optimization and I've no idea whether this new-fangled firefly algorithm is any good or not, but if you want to start with a uniform grid it sounds like normalizing your data might be a good idea. Why do you think physical dimensions are important - is your optimization function dimensionally consistent?
Yes, the optimization function had better be dimensionally consistent - it's not even an equation! It's just a black-box model evaluated with Monte Carlo methods. But that's a different story. Well, the firefly algorithm seems to be decent and relatively quick. The reason I think physical dimensions are important is that I don't want to bias any one axis. For example, say I have a search space that varies from 0 to 10 meters in one direction and 0 to 1 Tesla in the other. Clearly I can't have one grid cell be 1 Tesla x 1 meter - that would be biased. So should it be 0.1 Tesla x 1 meter? What if I know that the search space along one axis is small - that a few test points along that axis would be enough? How do I know when the grid cells are "square"? That's what this is about: Poisson disk sampling calls for square (or cubic, or hypercubic) cells, which means I have to be able to compare values with different dimensions.

Another question: the number of test points we want is between 50 and 60, and we have 29 dimensions that we need to vary. Clearly we can't make an even search grid, i.e. there is no number between 50 and 60 that is the 29th power of some integer. That is, in a 3D space, the uniform grid you would make with x^3 test points has x test values along each axis. So what do we do when instead of a 3, we have a 29?
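To put numbers on how badly a full grid scales at 29 variables (a quick back-of-the-envelope check, not part of our code):

```python
# Even the coarsest full factorial grid, with just two test values per
# axis, needs 2**29 objective evaluations -- far beyond the 50-60 we
# can afford in parallel.
full_grid_points = 2 ** 29        # 536,870,912 evaluations
values_per_axis = 60 ** (1 / 29)  # ~1.15 "values per axis" for 60 points
```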
You are using the word dimension in three different ways:

"to do so would require adding two physical quantities with different dimensions, which breaks dimensional consistency" - here the use is correct and understood: you are referring to the physical dimensions of length, mass, time and charge. But it does not matter what units (metres, feet, parsecs...), or what the dimensions (length, time, charge...) of those units are. We may talk about the "area" or "volume" of cells of a grid, but there is no significance in the fact that we may multiply length^{4} x mass^{2} x charge x time^{-3} to calculate this "volume" - it's just a model.

"The reason why I think physical dimensions are important is because I don't want to bias any one axis ... So should I have it be 0.1 Tesla x 1 meter?" - here you are talking about units (Tesla, metre) and the domain of the problem. Depending on the optimisation method used, it may be useful to normalize the domain - that is, to scale the variables so that the minimum and maximum possible values lie at the ends of the range [0, 1].

"We have 29 dimensions that we need to vary" - here you are talking about the number of variables of the problem.

But before I go on, can you explain further exactly what it is you are trying to optimize?
Right - sorry. The word dimension is ambiguous because it is used differently in different contexts. With regard to the first quote: multiplying dimensions is fine, as you said; it's adding quantities with different dimensions that isn't defined. With regard to the second quote: I think you're right - it probably would be useful to normalize the domain. The only problem is that the normalization constant is somewhat arbitrary. For example, most of the axes in the domain with dimension length vary from 0.02m to 1.0m, but there are two length axes with different domains. Most of the other axes have dimension B-field/length, and the domain chosen for those is equally arbitrary: they vary from -20 to 20, which is simply the physical upper limit. Couldn't that be an issue? With regard to the final quote: you are correct. I was using dimension in the linear-algebra sense of the word, and for that I apologize. Yes, I meant the number of variables.

Anyway, for an explanation: we are trying to optimize certain output parameters of a charged-particle beamline using non-linear optimization techniques (we seem to have run out of other options). For this, we need to tune 13 magnet strengths (the axes with a magnetic field as part of their dimension) and 16 lengths (the ones with length as the dimension). So far, we have looked into local optimization techniques, including the oft-used L-BFGS-B method (which is conveniently implemented in Python already). Unfortunately, because we are modelling this beamline, we have to use Monte Carlo techniques to evaluate the beamline output parameters for given input parameters, so the objective-function evaluations take a long time. So, we are trying to find algorithms which converge quickly or can be highly parallelized.
The firefly algorithm seems to work for this, but we require an approximately evenly-spaced initial search space, and for that we require a grid, which has been a problem. I mentioned previously that the total number of test points should be between 50 and 60. That's because, ideally, we use no more than 60 processors, and we would like to evaluate the objective function at each test point simultaneously, using one processor each. Sorry for the long-windedness. Did any of that make sense?
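For reference, the kind of low-discrepancy generator I mentioned at the start - a Halton sequence, built axis-by-axis from the van der Corput radical inverse - is tiny; a sketch (names illustrative, not our production code):

```python
def van_der_corput(i, base):
    """Radical inverse of the integer i >= 1 in the given base."""
    x, f = 0.0, 1.0
    while i > 0:
        f /= base
        x += f * (i % base)
        i //= base
    return x

def halton(n, bases):
    """First n points of a Halton sequence; one prime base per axis."""
    return [[van_der_corput(i, b) for b in bases] for i in range(1, n + 1)]

# first three 2-D points with prime bases 2 and 3
pts = halton(3, [2, 3])
```

With 29 axes one needs the first 29 primes as bases, and for the larger bases the early points of the sequence are strongly correlated - a known weakness of Halton in high dimensions that may matter a lot when only 60 points are drawn.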
Yes, lots. First I am going to assume that you have exhausted all other possibilities, e.g. splitting the problem down so you use just three magnets to optimise the initial focus of the beam, then another three to direct it, etc. I'll also assume that the problem is not intractable by any deterministic method - if the optimization function is very chaotic (e.g. a precise alignment of 13 fields yields a value much greater than in the surrounding region) it may not be possible to find a maximum except by luck.

In terms of setting up the problem, I think we agreed that the best way to go is to normalize the variables by setting the domain of each to [0, 1]. It's probably a good idea to think about the normalization function too - so if the variable is the position of a magnet on a track you might like to use ## n(x) = a(x + d)^2 + b ##, where d is the distance of the end of the track from the centre of the beam; this costs little and could help with normalization of effective cell size. Other than this, don't worry about the physical significance of the variables - the solution space is an entirely abstract model, and the fact that a distance of 0.1 along one axis represents a difference in the square of length and along another axis the square root of field strength makes no difference at all (as long as you convert correctly before calculating your objective function, of course!).

Now that the variables are normalized, ## \frac{r}{\sqrt n} ## should make more sense and you can find your Poisson grid. If you are worried about the low number of initial points ("fireflies") compared to dimensions but you are constrained to 60 parallel processes, I'd suggest that once you have results for n=60 you try n=120 and look for any differences. Can you reduce the cost of the objective function? Can you use a weighted sample instead of Monte Carlo?
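In the simplest (linear) case the normalization is just a per-axis map onto [0, 1] and back; a sketch, with the two example ranges from earlier in the thread (a 0-20 T field and a 0.02-1.0 m length; the helper name is illustrative):

```python
def make_normalizer(lo, hi):
    """Linear map from the physical range [lo, hi] onto [0, 1].

    Returns (norm, denorm) so the optimizer can work in the dimensionless
    unit cube and convert back before each objective-function evaluation.
    """
    span = hi - lo
    def norm(x):
        return (x - lo) / span
    def denorm(u):
        return lo + u * span
    return norm, denorm

# hypothetical axes from the thread
norm_B, denorm_B = make_normalizer(0.0, 20.0)   # field in Tesla
norm_L, denorm_L = make_normalizer(0.02, 1.0)   # length in metres
```

The quadratic ## n(x) = a(x + d)^2 + b ## suggested above is the same idea with a non-linear map, which lets you spend more of the [0, 1] range on the physically interesting part of an axis.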
You are correct on both fronts - unfortunately, effectively unnoticeable changes in the beam at one point can cause dramatic changes in the beam downstream (especially at a few specific points), so it's hard to predict the desired beam shape at any given point. We simply have several parameters that we are trying to minimize.

Normalizing the variables is fine. But what do you mean by the normalization function? Why would we square it instead of just performing a linear transformation on the domain to force it to [0, 1]? I suppose if we normalize everything it will all make sense - then we just have to keep track of how to convert back. But doesn't that add bias to the optimization (which we are trying to avoid), because the upper bounds on all of the distance axes are arbitrary (arbitrarily set at 1m)? And the same with the quadrupole magnets? That's what I'm worried about. Otherwise, normalizing is fine because it makes each axis dimensionless. That should work. We're currently testing that now with another quasi-random sequence generator, the Halton sequence, but it doesn't seem to be performing particularly well.

I've been thinking of ways to do this. Unfortunately, the system is so complex (and dare I say, chaotic) that even a small change in the initial position in phase space of a particle can dramatically change whether it even makes it through the beamline, let alone where it comes out. The software we are using for these simulations is G4Beamline - maybe you've heard of the Geant4 particle engine? It's based on that. Another thing I've been thinking about is that we know very little about this objective function except that, outside a few select areas, the beamline lets very few particles through (obviously a bad thing). In our scalarizing function, we added a piecewise rule that sets the value to 100 if fewer than 100 particles make it through (usually we get between 500 and 2500, so that's reasonable).
This is to "punish" bad results, so the optimizer doesn't reduce to having one central particle make it through with a small standard deviation and a very central profile (because there would only be one particle!). Unfortunately, if we evenly distributed test points across the domains of each axis, my guess is that 95% of them would fall in this region of passing fewer than 100 particles. This indicates that the viable solution space is very small... but we don't know where to look, which makes it hard even for a metaheuristic algorithm to find local and global minima. At the moment, we have a strong initial starting point, which has been a reference point for past optimization algorithms and returns decent results. However, even if we dramatically change some of the parameters in this initial condition, we can sometimes get good results out that differ dramatically from the results obtained with the original initial condition. This suggests that the solution space is "long" but "thin": there is a small strip of the solution space whose points all yield good but distinct results, and there may even be islands of good initial conditions far from the ones we have been using. As far as I know, I have no way of determining where these islands lie.
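Schematically, the piecewise rule in our scalarizing function looks like this; the non-penalty branch is only a stand-in (our real objective combines several beam output parameters), and the names and defaults are illustrative:

```python
def scalarize(n_transmitted, sigma_x, sigma_y,
              min_particles=100, penalty=100.0):
    """Piecewise scalarizing rule described above: if fewer than
    `min_particles` make it through the beamline, return a flat penalty so
    the optimizer cannot exploit a near-empty beam with a tiny spread.
    Otherwise combine the beam widths (placeholder objective to minimize).
    """
    if n_transmitted < min_particles:
        return penalty
    return sigma_x + sigma_y
```

One side effect worth noting: the flat penalty makes 95% of the domain look identical to the optimizer, so it gives no gradient information pointing toward the viable strip.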
Hmmm, the problem looks intractable. What you are saying is that you have a black box with 29 knobs on it; you have a few combinations of settings that work, but you are trying to find better ones by twiddling the knobs at random. Surely there must be a better way - you must have some understanding of what is going on inside the black box? If you really can't reduce the problem analytically, the only other way to cope with a chaotic objective function is to use a fine grid - it doesn't matter what algorithm you use; if the step size is large enough to miss a local maximum that might be the global maximum, you will never find it. That means calculating a lot of points, which means reducing the cost of the objective function. I don't think I can offer you any more. When I was a university student this was the sort of project final-year undergraduate/MSc students worked on as a computational maths module - is this a possibility for you?
If we use simple models of the magnets, the problem becomes a lot easier. And we have done that - we modeled the magnets and attempted to get good results that way, and the results were decent, but not as good as we had hoped. So we wanted to get better results using a metaheuristic. Unfortunately, because realistic magnetic fields are so hard to work with, it is impossible to "solve" the beam analytically. Regarding the fine grid - we are discovering that as we go along: the more starting points we use, the better the results, unfortunately. I wish I could delegate it to an undergraduate student... but we are undergraduate students! I appreciate all of the help. You've given us a lot to think about.