How? Do you have examples of what is still being found multiple times?
Thanks for your help!
The bug was that one test was performed twice and another zero times. It's looking like it takes ~20 hours to find all the squares, which is much better than the 458h it took at the beginning of this.
What's impressive is that once the diagonals and the center row and column are placed, 5% of the time we find a magic square. Technically, 0.6% of the time we find 8. But if you just place those 17 numbers at random, the probability of finding a magic square is about 4 x 10^-7. So we're doing a million times better.
I find @.Scott 's numbers very intriguing. About half of the X-patterns have no solution. That's potentially a factor of 2 right there, and the rule may be applicable elsewhere.
I had considered parallelizing this, but it's not trivial. Part of the reason is that the code was not designed that way (there's a single Square object), and part is that parallelizing by center element leaves you with a maximum of 13 workers and a very unbalanced system.
It takes 75774 seconds (about 21 hours) to do an exhaustive search.
I looked at the main diagonals mod 5. There are three patterns, which I call a straight (e.g. 20134), two pair (e.g. 33144), and three of a kind (e.g. 31114). All three are present, so if there is a way to identify non-solutions mod 5, it will involve both diagonals.
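A minimal sketch of the classification (not the program's actual code), keyed on the largest residue multiplicity, which is enough to separate the three cases:

```cpp
// Sketch only: classify one line's residues mod 5 by the largest multiplicity.
#include <algorithm>
#include <array>
#include <string>

std::string classifyMod5(const std::array<int, 5>& line) {
    int count[5] = {0, 0, 0, 0, 0};
    for (int cell : line)
        ++count[cell % 5];
    int maxMult = 0;
    for (int c : count)
        maxMult = std::max(maxMult, c);
    if (maxMult == 1) return "straight";         // e.g. residues 2 0 1 3 4
    if (maxMult == 2) return "two pair";         // e.g. 3 3 1 4 4 (a lone pair can't sum to 0 mod 5)
    if (maxMult == 3) return "three of a kind";  // e.g. 3 1 1 1 4
    return "five of a kind";                     // only 3 8 13 18 23; four of a kind can't sum to 65
}
```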
I plan to parallelize with pThreads. Load balancing will come later, but I will probably do it at the main diagonal level. Each thread will look at every Nth element in that list of rows. The machine this will run on has 12 cores and 24 threads, so I suspect N will be 25. Fully load balanced, I imagine it will take between 1 and 2 hours to run.
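A minimal sketch of the plan (Row, g_rows and searchFromRow are placeholders, not the real code): worker t takes rows t, t+N, t+2N, ... from the Step 1 list, and each worker needs its own Square so they don't fight over the single shared object.

```cpp
// Minimal pthreads sketch of the striding plan; the types and functions here
// stand in for the real program's pieces.
#include <pthread.h>
#include <cstddef>
#include <vector>

struct Row { int cells[5]; };            // placeholder: one candidate main-diagonal row
std::vector<Row> g_rows;                 // filled by Step 1 (not shown)
void searchFromRow(const Row&) { /* Steps 2-6 for one starting row */ }

struct WorkerArg { std::size_t id, nThreads; };

void* worker(void* p) {
    const WorkerArg* a = static_cast<const WorkerArg*>(p);
    // Thread t handles rows t, t+N, t+2N, ...; each thread must own its own
    // Square object, since the single shared one is what blocks parallelism now.
    for (std::size_t i = a->id; i < g_rows.size(); i += a->nThreads)
        searchFromRow(g_rows[i]);
    return nullptr;
}

int main() {
    const std::size_t N = 25;            // worker count; slightly oversubscribes 12C/24T
    std::vector<pthread_t> tids(N);
    std::vector<WorkerArg> args(N);
    for (std::size_t t = 0; t < N; ++t) {
        args[t] = {t, N};
        pthread_create(&tids[t], nullptr, worker, &args[t]);
    }
    for (std::size_t t = 0; t < N; ++t)
        pthread_join(tids[t], nullptr);
    return 0;
}
```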
3+8+13+18+23 = 65. The other five-of-a-kind combinations don't add up to the right number. Further, there's only one such row (and its permutations) for every center square.
So, I managed to rework the parallelization and got it down to under an hour. Barely. I can find all 275M squares in 59 minutes and 23 seconds. This uses 32 threads on a 12C/24T processor. It's 22.9x faster than one processor.
About ten minutes before the end, threads start terminating. So there is some room to improve the load balancing: perhaps 5%. This is mitigated somewhat by the fact that as threads terminate, the chip speeds up.
Another potential source for a 5% gain is squares with 13 as their center. If I find a square with 12 as its center, I have also found one with 14 as its center, via the {26-x} transformation. If I do this with a square with center 13, I find a different square that also has center 13. So effectively I find each such pair twice. If I could find a way to quickly reject the "second" member of the pair, it would double the speed of the 13's, saving maybe 5% overall in time.
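For concreteness, the transformation is just x -> 26-x applied cell by cell; a toy version (not the program's code):

```cpp
// Sketch of the {26-x} complement: replacing every entry x by 26-x maps a
// magic square to another magic square (each line sum stays 5*26 - 65 = 65),
// sends a center of 12 to 14, and sends a center-13 square to a different
// center-13 square.
#include <array>

using Square5 = std::array<std::array<int, 5>, 5>;

Square5 complement(const Square5& s) {
    Square5 out;
    for (int r = 0; r < 5; ++r)
        for (int c = 0; c < 5; ++c)
            out[r][c] = 26 - s[r][c];
    return out;
}
```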
That same symmetry has interesting implications for early detection of patterns with no solution. There's been speculation that by looking at subsets of cells (one or two diagonals) mod 5 one can conclude there are no solutions. Because of the {26-x} symmetry, should such a pattern be found, a second pattern could be found by swapping 0 and 1, and 2 and 4 (since 26-x ≡ 1-x mod 5). I think that's very interesting, as it is not the "normal" 1-and-4 / 2-and-3 pairing.
I'm not sure if this is a breakthrough or proves it impossible.
It's getting difficult to disentangle the different cuts. Here's what we have so far:
* The upper left corner is the largest element on the diagonals (excluding the center, but if the center is 13 it's trivially true for that as well). This covers rotation and the corner-connector symmetry.
* The upper right corner is larger than the lower left corner. This covers mirrored cases.
* The upper left connector is larger than the lower right connector. This covers the 2/4 exchange symmetry.
If we apply the 1<->25 swap to a solution then we have to rearrange everything to fit that pattern again. We would need an inequality that is guaranteed to flip under that swap.
Here is an approach that separates the symmetries better:
* The sum of the corners needs to be larger than the sum of the connectors (can they be equal? That would need to be treated separately). This covers only the corner-connector symmetry.
* The upper left corner is the largest corner.
* The upper right corner is larger than the lower left corner.
* The upper left connector is larger than the lower right connector.
If we apply the 1-25 swap then the first condition gets reversed, so clearly we need to do the corner/connector swap. In general we might have to rotate and mirror the magic square, and it's possible that we need to use the 2/4 exchange.
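Written out as a single test, this is roughly the following (a sketch, using my reading that the "connectors" are the inner diagonal cells next to the corners, i.e. G, I, Q, S in the row-by-row A..Y labelling used below, with corners A, E, U, Y):

```cpp
// Sketch of the second rule set as a canonical-form test.
bool isCanonical(const int sq[5][5]) {
    int A = sq[0][0], E = sq[0][4], U = sq[4][0], Y = sq[4][4];   // corners
    int G = sq[1][1], I = sq[1][3], Q = sq[3][1], S = sq[3][3];   // connectors
    if (A + E + U + Y <= G + I + Q + S) return false;  // corners vs. connectors (equality still open)
    if (A <= E || A <= U || A <= Y)     return false;  // upper left is the largest corner
    if (E <= U)                         return false;  // upper right beats lower left
    if (G <= S)                         return false;  // upper left connector beats lower right
    return true;
}
```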
I didn't find a nice way to throw away 1/2 of the magic squares, but I found a classification: Consider the center row/column. If the sum of the outer numbers is larger than the sum of the inner numbers, then this is still true after the 1-25 swap and applying the corner/connector symmetry. This splits our sample of magic squares (with 13 in the center) into two groups. Maybe one of the groups is empty, or has some nice patterns?
Fourth, place center vertical column.
Fifth, place center horizontal row requiring
3M+2(A+E+U+Y)+(C+W)+(K+O)=195 (Parity + NineMinusFour rolled into one)
Then a range check on the 8 remaining numbers, which gives me a 1% speedup. For example, if 65-A-C-E = 4 (the sum that B and D must make up) and the smallest of the eight remaining numbers is 4, I know there's no solution, since any two of them sum to at least 9.
Finally, with the 8 remaining numbers, pick 2 for B and F and complete. Test for validity and uniqueness. Print base square and the 7 (3 if M=13) that derive from it.
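Incidentally, one way to see the Step 5 constraint (possibly not how it was originally derived), with the cells labelled A..Y row by row so M is the center: add the two diagonals and all 25 cells, then subtract rows 2 and 4 and columns 2 and 4. Every cell outside the diagonals and the center row/column cancels, as do the connectors G, I, Q and S, leaving exactly 3M+2(A+E+U+Y)+(C+W)+(K+O). The line totals give 65+65+325-4x65 = 195.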
You are right that I could divide my M=13 set into two based on C, K, O and W (or H, L, N and R). The problem is that this information arrives late in the process. By the time I get to the 6th step, I have only 56 tests remaining.
I'm doing some detailed profiling now, but most time is spent in Steps 4 and 5.
I'd also need to do a bit of work to place a cut in Step 4. I started off using a std::list of rows, filtering at each step. I discovered half the time was spent in the row constructor. So I replaced the Step 5 std::list with a std::vector. This helped (although I am spending a frightful amount of time clearing vector elements), but for some reason cuts at Step 4 cause a double free somewhere. So a test would be slow - I'd need to clean up that mess first.
The way I load-balance is to have thread X look at the rows from Step 1 whose index mod N (the number of threads) equals X. As you see, it works pretty well, but not perfectly. Some chunks take longer to go through than others, and I am counting on them averaging out. Putting more intelligence in would help: e.g. if I know that for M=1 chunk 6 takes the least time and chunk 9 the most, and for M=2 chunk 11 takes the least and chunk 4 the most, I would reassign the work so that the thread that handled chunk 6 (9) for M=1 gets chunk 4 (11) for M=2.
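Something along these lines, run between M values, is what I have in mind (a sketch, not current code): hand the chunks that ran longest last time to the threads that have done the least total work so far.

```cpp
// Hypothetical rebalancing sketch: pair the historically slow chunks with the
// least-loaded threads before starting the next value of M.
#include <algorithm>
#include <numeric>
#include <vector>

// Returns assignment[t] = chunk index that thread t should take next.
std::vector<int> rebalance(const std::vector<double>& chunkTime,    // time each chunk took last M
                           const std::vector<double>& threadTotal)  // total time per thread so far
{
    const int n = static_cast<int>(chunkTime.size());
    std::vector<int> chunks(n), threads(n);
    std::iota(chunks.begin(), chunks.end(), 0);
    std::iota(threads.begin(), threads.end(), 0);
    // Slowest chunks first ...
    std::sort(chunks.begin(), chunks.end(),
              [&](int a, int b) { return chunkTime[a] > chunkTime[b]; });
    // ... go to the least-loaded threads.
    std::sort(threads.begin(), threads.end(),
              [&](int a, int b) { return threadTotal[a] < threadTotal[b]; });
    std::vector<int> assignment(n);
    for (int i = 0; i < n; ++i)
        assignment[threads[i]] = chunks[i];
    return assignment;
}
```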
Steps 1-3 take < 1% of the time.
Step 4 - 14%
Step 5 - 64%
Step 6 - 22%
I think Step 6 is smaller than it was because the cost of determining that you don't have to do a calculation is comparable to the cost of doing the calculation.
With -Ofast, gprof tells me 92% of the time is spent finding squares and 3% checking that candidate magic squares are in fact magic. Another 5% is spent checking rows to see if they can be placed (Steps 2, 3 and 5).
Without -Ofast, 28% of the time is spent handling C++ overloading: I can get cells with x,y coordinates or with a single 5*x+y coordinate. 6.5% is spent seeing if numbers are already used, and everything else is a few percent here and a few percent there.
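For context, the accessors look roughly like this (simplified, not the real class). Keeping both overloads in the header, with the x,y form forwarding to the flat-index form, is presumably why -Ofast makes that overhead disappear: the 5*x+y arithmetic inlines away, while unoptimized builds pay for real calls.

```cpp
// Simplified sketch of the two accessor overloads discussed above.
class Square {
public:
    int  get(int index) const        { return cells_[index]; }     // flat 0..24 index
    int  get(int x, int y) const     { return get(5 * x + y); }    // row/column form forwards
    void set(int index, int value)    { cells_[index] = value; }
    void set(int x, int y, int value) { set(5 * x + y, value); }
private:
    int cells_[25] = {};
};
```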
So I was motivated by the work of @.Scott . I took all the solutions mod 5 and plotted the patterns looking only at the 3's on the diagonals. There are 511 such patterns. An astonishing 179 have no solutions.
That's an interesting result - and perhaps it has direct application to the original post.
I've kind of dropped the ball on this. "Real" work has gotten in the way.
It has some application, although in the clear light of day it's less exciting.
The 512th pattern is all 3's. That would need nine cells congruent to 3 (the two diagonals share the center), and there are only five numbers congruent to 3 mod 5 (3, 8, 13, 18, 23), so it never makes it into the sample.
Many of the 179 have a diagonal with exactly five 3's. This is helpful, but there are not many rows that are all 3's: only 3, 8, 13, 18 and 23, plus its permutations. Also, many of the 179 have a diagonal with exactly four 3's. This is even less useful, because there are no such rows: if you have four 3's and a sum of 65, what must the fifth one be? (Another 3, since four 3's sum to 2 mod 5 and 65 is 0 mod 5, so "exactly four" is impossible.)
Where it gets more interesting is the remaining set. It appears (checking now) that there needs to be 5 or fewer 3's on the two diagonals. That will take a bigger bite out of the number of possibilities. I would want to prove it before using it. Otherwise it feels like cheating.
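If the rule does hold, the cut itself is cheap; a sketch of the (still unproven) filter:

```cpp
// Sketch of the proposed cut: reject an X-pattern if more than five of the
// nine diagonal cells are congruent to 3 mod 5.
bool passesThreesCut(const int sq[5][5]) {
    int threes = 0;
    for (int i = 0; i < 5; ++i) {
        if (sq[i][i] % 5 == 3) ++threes;                 // main diagonal
        if (i != 2 && sq[i][4 - i] % 5 == 3) ++threes;   // anti-diagonal, center counted once
    }
    return threes <= 5;
}
```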
For kicks, I am running a similar check on the placement of zeros mod 5. (Which is related to the placement of 1's)
The X-patterns that lead to no solutions have 6 or more elements with the same remainder mod 5. There is a very, very small number (two or so) of patterns with 4 or 5 common remainders, but the bulk have 6+. How many does this eliminate that aren't eliminated some other way? Zero.