Sum of All Positive Integers n where d+n/d is Prime to 100M

PeterDonis · Jun 17, 2019

Vanadium 50 said:

I wonder if it allocates memory in non-contiguous blocks

It could, although I'm not sure how that would affect the performance of simple nested for loops over slices of a list that's not being mutated. All of the variable accesses at the C level inside the interpreter should be just pointers.

Vanadium 50 said:

Python does garbage collection automatically, right?

Yes. But disabling automatic gc doesn't affect the execution time, so this doesn't seem to be the issue. (Mostly Python deallocates objects based on their reference count going to 0; the automatic gc is for cases that can't be handled that simply, like circular references. I don't think there are any of those in my code.)

PeterDonis · Jun 17, 2019

Vanadium 50 said:

perhaps a B-tree. The NlogN would come from it traversing the tree

If you mean the interpreter has to do this when allocating small blocks of memory for new objects, yes, this is one of my prime suspects at this point, since one thing the for loops can't avoid doing is allocating new objects on every iteration to do the tests and yield the combinations.

Vanadium 50 · Jun 17, 2019

I am actually thinking if it has very small memory blocks, it needs to figure out where everything is even when reading. But what I can't figure out is why anyone would do this on a modern CPU.

PeterDonis · Jun 17, 2019

Vanadium 50 said:

I am actually thinking if it has very small memory blocks, it needs to figure out where everything is even when reading.

Python objects at the C level are just pointers to data structures. There are variable name lookups, which in the general case are hash table lookups, but for local variables inside functions those are optimized to array accesses.

Vanadium 50 · Jun 17, 2019

OK, I give up. But since the behavior is crazy, any explanation of it is bound to be a little crazy too.

Vanadium 50 · Jun 18, 2019

Some final thoughts:

Can I do better than one second? Maybe a little. As I said, memory access is the bottleneck. A bool is a byte, so I need 100 MB to store my prime array. With clever packing, I can get a factor of at least 16. But how much can I speed up? Well, the prime-finding is 800 ms, and about half of that is memory access, and maybe I can double the speed of that, so there's 200 ms to be gained. About 20%. There are slightly faster` sieves, so maybe I can get a 30% savings, but probably not much more.

Looking at it from the other direction, there's probably 150 ms of memory access that's unavoidable, and similarly 100 ms of arithmetic that's unavoidable, so I am within a factor of maybe 4 of what the hardware can do. That's pretty good.

To do much faster, one needs a different algorithm. An exhaustive test, even one that works with factors, needs to test almost every number in the range for primality. (61% of the odd numbers) That suggests I need to sieve. And sieving already dominates my time. What I am thinking of is a) if there is one solution, can I use it to find another solution? b) is there some way of generating solutions from "seeds" - other numbers from which a solution can be derived? c) if I exclude a candidate solution, is there some way to infer that another candidate can be excluded? Otherwise I would be nibbling around the edges.

PeterDonis · Jun 18, 2019

Vanadium 50 said:

An exhaustive test, even one that works with factors, needs to test almost every number in the range for primality. (61% of the odd numbers)

I'm not sure what you mean by this. The solution I'm using that does factorization (to compare with the combinatorial one) iterates through all numbers equal to 2 modulo 4 in the range (starting with 6); that's 25% of the numbers in the range. It only tries to factor the ones for which n + 1 and n/2 + 2 are prime, doing a lookup into the pre-computed primes, (and the factoring algorithm, besides using the pre-computed list of primes, exits immediately if it finds a repeated prime factor, so it only completely factors the squarefree numbers).

Vanadium 50 · Jun 18, 2019

What I mean is that so many numbers need to be tested by primality, there is likely little to be gained by stopping the sieve early and using primality testing (e.g. Adelman, Miller-Rabin) on any larger numbers.

scottdave · Sep 5, 2019

If I recall correctly, they say most Project Euler programs should be able to run in less than 2 minutes. This is where some thinking is necessary to find the 'better way'.

I'm just now reading through this and haven't read through all the suggestions yet, so this may have been suggested: could you build a set of numbers which have failed the test? Check to see if the next number to check is a multiple of one of those numbers.

Vanadium 50 · Sep 22, 2019

One last thought. Most Project Euler questions do not lend themselves well to parallelization, partly by design. This one does.

Improvement is 20%. The reason is, as discussed earlier, memory bandwidth is a limiting factor. More processes get us closer to that wall.

Vanadium 50 · Oct 11, 2019

Update update: switched from an i5-6500 (4C/4T) to a 3900X (12C/24T). So with 4-10x the computing power, the incremental improvement is...<drum roll>... 40%.

Sum of All Positive Integers n where d+n/d is Prime to 100M

Similar threads

Hot Threads

Recent Insights