Sum of All Positive Integers n where d+n/d is Prime to 100M

  • Thread starter: Arman777
  • Tags: Euler
AI Thread Summary
The discussion focuses on solving Project Euler Problem 357, which involves finding the sum of all positive integers n not exceeding 100 million such that for every divisor d of n, the expression d + n/d is prime. Participants share code snippets for generating divisors and testing for primality, noting that the current implementation is slow for larger numbers. Suggestions include using the Sieve of Eratosthenes to generate a set of primes for faster membership testing, as set lookups are significantly quicker than list lookups. Profiling the code is emphasized to identify bottlenecks and optimize performance. The conversation highlights the importance of efficient algorithms in computational problems.
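As a concrete illustration of the sieve-plus-set idea mentioned in the summary, here is a minimal sketch (the name sieve_prime_set is illustrative, not code from the thread):

Python:
def sieve_prime_set(limit):
    """Sieve of Eratosthenes: return the set of all primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return {i for i in range(limit + 1) if is_prime[i]}

primes = sieve_prime_set(100)
print(97 in primes)  # True  -- set membership is an O(1) hash lookup on average
print(91 in primes)  # False (91 = 7 * 13)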
  • #51
Vanadium 50 said:
My numbers are already factored.

When you say this, do you mean you are iterating over combinations of prime factors (from the pre-computed list of primes), or that you are using the pre-computed list of primes (actually a hash table in this case) to allow fast factorization?

I ask because I have tried both methods in Python and the latter method is running much, much faster, which seems counterintuitive to me, but combinatorial code in Python in general seems to run slow when the number of items in the underlying set gets this large. So it's possible that it's a Python quirk I'm dealing with and not that the combinatorial method is inherently slower.
 
  • #52
PeterDonis said:
do you mean you are iterating over combinations of prime factors

This. I do this in several loops. One for the case with exactly 2 factors, one with exactly 3, etc.

I'd post the code, but I am converting from iterators to just plain loops now, so absolutely nothing works.
 
  • #53
Vanadium 50 said:
I do this in several loops. One for the case with exactly 2 factors, one with exactly 3, etc.

Ok, that's basically the same method I'm using for the combinatorial case. But are you doing some kind of pre-filtering to avoid having to generate all the combinations?
 
  • #54
Here's what I am doing for the n = 4 case, as an example:

C++:
for(int i = 0; (i < nprimes-2) && (primes[i] < cbrt(UPPERBOUND/2.0)); i++) {
  a = primes[i];
  for(int j = i+1; primes[j] < sqrt(0.5*UPPERBOUND/a); j++) {
    b = primes[j];
    for(int k = j+1; 2*a*b*primes[k] < UPPERBOUND; k++) {
      c = primes[k];
      if(is_prime[2*a*b*c+1] && is_prime[a*b*c+2] &&
         is_prime[a + 2*b*c] && is_prime[b + 2*a*c] && is_prime[c + 2*a*b] &&
         is_prime[2*a + b*c] && is_prime[2*b + a*c] && is_prime[2*c + a*b] ) {
        answer += 2*a*b*c;
      }
    }
  }
}

primes is just an array of prime numbers, starting with 3, and is_prime is an array of bools.

Here's the idea. We know 2 is a factor, so set up a quad of primes in increasing order {2,a,b,c}. Then test the quad all at once. There are 2^n divisors to test (16 for the quad), but half of those tests are redundant since d and n/d give the same sum, so we have here 8 tests to do.

I don't have any pretesting, but I do calculate the largest bounds. For example, what is the largest a can be? It's when a, b and c are consecutive and nearly equal. How big can b be? It's when b and c are nearly equal (and I account for 2 and a already being factors).
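To spell out where those bounds come from (my reading of the loop conditions above, not a quote): with ##n = 2abc < N## (##N## = UPPERBOUND) and ##a < b < c##,
$$2a^3 < 2abc < N \;\Rightarrow\; a < \sqrt[3]{N/2}, \qquad 2ab^2 < 2abc < N \;\Rightarrow\; b < \sqrt{\frac{N}{2a}}, \qquad c < \frac{N}{2ab},$$
which are exactly the cbrt and sqrt cutoffs in the i and j loops and the product test in the k loop.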

Overall, my strategy has been not to think of this as testing properties of numbers but to think of this as testing properties of sets of numbers.

Moving from iterators to plain old loops fixed the over-counting problem. Usually iterators simplify things, but here they were just making things more complicated. I get n = 2 has 30301 solutions, n = 3 has 9009, and n = 4 has 312. That covers 97% of the answer and takes about 200 ms.
 
  • #55
Vanadium 50 said:
My code has a bug in it; it finds about 1% too many solutions (not too few!). Most likely, I am doing one test twice and another test not at all.

I don't know C++. Maybe you are testing the same cases twice, such as (2,3,7,5) and (2,7,5,3), etc. Or there might be a problem in the range of the prime numbers.
 
  • #56
Arman777 said:
Maybe you are testing for the same cases such as (2,3,7,5) and (2,7,5,3) etc. Or they might be a problem in the range of the prime numbers

Highly unlikely. The structure of the loops (look at the code) makes the former unlikely, and the latter would give too few solutions, not too many. Rewriting from scratch fixed the problem, so a) this is moot and b) we will never know, but the most likely explanation is a typo in the testing line.

In any event, I converted to plain for loops, and here is the output:

Code:
CPU time used to find primes: 810 ms
30301 two-factor candidates found
9009 three-factor candidates found
312 four-factor candidates found
3 five-factor candidates found
CPU time used: 990 ms

And the answer is <deleted>.

So I am under - just - my one second goal.
 
  • #57
There are no 6-, 7-, or 8-factor solutions? Because you said at most we should have 8 factors?
 
  • #58
There could be solutions with higher n - but there aren't.

What's happening is that the number of candidate solutions is falling (e.g. for n = 8 there are only four possible values of a) and at the same time the number of conditions they need to pass is growing (exponentially).
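A quick back-of-the-envelope check of how fast the candidate pool shrinks (a sketch, not from the thread): the smallest n with k distinct prime factors is the product of the first k primes, and it passes ##10^8## at k = 9.

Python:
# Smallest n with k distinct prime factors = product of the first k primes.
first_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
limit = 10**8
product = 1
for k, p in enumerate(first_primes, start=1):
    product *= p
    print(k, product, "fits" if product < limit else "too big")
# k = 8 gives 9699690, which still fits under 10^8; k = 9 gives 223092870, which does not.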
 
  • #59
P.S. I just found out that Carl Pomerance (yes, that Carl Pomerance) proved that you need to go to at least ##2 \times 10^9## to need n > 5. Since the problem only goes to ##10^8##, we don't even need to check 6, 7, and 8.
 
  • #60
Vanadium 50 said:
I don't have any pretesting, but I do calculate the largest bounds.

Yes, I see that and it's basically the same thing I'm doing with the combinatorial version in Python. But coding it with straight Python for loops, basically the equivalent of your C++ code, is still slower by about an order of magnitude than the factoring version (where I'm using the pre-computed dictionary of primes to speed up prime factorization, and also filtering out as many numbers as possible before factoring to minimize its impact).

This is very disappointing and confusing to me, as while Python is expected to be slow, it's not supposed to be that slow, nor is it supposed to invert one's usual intuition about things like, as you put it, factorization is slow but multiplication is fast.

(I'm also confused and disappointed by the poor performance of Python generators for this problem. First, the generator version of the Sieve of Eratosthenes is about an order of magnitude slower at getting all of the primes up to 10^8 than the brute force version using a pre-allocated array. And second, the generator version of the combinatorial code is at least an order of magnitude slower than the for loop version--i.e., two orders of magnitude slower than the factorization version. I'm still trying to figure out what's going on with all this.)
 
  • #61
PeterDonis said:
This is very disappointing and confusing to me, as while Python is expected to be slow, it's not supposed to be that slow, nor is it supposed to invert one's usual intuition about things like, as you put it, factorization is slow but multiplication is fast.
That bugs me too. Since yesterday I have been trying to use Numba and PyPy, but I couldn't find a way to run them. Maybe you can find a way to speed things up by using Numba, PyPy, or CPython.
 
  • #62
PeterDonis said:
This is very disappointing and confusing to me, as while Python is expected to be slow, it's not supposed to be that slow, nor is it supposed to invert one's usual intuition about things like, as you put it, factorization is slow but multiplication is fast.

Can you try another implementation of Python? I agree with you that this doesn't look right.

I suspect some inefficiencies in memory access. My reasoning is based on the fact that running my code on machines with a faster CPU but slower memory slows it down. So, if the fact that there are 100 MB arrays wasn't enough of a tip-off that memory access would be important, measurements confirm it.
 
  • #63
Vanadium 50 said:
Can you try another implementation of Python?

I don't have any others handy, and anyway I personally am more interested in figuring out what's going on with the standard CPython implementation.

Vanadium 50 said:
I suspect some inefficiencies in memory access.

That's a possibility, yes, but if that's what's going on, I don't see how to fix it, since I'm not doing anything unusual with the Python code.

For example, here is the function that corresponds to your n = 4 case as far as generating the combinations; it's only yielding 3-tuples because it's only iterating over the primes from 3 on (that's what the primes parameter is, a list of those primes); the function that calls this one takes care of multiplying by 2 and testing the particular combination to see if it belongs in the result set. This function, and its counterparts for the other sizes of combinations, is where my code is spending most of its time. (The q parameter is n divided by 2, so it will be 50,000,000 for the n = 10^8 case. The prod function just multiplies together all items in an iterable and returns the product.)

Python:
from itertools import takewhile
from math import prod  # stands in for the hand-rolled prod() described above

def _factor_gen_3(primes, q, debug=True):
    # Yield (a, b, c) with a < b < c taken from primes and a*b*c < q.
    for i, a in enumerate(primes[:-2]):
        if a * prod(primes[i + 1:i + 3]) >= q:
            break  # even the two smallest partners push a*b*c past q
        for j, b in enumerate(primes[i + 1:-1]):
            if a * b * prod(primes[i + j + 2:i + j + 3]) >= q:
                break  # the smallest possible c is already too large
            for c in takewhile(lambda p: a * b * p < q, primes[i + j + 2:]):
                yield (a, b, c)
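For illustration, a caller along the lines described above (multiply by 2 and test every divisor sum) might look like the following sketch; the names _sum_solutions_3 and prime_set are illustrative, not the actual code:

Python:
def _sum_solutions_3(primes, prime_set, q):
    """Sum n = 2*a*b*c < 2*q for which d + n/d is prime for every divisor d.
    Illustrative sketch of the caller described above, not the real code."""
    total = 0
    for a, b, c in _factor_gen_3(primes, q):
        abc = a * b * c
        # The 8 distinct divisor sums of n = 2*a*b*c (d and n/d give the same sum).
        if (2 * abc + 1 in prime_set and abc + 2 in prime_set and
                a + 2 * b * c in prime_set and 2 * a + b * c in prime_set and
                b + 2 * a * c in prime_set and 2 * b + a * c in prime_set and
                c + 2 * a * b in prime_set and 2 * c + a * b in prime_set):
            total += 2 * abc
    return total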
 
  • #64
I don't see why that should be slow. The lambda is a bit unusual, but there's no reason it should be slower than an ordinary function.

Maybe a clue would be to try other values of n (100,000,000 in the question asked) and try to infer the computational complexity. Is the slowness O(n), O(n^2), or something else?
 
  • #65
Vanadium 50 said:
The lambda is a bit unusual

It's just easier than having to define a separate function for something so simple that only appears as a predicate in takewhile. As you say, it shouldn't make any difference to speed.

Vanadium 50 said:
Is the slowness O(n), O(n^2), or something else?

Roughly quadratic from the comparisons I've been able to make.
 
  • #66
PeterDonis said:
Roughly quadratic from the comparisons I've been able to make.

Well, the sieve itself is O(n log log n). So if you are seeing quadratic behavior, it's not the algorithm: it's that executing the Python internals is taking longer than it should. Again, I would guess memory, just because it's easy to imagine how the second memory allocation might take more than the first.
 
  • #67
Vanadium 50 said:
the sieve itself is O(n log log n). So if you are seeing quadratic behavior, it's not the algorithm

I'm not counting the time it takes to pre-compute the list of primes using the sieve; I'm just talking about the time it takes to generate the combinations of prime factors once the list of primes is pre-computed.

Since the combination generator functions use nested for loops, we would expect them to be quadratic or worse if it weren't for the loop truncations. (After all, a straight count of ##r##-element combinations taken from ##N## elements total goes like ##N^r## for large ##N##.) But the loop truncations should limit the number of combinations to roughly linear in ##N##--at least I think so, since basically we're just generating all squarefree numbers less than ##N##, which is roughly linear in ##N##.
 
  • #68
PeterDonis said:
loop truncations should limit the number of combinations to roughly linear in N

Right. ##\frac{6}{\pi^2}N## if you want to be precise. But fundamentally it's got to be O(N) since you are testing (most) of the numbers, just in a different order.

Same point as above, though. The algorithm can't be doing this to you. It's got to be something internal to your Python implementation.
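A quick numerical sanity check of that ##\frac{6}{\pi^2}N## estimate (a sketch, not from the thread):

Python:
import math

def count_squarefree(limit):
    """Count squarefree integers <= limit by crossing out multiples of p^2."""
    flags = bytearray([1]) * (limit + 1)
    for p in range(2, int(limit ** 0.5) + 1):
        flags[p * p::p * p] = bytearray(len(range(p * p, limit + 1, p * p)))
    return sum(flags[1:])

N = 10**6
print(count_squarefree(N), round(6 / math.pi**2 * N))  # exact count vs. the estimate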
 
  • #69
PeterDonis said:
Roughly quadratic from the comparisons I've been able to make.

After further testing, it looks closer to ##N \log N##.

Vanadium 50 said:
The algorithm can't be doing this to you. It's got to be something internal to your Python implementation.

I'm afraid so, yes.
 
  • #70
If it's ##N \log N##, I wonder if it allocates memory in non-contiguous blocks: perhaps a B-tree. The ##N \log N## would come from it traversing the tree rather than just incrementing a counter.

Python does garbage collection automatically, right? If so, maybe it's a side effect of keeping track of objects that gets overwhelmed by these giant arrays. That would also explain why I don't see it; I made C++ give me contiguous arrays.
 
  • #71
Vanadium 50 said:
I wonder if it allocates memory in non-contiguous blocks

It could, although I'm not sure how that would affect the performance of simple nested for loops over slices of a list that's not being mutated. All of the variable accesses at the C level inside the interpreter should be just pointers.

Vanadium 50 said:
Python does garbage collection automatically, right?

Yes. But disabling automatic gc doesn't affect the execution time, so this doesn't seem to be the issue. (Mostly Python deallocates objects based on their reference count going to 0; the automatic gc is for cases that can't be handled that simply, like circular references. I don't think there are any of those in my code.)
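For reference, disabling the garbage collector around a timed run can be done like this (a sketch; work() stands in for the actual generator loop):

Python:
import gc
import time

def work():
    # Placeholder for the real combination-generating loop.
    return sum(i * i for i in range(10**6))

for label, disable in (("gc enabled ", False), ("gc disabled", True)):
    if disable:
        gc.disable()
    start = time.perf_counter()
    work()
    print(label, time.perf_counter() - start)
    if disable:
        gc.enable()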
 
  • #72
Vanadium 50 said:
perhaps a B-tree. The ##N \log N## would come from it traversing the tree

If you mean the interpreter has to do this when allocating small blocks of memory for new objects, yes, this is one of my prime suspects at this point, since one thing the for loops can't avoid doing is allocating new objects on every iteration to do the tests and yield the combinations.
 
  • #73
I am actually thinking that if it has very small memory blocks, it needs to figure out where everything is even when reading. But what I can't figure out is why anyone would do this on a modern CPU.
 
  • #74
Vanadium 50 said:
I am actually thinking that if it has very small memory blocks, it needs to figure out where everything is even when reading.

Python objects at the C level are just pointers to data structures. There are variable name lookups, which in the general case are hash table lookups, but for local variables inside functions those are optimized to array accesses.
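That difference is visible in the bytecode; a quick way to see it (an illustrative example, not from the thread):

Python:
import dis

GLOBAL_CONST = 10

def f(x):
    local_val = x + GLOBAL_CONST
    return local_val * 2

dis.dis(f)
# Local names (x, local_val) compile to LOAD_FAST / STORE_FAST, which index into a
# fixed-size array in the frame; GLOBAL_CONST compiles to LOAD_GLOBAL, a hash lookup.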
 
  • #75
OK, I give up. But since the behavior is crazy, any explanation of it is bound to be a little crazy too.
 
  • #76
Some final thoughts:

Can I do better than one second? Maybe a little. As I said, memory access is the bottleneck. A bool is a byte, so I need 100 MB to store my prime array. With clever packing, I can get a factor of at least 16. But how much can I speed up? Well, the prime-finding is 800 ms, and about half of that is memory access, and maybe I can double the speed of that, so there's 200 ms to be gained. About 20%. There are slightly faster sieves, so maybe I can get a 30% savings, but probably not much more.
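For reference, the packing trick in Python terms (a sketch of the idea, not the C++ code): store one bit per odd number, which is where a factor of 16 over one byte per number comes from.

Python:
def sieve_odd_bits(limit):
    """Sieve of Eratosthenes storing one bit per odd number.
    Bit i represents the odd number 2*i + 3; memory is roughly limit/16 bytes."""
    n_odds = (limit - 1) // 2                      # odd numbers 3, 5, ..., <= limit
    bits = bytearray(b"\xff") * ((n_odds + 7) // 8)

    def test(i):
        return bits[i >> 3] >> (i & 7) & 1

    def clear(i):
        bits[i >> 3] &= 0xFF ^ (1 << (i & 7))

    i = 0
    while (2 * i + 3) ** 2 <= limit:
        if test(i):
            p = 2 * i + 3
            # Cross out odd multiples p*p, p*p + 2p, ... (step p in index space).
            for j in range((p * p - 3) // 2, n_odds, p):
                clear(j)
        i += 1
    return bits

bits = sieve_odd_bits(100)
primes = [2] + [2 * i + 3 for i in range((100 - 1) // 2) if bits[i >> 3] >> (i & 7) & 1]
print(primes)  # [2, 3, 5, 7, ..., 97]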

Looking at it from the other direction, there's probably 150 ms of memory access that's unavoidable, and similarly 100 ms of arithmetic that's unavoidable, so I am within a factor of maybe 4 of what the hardware can do. That's pretty good.

To do much faster, one needs a different algorithm. An exhaustive test, even one that works with factors, needs to test almost every number in the range for primality. (61% of the odd numbers) That suggests I need to sieve. And sieving already dominates my time. What I am thinking of is:
a) if there is one solution, can I use it to find another solution?
b) is there some way of generating solutions from "seeds" - other numbers from which a solution can be derived?
c) if I exclude a candidate solution, is there some way to infer that another candidate can be excluded?
Otherwise I would be nibbling around the edges.
 
  • #77
Vanadium 50 said:
An exhaustive test, even one that works with factors, needs to test almost every number in the range for primality. (61% of the odd numbers)

I'm not sure what you mean by this. The solution I'm using that does factorization (to compare with the combinatorial one) iterates through all numbers equal to 2 modulo 4 in the range (starting with 6); that's 25% of the numbers in the range. It only tries to factor the ones for which n + 1 and n/2 + 2 are prime, using a lookup into the pre-computed primes (and the factoring algorithm, besides using the pre-computed list of primes, exits immediately if it finds a repeated prime factor, so it only completely factors the squarefree numbers).
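To make that concrete, the filter-then-factor approach described above looks roughly like the sketch below (the names are illustrative; primes is a list of primes up to at least the square root of limit, and prime_set must contain all primes up to limit + 1):

Python:
def solve_by_factoring(limit, primes, prime_set):
    """Sum of n <= limit (n >= 6) such that d + n/d is prime for every divisor d.
    Sketch of the filtering approach described above, not the actual code."""
    total = 0
    for n in range(6, limit + 1, 4):             # n must be 2 mod 4 (even and squarefree)
        if n + 1 not in prime_set or n // 2 + 2 not in prime_set:
            continue                             # cheap pre-filter on d = 1 and d = 2
        # Trial-divide with the pre-computed primes; bail out on a repeated factor.
        factors, m = [], n
        for p in primes:
            if p * p > m:
                break
            if m % p == 0:
                m //= p
                if m % p == 0:
                    factors = None               # not squarefree
                    break
                factors.append(p)
        if factors is None:
            continue
        if m > 1:
            factors.append(m)
        # Build all divisors of the (squarefree) n and test every divisor sum.
        divisors = [1]
        for p in factors:
            divisors += [d * p for d in divisors]
        if all(d + n // d in prime_set for d in divisors):
            total += n
    return total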
 
  • #78
What I mean is that so many numbers need to be tested for primality that there is likely little to be gained by stopping the sieve early and using a primality test (e.g. Adleman, Miller-Rabin) on the larger numbers.
 
  • #79
If I recall correctly, they say most Project Euler programs should be able to run in less than 2 minutes. This is where some thinking is necessary to find the 'better way'.

I'm just now reading through this and haven't read through all the suggestions yet, so this may have been suggested: could you build a set of numbers which have failed the test? Check to see if the next number to check is a multiple of one of those numbers.
 
  • #80
One last thought. Most Project Euler questions do not lend themselves well to parallelization, partly by design. This one does.

The improvement is only 20%, though. The reason, as discussed earlier, is that memory bandwidth is a limiting factor; more processes just get us closer to that wall.
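For reference, a range split of this kind in Python might look like the sketch below; test_range is a stand-in for whatever per-range work the chosen method does, not actual code from the thread:

Python:
from multiprocessing import Pool

def test_range(bounds):
    lo, hi = bounds
    # Stand-in for the real per-range work (sieve segment plus divisor-sum tests).
    return sum(n for n in range(lo, hi) if n % 4 == 2)

if __name__ == "__main__":
    limit, nproc = 10**8, 8
    step = limit // nproc
    chunks = [(i * step, limit if i == nproc - 1 else (i + 1) * step)
              for i in range(nproc)]
    with Pool(nproc) as pool:
        total = sum(pool.map(test_range, chunks))
    print(total)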
 
  • #81
Update update: switched from an i5-6500 (4C/4T) to a 3900X (12C/24T). So with 4-10x the computing power, the incremental improvement is...<drum roll>... 40%.
 