Sum of All Positive Integers n where d+n/d is Prime to 100M

  • Thread starter Thread starter Arman777
  • Start date Start date
  • Tags Tags
    Euler
Click For Summary
The discussion focuses on solving Project Euler Problem 357, which involves finding the sum of all positive integers n not exceeding 100 million such that for every divisor d of n, the expression d + n/d is prime. Participants share code snippets for generating divisors and testing for primality, noting that the current implementation is slow for larger numbers. Suggestions include using the Sieve of Eratosthenes to generate a set of primes for faster membership testing, as set lookups are significantly quicker than list lookups. Profiling the code is emphasized to identify bottlenecks and optimize performance. The conversation highlights the importance of efficient algorithms in computational problems.
  • #31
You need to pay more attention to what people are writing.

Vanadium 50 said:
Can you think of a property of the prime factorization that would help you?
Vanadium 50 said:
Nothing says d is the right variable to loop over
Vanadium 50 said:
you have posted a solution based on exhaustive tests, and three times that's not the most efficient way to solve the problem
 
Physics news on Phys.org
  • #32
As I said I couldn't find it. What should I loop over then ?
 
  • #33
I'll tell you in a few days. It is important that you think for yourself. You have plenty of hints.
 
  • #34
Another hint. How many Sophie Germain primes are there under 50,000,000? Why do I care? Why do I care how fast I can find them?
 
Last edited:
  • #35
Vanadium 50 said:
Another hint. How many Sophie Germain primes are there under 50,000,000? Why do I care? Why do I care how fast I can find them?
I find 229568 primes.

I looked at the definiton etc. However I did not quite understand how can we use them. For instance the solution of the number + 1 must be prime. These primes are,

(2,3,7,11,23,31,43,59,71,79,83)

However the Sophie German primes,

(2, 3, 5, 11, 23, 29, 41, 53, 83)

I couldn't make the connection. Some of them are in the solution but some of them are not.

Vanadium 50 said:
Why do I care? Why do I care how fast I can find them?
Its important to optimize the codes..?

I added the condition of num / 2 + 2 = prime. My time now reduced to half, 24 second.
 
Last edited:
  • #36
If the only factors of your number are 2 and p, then p is prime and 2p+1 is prime. How does that relate to Sophie Germain primes. How does that relate to your solution?
 
  • #37
Vanadium 50 said:
If the only factors of your number are 2 and p, then p is prime and 2p+1 is prime. How does that relate to Sophie Germain primes. How does that relate to your solution?
I see it now.
[1 , 2, p , 2* p]

However in this case its not certain that p+2 should be prime ? Also It's the same as num/2 + 2 = prime condition .
 
  • #38
Arman777 said:
However in this case its not certain that p+2 should be prime ?

That is correct. This is a necessary condition, not a sufficient condition. It is also only necessary when there are exactly two factors. (Why?)

Looking back - if you are spending your time factorizing, how would you rewrite your code so you weren't?
 
  • #39
Python:
import math
import time
from itertools import compress

start = time.perf_counter()

def prime_number(n):
    ''' Prime numbers up to n'''
    sieve = bytearray([True]) * (n//2+1)
    for i in range(1,int(n**0.5)//2+1):
        if sieve[i]:
            sieve[2*i*(i+1)::2*i+1] = bytearray((n//2-2*i*(i+1))//(2*i+1)+1)
    return  {2,*compress(range(3,n,2), sieve[1:])}   #Using set to increase the search time

list_of_primes = prime_number(10**8)  # listing prime numbers up to 10**8
def test_condition (num):
    ''' Testing the condition of d+n/d by taking the input as array of divisor and num'''
    for i in range(1, int(num**0.5) + 1):
        if num % i == 0:
            if (i + num /i) not in list_of_primes:
                return False
    return True

Sum = 0
for num in list_of_primes:
    if num % 4 != 0 or num % 9 != 0:
        if (num - 1) / 2 + 2 in list_of_primes:
            if test_condition (num-1) == True:
                Sum += num - 1

print(Sum+1)
end = time.perf_counter()
print("Time : ", end-start, "sec")

In this code, I am testing for, (num-1) /2 + 2 cases and if it's not true then the loop will break.

It's certain that (num -1) should be even number and all even numbers are divisible by 2. Hence all numbers that will satisfy the condition (given by the problem) should also satisfy (num - 1)/2 + 2

Vanadium 50 said:
if you are spending your time factorizing, how would you rewrite your code so you weren't?

This is the only thing that I can up with. If the (num-1)/2 + 2 is false then the loop will break. For instance, let's take the num = 19 case. [1,2,3,6,9,18] are the divisors of num-1. And (num-1)/2+2 = 11. However, it has 6 and 3 so we cannot add it to the Sum.

After checking (num-1)/2 + 2 == prime condition, we should definately factor the num-1. Same as for num = 31.

Without factorizing, we cannot be certainly sure about if num-1 should be added to Sum or not.

This condition helps me to improve the run time. Now by code works 2x faster. But I don't think I can further improve the run time.

I am also bored by the problem..
 
  • #40
The point I was trying to get you to see was that you can avoid factoring by not looping over the number, but looping over prime factors. All solutions have no repeated prime factors, which allows you to cut candidate solutions way, way down. If there are exactly two prime factors, one mmust be 2 (for n>1), so you are dealing with Sophie Germain primes as candidates.

The 100,000,000 limit means you need to only consider up to 8 prime factors.

Arman777 said:
I am also bored by the problem..

That's a good way to get people to not help you in the future.
 
  • #41
Vanadium 50 said:
If there are exactly two prime factors, one must be 2 (for n>1), so you are dealing with Sophie Germain primes as candidates.

I write code to see how many of the Sophie German Primes adds up to the sum.
Here are my results

Total number of Sophie German primes below 50 million, 229568
Sophie German primes who passed the test_condition, 3808


Also, I looked for the total number of primes that passes the test and for below 50 million that is 23025 and for 100 million 39626

For the only 2 factor case, we can find the solutions in a short amount of time. However, as we can see they only cover a small percentage of the total solution number.

Vanadium 50 said:
All solutions have no repeated prime factors, which allows you to cut candidate solutions way, way down.

I think I understand your point. However, I think that the time will not change much even we try that. First, we need to find all the factors of a number and then we should check somehow it repeats or not. However, I already did that by dividing 4 and 9, etc without needing to factor the number. Also even in that case you still need to check if the sum of d + num/d = prime or not.

If you can write a python code for your algorithm then maybe we can compare times and see the result.
 
  • #42
I thought you were bored.

I get 229568, but the number who pass is 30301. The first few I get are: 6 10 22 58 82 358 382 478 562 838 862 1282 1318 1618 2038 2062 2098 2458 2578 2902 2962 3862 4258 4282 4678 5098 5938 6598 6658 6718 6778 7078 7642 7702 8038 8542 8962

It takes 20 ms to cover all the answers with exactly two factors.
 
  • #43
Vanadium 50 said:
I get 229568, but the number who pass is 30301
I am getting the same answer as you do now. It seems like I made a mistake on my previous code.
Vanadium 50 said:
It takes 20 ms to cover all the answers with exactly two factors.
Thats nice and normal.. since you have to only check num + 2 case for num in sophie_german_primes. How about the whole test. How it takes to run ?
Vanadium 50 said:
I thought you were bored.
Kind of but its still fun to do somehow.
 
  • #44
You're going to need to decide - are you bored or not? I'm not `willing to put any effort into a conversation if the other party will just say "Too bad. I'm bored. Goodbye."

Here's why looping over factors will be faster - and you have to do two factors (related to Sophie Germain primes) all the way up to 8 factors. Looping over n, you need to test 100,000,000 numbers. I only need to test nonsquare numbers, i.e. numbers with no repeated factors, and there are about 61,000,000 of them. So I am already ahead.

You can say "I don't need to test all the numbers. Only the even ones, only the ones that are one less than a prime, etc." True. But neither do I. That factor of 0.61 persists.

But that's peanuts. The real gain is that while you have to factor numbers, I don't. My numbers are already factored. You're spending 90% of your tine factoring, and I don't have to do it at all. So my code (when I finish writing it) should be ~15x faster than yours.
 
  • #45
Vanadium 50 said:
~15x faster than yours

That's not possible in python. I am using one of the fastest prime generators and that only itself takes 2.5 sec to produce to prime numbers up to 10**8... you can mostly be 5-3 times faster. Which I am not sure that even possible. I am curious about your algorithm. Also, you should start your time before generating the primes not after and I only accept if its faster in python not in C. C is already faster than the python in the core.

Vanadium 50 said:
Here's why looping over factors will be faster - and you have to do two factors (related to Sophie Germain primes) all the way up to 8 factors. Looping over n, you need to test 100,000,000 numbers. I only need to test nonsquare numbers, i.e. numbers with no repeated factors, and there are about 61,000,000 of them. So I am already ahead.

Python:
        if (num - 1) / 2 + 2 in list_of_primes:
line in my code already checks for the 2 factor case and if its wrong it immidetly exits the loop.

I think this is much faster or equally faster than the generating some primes and then testing each of them that num + 2 satisfies or not.

Vanadium 50 said:
I only need to test nonsquare numbers,

I can also add that condition to my code. And I did before but it does not change much.

Vanadium 50 said:
You're going to need to decide - are you bored or not? I'm not `willing to put any effort into a conversation if the other party will just say "Too bad. I'm bored. Goodbye."

We can continue to discuss
 
  • #46
I am of course only talking about the algorithmic time, not the set-up time (like finding all the primes).

Arman777 said:
I think this is much faster or equally faster than the generating some primes

You see, this is why they have classes on algorithm design. It's not about your feelings. (Or my feelings, but as an inexperienced programmer your feelings should carry less weight) I have shown why this is faster: I am replacing factorization with multiplication (and avoiding 39% of the work). Factorization is slow, multiplication is fast.
 
  • Like
Likes StoneTemplePython and Arman777
  • #47
Okay then what can I say .. when you finish writing your code you can share it.
 
  • #48
Vanadium 50 said:
I am of course only talking about the algorithmic time, not the set-up time (like finding all the primes).
Okay then. I wrote a new code by implementing your idea on sophie-german primes now my code runs in 8.35 sec

Python:
import math
import time
from itertools import compressdef prime_number(n):
    ''' Prime numbers up to n'''
    sieve = bytearray([True]) * (n//2+1)
    for i in range(1,int(n**0.5)//2+1):
        if sieve[i]:
            sieve[2*i*(i+1)::2*i+1] = bytearray((n//2-2*i*(i+1))//(2*i+1)+1)
    return  {2,*compress(range(3,n,2), sieve[1:])}   #Using set to increase the search time

list_of_primes = prime_number(10**8)  # listing prime numbers up to 10**8
square_numbers = {i**2 for i in range(2, 10**4)}
for i in square_numbers:
    for j in list_of_primes:
        if (j-1) % i == 0 and j in list_of_primes:
            list_of_primes.remove(j)
        else:
            break

list_of_primes = list_of_primes - square_numbers
sophie_german_primes = prime_number(50 * 10**6)
sg_prime = {i for i in sophie_german_primes if 2*i + 1 in list_of_primes}

def test_condition (num):
    ''' Testing the condition of d+n/d by taking the input as array of divisor and num'''
    for i in range(1, int(num**0.5) + 1):
        if num % i == 0:
            if (i + num /i) not in list_of_primes:
                return False
    return True

start = time.perf_counter()

Sum = 0
for num in sg_prime:
    if num + 2 in list_of_primes:
        Sum += 2*num

new_list_primes = list_of_primes - {2*i+1 for i in sg_prime}

for num in new_list_primes:
    if (num - 1) / 2 + 2 in list_of_primes:
        if test_condition (num-1) == True:
            Sum += num - 1

print(Sum+1)
end = time.perf_counter()
print("Time : ", end-start, "sec")
Vanadium 50 said:
your feelings should carry less weight)
:/

It seems that, this type of algorithm makes my code faster by an amount of 2.5x
 
  • #49
A nonsquare number is not a number that is not a square. It's a number that has no repeated prime factors. So 8 is not a square, but it's also not a nonsquare.

My code has a bug in it; it finds about 1% too many solutions (not too few!). Most likely, I am doing one test twice and another test not at all. Even so, it takes 20 ms to test all the numbers with 2 prime factors, 100 ms to test all numbers with 3 and 10 ms to test all with 4. That's most of them, so I confident the testing time will end up under a quarter of a second.

If I replace my dumb sieve with something like primegen, I should be able to sieve in under half a second. (It sieves a billion primes in a second or two on a modern CPU).

So while I admit I don't yet have a solution in under one second, it's pretty clear that there is one, and looping over factors gets one there.
 
  • #50
Vanadium 50 said:
and looping over factors gets one there.
It seems so yes.
Vanadium 50 said:
Even so, it takes 20 ms to test all the numbers with 2 prime factors, 100 ms to test all numbers with 3 and 10 ms to test all with 4. That's most of them, so I confident the testing time will end up under a quarter of a second.
Thats really great actually. So I think we can to an end. It seems that you have the fastest algorithm
Vanadium 50 said:
My code has a bug in it; it finds about 1% too many solutions (not too few!).
And I am sure you can fix it.
 
  • #51
Vanadium 50 said:
My numbers are already factored.

When you say this, do you mean you are iterating over combinations of prime factors (from the pre-computed list of primes), or that you are using the pre-computed list of primes (actually a hash table in this case) to allow fast factorization?

I ask because I have tried both methods in Python and the latter method is running much, much faster, which seems counterintuitive to me, but combinatorial code in Python in general seems to run slow when the number of items in the underlying set gets this large. So it's possible that it's a Python quirk I'm dealing with and not that the combinatorial method is inherently slower.
 
  • #52
PeterDonis said:
do you mean you are iterating over combinations of prime factors

This. I do this in several loops. One for the case with exactly 2 factors, one with exactly 3, etc.

I'd post the code, but I am converting from iterators to just plain loops now, so absolutely nothing works.
 
  • #53
Vanadium 50 said:
I do this in several loops. One for the case with exactly 2 factors, one with exactly 3, etc.

Ok, that's basically the same method I'm using for the combinatorial case. But are you doing some kind of pre-filtering to avoid having to generate all the combinations?
 
  • #54
Here's what I am doing for the n = 4 case, as an example:

C++:
for(int i = 0; (i < nprimes-2) && (primes[i] < cbrt(UPPERBOUND/2.0)); i++) {
  a = primes[i];
  for(int j = i+1; primes[j] < sqrt(0.5*UPPERBOUND/a); j++) {
  b = primes[j];
    for(int k = j+1; 2*a*b*primes[k] < UPPERBOUND; k++) {
      c = primes[k];
      if(is_prime[2*a*b*c+1] && is_prime[a*b*c+2] &&
         is_prime[a + 2*b*c] && is_prime[b + 2*a*c] && is_prime[c + 2*a*b] &&
         is_prime[2*a + b*c] && is_prime[2*b + a*c] && is_prime[2*c + a*b] ) {
          answer += 2*a*b*c;
       }
    }
  }
}

primes is just an array of prime numbers, starting with 3, and is_prime is an array of bools.

Here's the idea. We know 2 is a factor, so set up a quad of primes in increasing order {2,a,b,c}. Then test the quad all at once. There are 2n factors to test, but half of those tests are redundant, so we have here 8 tests to do.

I don't have any pretesting, but I do calculate the largest bounds. For example, what is the largest a can be? It's when a, b and c are consecutive and nearly equal. How big can b be? It's when b and c are nearly equal (and I account for 2 and a already being factors).

Overall, my strategy has been not to think of this as testing properties of numbers but to think of this as testing properties of sets of numbers.

Moving from iterators to plain old loops fixed the over-counting problem. Usually iterators simplify things, but here they were just making things more complicated. I get n = 2 has 30301 solutions, n = 3 has 9009, and n = 4 has 312. That covers 97% of the answer and takes about 200 ms.
 
  • Like
Likes Klystron
  • #55
Vanadium 50 said:
My code has a bug in it; it finds about 1% too many solutions (not too few!). Most likely, I am doing one test twice and another test not at all.

I don't know C++ . Maybe you are testing for the same cases such as (2,3,7,5) and (2,7,5,3) etc. Or they might be a problem in the range of the prime numbers
 
  • #56
Arman777 said:
Maybe you are testing for the same cases such as (2,3,7,5) and (2,7,5,3) etc. Or they might be a problem in the range of the prime numbers

Highly unlikely. The structure of the loops (look at the code) makes the former unlikely, and the latter would give too few solutions, not too many. While rewriting from scratch fixed the problem so a) this is moot and b) we will never know, the most likely possibility is a typo in the testing line.

In any event, I converted to plain for loops, and here is the output:

CPU time used to find primes: 810 ms 30301 two-factor candidates found 9009 three-factor candidates found 312 four-factor candidates found 3 five-factor candidates found CPU time used: 990 ms And the answer is <deleted>

So I am under - just - my one second goal.
 
Last edited by a moderator:
  • #57
There are no 6-7-8 factors ? because you said at max we should have 8 factors ?
 
  • #58
There could be solutions with higher n - but there aren't.

What's happening is that the number of candidate solutions is falling (e.g. for n = 8 there are only four possible values of a) and at the same time the conditions they need to pass is growing (exponentially).
 
  • Like
Likes Arman777
  • #59
P.S. I just found out that Carl Pomerance (yes, that Carl Pomerance) proved that you need to go to at least 2 x 109 to need n > 5. Since the problem only goes to 108, we don't need to even check 6, 7, and 8.
 
Last edited:
  • #60
Vanadium 50 said:
I don't have any pretesting, but I do calculate the largest bounds.

Yes, I see that and it's basically the same thing I'm doing with the combinatorial version in Python. But coding it with straight Python for loops, basically the equivalent of your C++ code, is still slower by about an order of magnitude than the factoring version (where I'm using the pre-computed dictionary of primes to speed up prime factorization, and also filtering out as many numbers as possible before factoring to minimize its impact).

This is very disappointing and confusing to me, as while Python is expected to be slow, it's not supposed to be that slow, nor is it supposed to invert one's usual intuition about things like, as you put it, factorization is slow but multiplication is fast.

(I'm also confused and disappointed by the poor performance of Python generators for this problem. First, the generator version of the Sieve of Eratosthenes is about an order of magnitude slower at getting all of the primes up to 10^8 than the brute force version using a pre-allocated array. And second, the generator version of the combinatorial code is at least an order of magnitude slower than the for loop version--i.e., two orders of magnitude slower than the factorization version. I'm still trying to figure out what's going on with all this.)
 

Similar threads

  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 19 ·
Replies
19
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 28 ·
Replies
28
Views
4K
  • · Replies 8 ·
Replies
8
Views
2K
Replies
4
Views
2K
  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 97 ·
4
Replies
97
Views
9K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 14 ·
Replies
14
Views
4K