Optimizing Prime Pair Combinations for Project Euler Problem 60

Arman777 · Aug 29, 2018

https://projecteuler.net/problem=60

I wrote a piece of code to solve the problem but it still takes too much time to solve it Any ideas how can I improve my code, like adjustments that I can make ?

Code:

import itertools
import time

start = time.perf_counter()
def prime(N):
    if N == 0 or N == 1 or N == 5:
        return False
    for i in range(2, int(N ** 0.5) + 1):
        if N % i == 0:
            return False
    return Truedef dig_sum(j, n):  # j is the number, n is the element in the list as tuple
    d = 0
    if j == 0:   
        for k in n:
            d += sum(int(digit) for digit in str(k))
        if d%3 == 0:
            return False
        return True
    else:
        s = sum(int(digit) for digit in str(j))
        for k in n:
            w = s + sum(int(digit) for digit in str(k))
            if w % 3 == 0:
                return False
        return Truedef con_test(j, n):  # j is the number, n is the element in the list as tuple
    if j == 0:
        w = int(str(n[0]) + str(n[1]))
        w1 = int(str(n[1]) + str(n[0]))
        if prime(w) == False or prime(w1) == False:
            return False
        return True
    else:
        for i in n:
            w = int(str(j) + str(i))
            w1 = int(str(i) + str(j))
            if prime(w) == False or prime(w1) == False:
                return False
        return Truedef st(n):            #to get rid of the repeated elements
    H = [tuple(sorted(i)) for i in n]
    G = set(H)
    return GPrimes = [i for i in range(3,8000) if prime(i) == True]   #Prime range 

Com_two = list(itertools.combinations(Primes,2))

G2 = [i for i in Com_two if dig_sum(0,i) == True and con_test(0,i) == True]

G2r = st(G2)

G3 = [(i,*j) for i in Primes for j in G2r if dig_sum(i,j) == True and con_test(i,j) == True ]

G3r = st(G3)

G4 = [(i,*j) for i in Primes for j in G3r if dig_sum(i,j) == True and con_test(i,j) == True ]

G4r = st(G4)

G5 = [(i,*j) for i in Primes for j in G4r if dig_sum(i,j) == True and con_test(i,j) == True ]

F = [(sum(j),j) for j in G5]

end = time.perf_counter()
Time = end-start
print(Time)
print(F)

When I set Primes range to 8000 it took 18 min to give an answer I checked the solution for the 4 set of primes and it gives me the right result.(If you want to check print (G4r))

Any ideas how can I make my program faster ?

jedishrfu · Aug 29, 2018

You should try timing each function individually to see how much time they take and then from there optimize the specific functions.

Its often said that program execution spends 80% executing 20% of the code. Try to find that 20% code and optimize it.

Arman777 · Aug 29, 2018

jedishrfu said:

You should try timing each function individually to see how much time they take and then from there optimize the specific functions.

Its often said that program execution spends 80% executing 20% of the code. Try to find that 20% code and optimize it.

I find the answer at by setting prime limit 10.000 . Yey for that it took 41 min. I ll try to I guess it mostly does it on the G2 level and then decreases cause the options are decreasing but G2 level is the starting point. I also think that its probably due to the algorithm . I mean I am not sure you understand what I am trying to do but I think there must be a mathematical trick.

jedishrfu · Aug 29, 2018

Yeah, I’m not sure what the code is doing but when faced with something like this that has performance issues, I try to identify what function is being the time hog and go from there.

jim mcnamara · Aug 29, 2018

Consider profiling, ex: cProfile. This shows you how much time is spent by each function/method. Trying to guess is not the best choice unless you are very experienced.

How to profile, options - https://devopedia.org/profiling-python-code

verty · Aug 30, 2018

Use a sieve to sieve primes up to say 10 million. I think your time hog is that prime check procedure.

Arman777 · Aug 30, 2018

jedishrfu said:

Yeah, I’m not sure what the code is doing

Its my bad that I didnt explain it but I guess I don't need to from now on ( unless you want to).

jim mcnamara said:

Consider profiling, ex: cProfile. This shows you how much time is spent by each function/method. Trying to guess is not the best choice unless you are very experienced.

How to profile, options - https://devopedia.org/profiling-python-code

I finally manage to run it . I am saying finally cause it was hard for me to find how to run it. I add the image.

Functions have names and you see the time that took to complete them.

Line 63 took 168 sec and that's the G3 = [] line other then that I run this code for prime range 5000.

verty said:

Use a sieve to sieve primes up to say 10 million. I think your time hog is that prime check procedure.

İt seems that I need a better prime tester and a better con_test function.

227.948833517
[]

Code:

         96268183 function calls in 227.950 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.018    0.018  227.950  227.950 pps.py:1(<module>)
  5775553    9.231    0.000   34.385    0.000 pps.py:16(dig_sum)
  2093904    0.437    0.000    0.437    0.000 pps.py:20(<genexpr>)
 26176944    6.086    0.000    6.086    0.000 pps.py:25(<genexpr>)
 40268129    9.080    0.000    9.080    0.000 pps.py:27(<genexpr>)
  2891920    5.823    0.000  191.264    0.000 pps.py:33(con_test)
        3    0.001    0.000    0.007    0.002 pps.py:49(st)
        3    0.004    0.001    0.006    0.002 pps.py:50(<listcomp>)
        1    0.001    0.001    0.006    0.006 pps.py:55(<listcomp>)
        1    0.068    0.068    9.512    9.512 pps.py:59(<listcomp>)
        1    1.642    1.642  168.858  168.858 pps.py:63(<listcomp>)
        1    0.549    0.549   48.872   48.872 pps.py:67(<listcomp>)
  3886258  185.447    0.000  185.447    0.000 pps.py:7(prime)
        1    0.009    0.009    0.675    0.675 pps.py:71(<listcomp>)
        1    0.000    0.000    0.000    0.000 pps.py:73(<listcomp>)
        1    0.000    0.000  227.950  227.950 {built-in method builtins.exec}
        2    0.001    0.000    0.001    0.000 {built-in method builtins.print}
    12885    0.002    0.000    0.002    0.000 {built-in method builtins.sorted}
 15162571    9.552    0.000   25.154    0.000 {built-in method builtins.sum}
        2    0.000    0.000    0.000    0.000 {built-in method time.perf_counter}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Arman777 · Aug 30, 2018

I changed my prime code to Miller–Rabin primality test ( I didnt write the code myself I copied ) and I find the result in 8 min 30 sec. I also need to change the con_test() function for improved result ..

Vanadium 50 · Aug 30, 2018

verty said:

I think your time hog is that prime check procedure.

This runs counter to the advice earlier in the thread that the OP should learn to profile. He won't learn that if we tell him what the answer is. This is especially true if we guess.

jedishrfu · Aug 30, 2018

That's great!

One rule of thumb that you should consider is adding a comment about where you got the copied code. In a professional environment, this may not be allowed and you might need to rewrite it (or get someone in your company untouched by seeing the code) in order to make it original code to protect your company from copyright laws.

verty · Aug 30, 2018

Vanadium 50 said:

This runs counter to the advice earlier in the thread that the OP should learn to profile. He won't learn that if we tell him what the answer is. This is especially true if we guess.

True. He should learn to profile his code and I see he did just that. So I'll leave it up to him now.

rcgldr · Aug 30, 2018

One issue is that Python is interpretative, so that will consume a lot of time compared to a compiled language like C or C++. I'm wondering how much faster the same algorithm would be in C or C++.

Arman777 · Aug 30, 2018

rcgldr said:

One issue is that Python is interpretative, so that will consume a lot of time compared to a compiled language like C or C++. I'm wondering how much faster the same algorithm would be in C or C++.

thats really good question Sadly I don't know C...And I don't think anyone here will try to convert it. I couldn't find a faster way to do the con_test(), so that's also another thing that can make the program run faster.

jedishrfu · Aug 30, 2018

You could try converting it to Julia. Its hot right now in Data Science and is arguably much faster than Python.

Python compiles its code into byte codes and follow on executions use that saving some conversion time. I'd run your algorithm a couple of times using the 'time' command if you were on Linux or MacOS.

Arman777 · Aug 30, 2018

This is the last version with

jedishrfu said:

You could try converting it to Julia. Its hot right now in Data Science and is arguably much faster than Python.

Python compiles its code into byte codes and follow on executions use that saving some conversion time. I'd run your algorithm a couple of times using the 'time' command if you were on Linux or MacOS.

I don't know it too.But can it make a really huge difference ? Like minutes

jedishrfu · Aug 30, 2018

https://julialang.org/benchmarks/

See for yourself Julia vs the world.

Arman777 · Aug 30, 2018

jedishrfu said:

https://julialang.org/benchmarks/

See for yourself Julia vs the world.

Thats interesting ...

jedishrfu · Aug 30, 2018

You have to take it with a grain of salt. Performance tests are notoriously difficult to get right. Developers will do anything to get bragging rights for speed especially in situations where money is involved. In this case, there’s a Julia consulting company being formed by the original creators so of course they’ll want Julia to win.

As an example, the developers might optimize the underlying code to do especially well for specific test cases in order to beat a competitor. The database engine wars were like that. It meant that one database product would outperform a competitor one year and the competitor would fire back on the next release. To the ordinary user both products would perform well enough to not really matter but products are first selected based on these numbers and so there’s a sales opportunity and followon upgrades that turns into big money especially for enterprise databases.

fluidistic · Aug 31, 2018

You guys are missing an important thing. Project Euler is notoriously known for problems that aren't approachable via brute force (though you certainly can solve a few of them that way). What I mean is that, it is both a mathematical and programming challenge. Usually most problems can be solved under 2 minutes using Python. If this isn't the case, then the approach is to be reconsidered. So before going blindly into profiling and things of the sort, the very first thing to think about, is whether the mathematical approach is reasonably well suited, keeping in mind that brute force approaches are to be discarded.

Another thing. Since the OP actually did get the correct result, he has access to the solution to this problem posted by others. And since Python is very popular, he can directly compare his code and therefore his method, to others.

Arman777 · Aug 31, 2018

fluidistic said:

You guys are missing an important thing. Project Euler is notoriously known for problems that aren't approachable via brute force (though you certainly can solve a few of them that way). What I mean is that, it is both a mathematical and programming challenge. Usually most problems can be solved under 2 minutes using Python. If this isn't the case, then the approach is to be reconsidered. So before going blindly into profiling and things of the sort, the very first thing to think about, is whether the mathematical approach is reasonably well suited, keeping in mind that brute force approaches are to be discarded.

Another thing. Since the OP actually did get the correct result, he has access to the solution to this problem posted by others. And since Python is very popular, he can directly compare his code and therefore his method, to others.

I think that my code would work under 1 min in C/C++ . I didnt use brute force , but I also couldn't find a better alogrithm under then this. My code works like this.

First I made a set of 2 primes that satisfies two conditions.The first condition was the digit sum cannot be 3. The other condition was the concentrating test. After finding these all 2 prime sets, I tried to add the third prime to the set and again I checked these 2 conditions are satisfied. After that, I tried to add the fourth prime and so on.

The best python solution that I noticed is by ximera2 :and he uses prime dictionary and then using it (and also other sort of thing he manages to reduces time 26.5sec)

İn C and Java I saw 2 sec solution codes. I saw another python solution using graph theory and finding it in seconds.

I don't think my algorithm is bad but I am not a good coder so this is best I can do.

Arman777 · Aug 31, 2018

jedishrfu said:

You have to take it with a grain of salt. Performance tests are notoriously difficult to get right. Developers will do anything to get bragging rights for speed especially in situations where money is involved. In this case, there’s a Julia consulting company being formed by the original creators so of course they’ll want Julia to win.

As an example, the developers might optimize the underlying code to do especially well for specific test cases in order to beat a competitor. The database engine wars were like that. It meant that one database product would outperform a competitor one year and the competitor would fire back on the next release. To the ordinary user both products would perform well enough to not really matter but products are first selected based on these numbers and so there’s a sales opportunity and followon upgrades that turns into big money especially for enterprise databases.

I see your point ..the code looks like python actually but the speed is near C that's amazing actually. I searched a bit and actually its normal to be too slow compared to C or Java etc.

fluidistic · Aug 31, 2018

Arman777 said:

I think that my code would work under 1 min in C/C++ . I didnt use brute force , but I also couldn't find a better alogrithm under then this. My code works like this.

First I made a set of 2 primes that satisfies two conditions.The first condition was the digit sum cannot be 3. The other condition was the concentrating test. After finding these all 2 prime sets, I tried to add the third prime to the set and again I checked these 2 conditions are satisfied. After that, I tried to add the fourth prime and so on.

The best python solution that I noticed is by ximera2 :and he uses prime dictionary and then using it (and also other sort of thing he manages to reduces time 26.5sec)

İn C and Java I saw 2 sec solution codes. I saw another python solution using graph theory and finding it in seconds.

I don't think my algorithm is bad but I am not a good coder so this is best I can do.

I do not find your code very Pythonic in that it doesn't really deal with lists (you do use list() and list comprehension, but that's about it). Then the redundant if ... == True. Which can be replaced by if ...
Then, way too many loops and if conditions, this looks like C or something like that, which Python is not terribly fast at.
Plus, I find the code hard to understand, I do not have the time to understand it even though it would be nice.
Now, by reading the EP question, the first thing that comes to mind is... should I start by taking 5 primes from 1,1,1,1,1 and check whether they satisfy the condition? Or can I use a math trick and take 4 of them as 3, 7, 109, 673, and the 5th one from 1 (or M >1) ?
I do not know what your code is doing there, but if it starts from 5 low prime numbers and gradually increase them as to find the answer, this is clearly not what is intended. IMO, you should dissect and fully understand the quickest ways people have found to tackle the problem. Then it should be obvious to you, what you were doing wrong. And I'm sure it's related to the math of the problem, not that much about the programming, which undoubtedly could be improved.

Edit before posting: After rereading your post, I do not think starting by a list of 2 primes that satisfies the condition is the way to go, especially when the problem statement provided 4 of them. But that's just the tip of the iceberg here. There are hidden mathematical tricks that you did not use, that you could have used, for sure.

Vanadium 50 · Aug 31, 2018

fluidistic said:

Or can I use a math trick and take 4 of them as 3, 7, 109, 673, and the 5th one from 1 (or M >1)

It is true that the smallest 5-plet includes a 4-plet (five of them, in fact). Is it true that the smallest 5-plet includes the smallest 4-plet? I don't see how that immediately follows.

Arman777 · Aug 31, 2018

fluidistic said:

Edit before posting: After rereading your post, I do not think starting by a list of 2 primes that satisfies the condition is the way to go, especially when the problem statement provided 4 of them. But that's just the tip of the iceberg here. There are hidden mathematical tricks that you did not use, that you could have used, for sure.

I didnt understand it. Provided 4 numbers are just example and have nothing to do with the solution.

fluidistic said:

I do not know what your code is doing there, but if it starts from 5 low prime numbers and gradually increase them as to find the answer, this is clearly not what is intended.

I don't think that would work. Its just simply brute froce with 5 loops which, it would take much more time to test ( if they are prime or not and the concatenate part).

Think about it, 5 numbers and you have to test each two pair that's like 10 combination then you have 5 loops and each range should be 3000. which that's like 3000**5 ?? Theres no possible way that it will give the result under 1 min.

fluidistic said:

Then it should be obvious to you, what you were doing wrong. And I'm sure it's related to the math of the problem, not that much about the programming, which undoubtedly could be improved.

Yes I know this is not the perfect algorithm but that's okay...I am not a math professor or a computer scientist. I am doing this for hobby and this algorithm was the only one that I could have think of.

StoneTemplePython · Aug 31, 2018

OP is still a very new programmer-- he's just quite the Project Euler enthusiast. I am one of the biggest boosters of Julia on this forum, but it's an immature language (version ##\lt 1.0##) and sometimes limited support for questions. This is quite difficult if you are new to programming. I wouldn't recommend it to OP at this time.

Arman777 said:

I saw another python solution using graph theory and finding it in seconds.

This is what I'd suggest OP focuses on. Learning more powerful algorithmic techniques as part of solving these problems... will go a long way. And yes techniques related to graph theory are in this realm. It'll be a struggle to learn these algorithms, but a very productive one.

The nice thing about Project Euler is you can put in a lot of work to get a solution and as a result you get some insights for the problem. Then you see how other people solved it, and as a result get considerably more insights.

edit:
Julia 1.0 was just released:
http://news.mit.edu/2018/mit-developed-julia-programming-language-debuts-juliacon-0827

(still too an immature a language for me to recommend to OP at this time, but I hadn't realized this and need to upgrade on my machine.)

fluidistic · Aug 31, 2018

Vanadium 50 said:

It is true that the smallest 5-plet includes a 4-plet (five of them, in fact). Is it true that the smallest 5-plet includes the smallest 4-plet? I don't see how that immediately follows.

Right, I do not know either if that follows. Since I know the brute force approach is not the way to go, there must be some "mathematical tricks" or some mathematical analysis to perform right before starting to code. I just stated how I would approach the problem myself.
If the code takes more than 2 minutes to get the solution, usually it means there are clever tricks that could have been used. Many people consider the problem unsolved if it takes more than around that time.

Vanadium 50 · Aug 31, 2018

I think some realism is in order. The OP is neither a strong programmer nor a strong mathematician, and doesn't seem to be interested in becoming either one. It seems to be more along the lines of "can I whip something together to solve this for fun", which is fine. We can point out tools and tricks, but we shouldn't make this too serious, lest we lose him.

That said, if I am wrong and he is serious, he should immediately put down the keyboard and pick up a copy of Polya's Art of Problem Solving.

Brute force doesn't seem like an intrinsically bad approach. If we're going to use it, we need to pare down the number of cases we are trying. The way I would think about it is as follows:

1. Find all the primes between 2 and 10,000 or so. Make this number changeable in case it's too small. Remove 5 from the list because we know it won't ever work.

2. Loop over all pairs (there will be about 750,000) testing to see if they form a valid doublet. Keep track of how many doublets each number forms.

Rationale: I hypothesize that numbers that belong to multiple n-plets are uncommon. Any number that participates in a 5-plet is a member of no fewer than four doublets, so I only have to look at those numbers down the road.

Optimization 1: A valid solution will have only numbers congruent to 1 mod 3 or 2 mod 3 (possibly including 3 itself) and never mix them. So we should make two lists, one starting 3, 7, 13, 19... and the other starting 3, 11, 17, 23... . This drops the number of tests in half, and will speed us up enormously when we start the brute force.

Optimization 2: Checking to see if a number is definitely prime is slow. Checking to see if a number is probably prime, e.g. by checking small divisors or any of a number of more sophisticated tests will be faster, and I don't really care if a percent or two of composites gets mixed in.

3. Now do the brute force method, using both lists, but again only within a list using numbers that participate in at least 4 doublets. Once you have candidate 5-plets, retest them using a true primality test.

Arman777 · Sep 1, 2018

Vanadium 50 said:

Optimization 1: A valid solution will have only numbers congruent to 1 mod 3 or 2 mod 3 (possibly including 3 itself) and never mix them. So we should make two lists, one starting 3, 7, 13, 19... and the other starting 3, 11, 17, 23... . This drops the number of tests in half, and will speed us up enormously when we start the brute force.

I did this and others suggestions are actually similar to what I am doing.
The first code that I shared on my first post is taking 41 min to solve the question .Now, I copied a faster prime cheker code which its taken from https://rosettacode.org/wiki/Miller–Rabin_primality_test#Python and then using the Vanadium suggestion I reduced to time down to 130±10 sec

I am not sure it can go down further then this but I am happy about it.

Here the last version

Python:

import itertools
import time
import random

start = time.perf_counter()

_mrpt_num_trials = 1

def prime(n):
    assert n >= 2
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    s = 0
    d = n-1
    while True:
        quotient, remainder = divmod(d, 2)
        if remainder == 1:
            break
        s += 1
        d = quotient
    assert(2**s * d == n-1)
    def try_composite(a):
        if pow(a, d, n) == 1:
            return False
        for i in range(s):
            if pow(a, 2**i * d, n) == n-1:
                return False
        return True # n is definitely composite
    for i in range(_mrpt_num_trials):
        a = random.randrange(2, n)
        if try_composite(a):
            return False
    return Truedef con_test(j, n):  # j is the number, n is the element in the list as tuple
    if j == 0:
        w = int(str(n[0]) + str(n[1]))
        w1 = int(str(n[1]) + str(n[0]))
        if prime(w) == False or prime(w1) == False:
            return False
        return True
    else:
        for i in n:
            w = int(str(j) + str(i))
            w1 = int(str(i) + str(j))
            if prime(w) == False or prime(w1) == False:
                return False
        return True
def st(n):            #to get rid of the repeated elements
    H = [tuple(sorted(i)) for i in n]
    G = set(H)
    return GPrimes = [i for i in range(3,9000) if prime(i) == True]   #Prime range

P1 = [ p for p in Primes if p % 3 != 2]

Com_two = list(itertools.combinations(P1,2))

G2 = [i for i in Com_two if con_test(0,i) == True]

G2r = st(G2)

G3 = [(i,*j) for i in P1 for j in G2r if con_test(i,j) == True ]

G3r = st(G3)

G4 = [(i,*j) for i in P1 for j in G3r if con_test(i,j) == True ]

G4r = st(G4)

G5 = [(i,*j) for i in P1 for j in G4r if con_test(i,j) == True ]

F = [(sum(j),j) for j in G5]

end = time.perf_counter()
Time = end-start
print(Time)
print(F)

verty · Sep 1, 2018

Are you starting from the start each time you find a new prime?

Arman777 · Sep 1, 2018

verty said:

Are you starting from the start each time you find a new prime?

what do you mean ? I am finding two pair then adding third prime by applying the rules.

verty · Sep 1, 2018

Arman777 said:

what do you mean ? I am finding two pair then adding third prime by applying the rules.

Okay, I have to be honest and say I don't think it'll save you much time, but I think you are reading the primes from P1 multiple times. It could be that you want to do that, I don't know. By the way, does your code prove that you have found the group with the lowest sum?

Arman777 · Sep 1, 2018

verty said:

but I think you are reading the primes from P1 multiple times.

Before it was the "Prime" array and now its the "P1" array . When I use the P1 array I no longer need to use sum function which reduces the time 8.5 min to 2 min.

verty said:

By the way, does your code prove that you have found the group with the lowest sum?

It does not prove no, it just shows the half of it but this half gives me the answer. So I wanted to share this half. And also other half gives us nothing ( no 5 prime pair that satisfy the condition)

Vanadium 50 · Sep 2, 2018

Arman777 said:

I changed my prime code to Miller–Rabin primality test ( I didnt write the code myself I copied )

You also didn't check the code yourself.

If you simply print(len(Primes)) after finding them, you will discover it varies (!) from 1240 to 1250 or so for 10,000 primes. 1228 is the correct answer.

Arman777 · Sep 2, 2018

Vanadium 50 said:

You also didn't check the code yourself.

If you simply print(len(Primes)) after finding them, you will discover it varies (!) from 1240 to 1250 or so for 10,000 primes. 1228 is the correct answer.

I didnt understand what you mean.

Edit: Yes I did, there's a reason for that. It take random numbers and tests that accordingly to determine if the number is prime or not.

Python:

_mrpt_num_trials = 5

This part is shows us how many random number do we want. If its higher the solution gets more accurate In my code it was 1 so its normal that my code will give different results each time. I select it 1 to make it much faster and still it gives the solution.

Vanadium 50 · Sep 2, 2018

Arman777 said:

It take random numbers and tests that accordingly to determine if the number is prime or not.

No, it doesn't. If it did, it would give 1228 primes.

Arman777 said:

I select it 1 to make it much faster and still it gives the solution.

Yes, but it's not testing to see if the pairs actually are prime or not. So it's not doing either what you say it is doing or what the question asks. Yes, it gives the correct answer, but so does print("26033"). Do you consider that a solution?

Optimizing Prime Pair Combinations for Project Euler Problem 60

Attachments

Similar threads

Hot Threads

Recent Insights