The worst case for trial division factorisation of n?

In summary, Fermat's method quickly finds factors that are very close in size, which is precisely the worst case for trial division.
  • #1
Phillips101
What's the worst case for the factorisation of n using trial division? Worst case in terms of arithmetic operations that is.

Many places tell me that it's n=pq with p and q prime and close to each other (and hence close to root(n) ), but I can't prove it.

Help would be appreciated.
 
  • #2
Assuming you test for divisibility by 2, 3, 5, 7, 11, ... or 2, 3, 4, 5, 6, ... or something similar (wheel mod 30?), the runtime is determined by the size of the second-largest prime factor. Thus the longest runtime is when the number has two prime factors and they're very close -- perhaps even equal. Of course a real test might check for powers first.
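To illustrate this point, here is a rough Python sketch (mine, not from the thread) that counts the trial divisions performed; the count tracks the second-largest prime factor, not the size of the number:

```python
def trial_division(n):
    """Factor n by trial division, counting how many divisions are tried."""
    factors, tried, d = [], 0, 2
    while d * d <= n:
        tried += 1
        if n % d == 0:
            factors.append(d)
            n //= d          # retry the same d on the cofactor
        else:
            d += 1
    if n > 1:
        factors.append(n)    # remaining cofactor is prime
    return factors, tried

# Two numbers of similar size (~10^8): a small factor ends the search early,
# while two close prime factors force the search all the way to ~sqrt(n).
print(trial_division(101 * 999983))   # small factor: roughly 1,000 divisions
print(trial_division(9967 * 9973))    # close factors: roughly 10,000 divisions
```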
 
  • #3
Obviously if the two numbers are close then starting with small factors will consume a lot of time, since the answer is near the square root of the product, as CRGreathouse explains.

Fermat developed a method to deal with close prime factors of N = pq. He looks for a difference of squares:
x^2 - y^2 = N.

He picks x just above the square root of N and computes x^2 - N. If that difference is a perfect square y^2, then N = (x - y)(x + y) and he has factored N; otherwise he increments x and tries again.

In the case of 97 x 103 = 9991, the first choice is x = 100: 100^2 - 9991 = 9 = 3^2. Having a difference of two squares, we have found the factors: (100 + 3)(100 - 3) = 9991.
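The method as described can be sketched in a few lines of Python (function name and structure are mine):

```python
from math import isqrt

def fermat_factor(n):
    """Fermat's method for odd composite n: find x with x^2 - n a perfect square."""
    x = isqrt(n)
    if x * x < n:
        x += 1               # start just above sqrt(n)
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:      # x^2 - n is a perfect square
            return x - y, x + y
        x += 1

print(fermat_factor(9991))   # first try: x = 100, 100^2 - 9991 = 9 → (97, 103)
```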
 
  • #4
CRGreathouse said:
Assuming you test for divisibility by 2, 3, 5, 7, 11, ... or 2, 3, 4, 5, 6, ... or something similar (wheel mod 30?), the runtime is determined by the size of the second-largest prime factor. Thus the longest runtime is when the number has two prime factors and they're very close -- perhaps even equal. Of course a real test might check for powers first.

What do you mean by 'the runtime is determined by'? Do you mean that the factor it would take the longest to find would be the second largest? And if so, why exactly does this imply having only two factors would be quickest? I don't intuitively see why counting up to 17 some 5000 times will be quicker than counting up to a prime approximately equal to 17^5000. (In terms of purely arithmetic operations -- obviously there are practical considerations, but I'm ignoring them.)

robert Ihnot said:
Obviously if the two numbers are close then starting with small factors will consume a lot of time, since the answer is near the square root of the product, as CRGreathouse explains.

Fermat developed a method to deal with close prime factors of N = pq. He looks for a difference of squares:
x^2 - y^2 = N.

He picks x just above the square root of N and computes x^2 - N. If that difference is a perfect square y^2, then N = (x - y)(x + y) and he has factored N; otherwise he increments x and tries again.

In the case of 97 x 103 = 9991, the first choice is x = 100: 100^2 - 9991 = 9 = 3^2. Having a difference of two squares, we have found the factors: (100 + 3)(100 - 3) = 9991.

I've come across this in a couple of places now. I think I'll do something on this for comparison purposes.
 
  • #5
Phillips101 said:
What do you mean by 'the runtime is determined by'? Do you mean that the factor it would take the longest to find would be the second largest?

The second-largest factor would be the last to find, but what I meant was exactly what I wrote: the runtime is a function of the second-largest prime factor S. If you're sieving with all numbers (or odds, or numbers relatively prime to k for some k), the number of candidates tested is proportional to S; if with primes, it's proportional to S/log S.* In both cases the time needed for each division is roughly a function of S -- S^2 for base-case (schoolbook) division, better for more advanced methods.

Phillips101 said:
And if so, why exactly does this imply having only two factors would be quickest?

I would hope you're able to figure that out! Consider the case of N = pqr and N' = p'q' for p < q < r and p' < q' prime, where N and N' are of similar sizes.

* Literally, this requires oracle access to a primality test. But primality tests are generally cheap. If you're working with small numbers, you may have extra cost beyond finding the second-largest factor, but usually not too much.
 
  • #6
It's worth mentioning that there are numbers that are bad for trial division *and* Fermat. If p is about 2q, p and q prime, then you need something like q (or q/log q) steps to factor pq with trial division, and about q steps with Fermat. This is because the distance from p to q is about as far as the distance from q to 1, even though p is "only" bigger by one bit.
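A quick empirical check of this claim (a Python sketch of mine, with hypothetical helper names): for p roughly 2q, the number of Fermat iterations grows linearly in q, because x must climb from about sqrt(pq) ≈ 1.41q up to (p + q)/2 ≈ 1.5q.

```python
from math import isqrt

def fermat_steps(n):
    """Count increments of x before Fermat's method factors n."""
    x = isqrt(n)
    if x * x < n:
        x += 1
    steps = 0
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:
            return steps, (x - y, x + y)
        x += 1
        steps += 1

q, p = 1009, 2027               # primes with p roughly 2q
steps, factors = fermat_steps(p * q)
print(steps, factors)            # steps is proportional to q, constant ≈ 3/2 - sqrt(2)
```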
 
  • #7
CRGreathouse said:
The second-largest factor would be the last to find, but what I meant was exactly what I wrote: the runtime is a function of the second-largest prime factor S. If you're sieving with all numbers (or odds, or numbers relatively prime to k for some k), the number of candidates tested is proportional to S; if with primes, it's proportional to S/log S.* In both cases the time needed for each division is roughly a function of S -- S^2 for base-case (schoolbook) division, better for more advanced methods.

I'm probably coming across as quite dim here, but I still don't see how this answers my question.

You have a number N. You try dividing it by every number less than sqrt(N) until you get a hit. If no hits, then it's prime. If you get a hit, redefine N as N/hit, and go again. This way, you have roughly p(1)+p(2)+p(3)+...+p(k-1) +sqrt(p(k)) trial divisions, where the p(i) are all the prime factors of N (and in ascending order), and the sqrt(p(k)) is added onto the end because you still have to make sure that p(k) is actually prime.

I don't see how the 'number of numbers tested', i.e. p(1)+p(2)+p(3)+...+p(k-1)+sqrt(p(k)), or your runtime function, is proportional to p(k-1). If you can explain that, then I agree it implies that, to get the most iterations, the runtime function needs to be as large as possible, so n=pq with p and q close!

EDIT: Unless of course you don't reset your count every time, and say - 7 is a factor. Is 7 a factor of n/7? And then continue counting upwards afterwards. In this case, you would get a prime factor, and redefine your stopping point at sqrt(n/prime factor). So, imagine you have all but one prime factors, say p(k), but you need to verify that it's prime. To do this, you would count upwards from p(k-1) to sqrt(p(k)), testing each number. Of course, p(k) is prime so you will indeed count all the way up to sqrt(p(k)). If however you have already passed it, you will stop. This happens if p(k-1)>sqrt(p(k)) => p(k)<p(k-1)^2.

So, in this case, to give us as many things to try as possible, we should pick p(k) such that it is greater than p(k-1)^2! But in this case we have p(k) << n => sqrt(p(k)) << sqrt(n), whereas if p(k-1) is near sqrt(n) (say well within a factor of 2, so p(k-1) is much closer to sqrt(n) than sqrt(p(k)) was in the previous case), then the number of things tried will be near sqrt(n), and hence greater than if p(k) was large. Hence p(k-1) ~= p(k) ~= sqrt(n). And as a consequence of this, we need there to be no other prime factors.

Is my very warped and probably not very legible logic correct?

(of course, if you had n as a prime number, then it would take sqrt(n) divisions anyway - but I suppose I'm ignoring this case).
 
  • #8
Phillips101 said:
I'm probably coming across as quite dim here, but I still don't see how this answers my question.

You have a number N. You try dividing it by every number less than sqrt(N) until you get a hit. If no hits, then it's prime. If you get a hit, redefine N as N/hit, and go again. This way, you have roughly p(1)+p(2)+p(3)+...+p(k-1) +sqrt(p(k)) trial divisions, where the p(i) are all the prime factors of N (and in ascending order), and the sqrt(p(k)) is added onto the end because you still have to make sure that p(k) is actually prime.

Suppose you start with a number around 10^10. You start looking for factors, and you find one around 10^2, leaving you with a cofactor around 10^8. Now you search up to about 10^4 and then stop, having hit the square root of the cofactor. If there hadn't been a factor around 10^2, you would have searched up to 10^5 -- ten times higher.
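This shrinking search bound is easy to demonstrate with concrete numbers (a Python sketch of mine, mirroring the 10^10 example above):

```python
def trial_division_verbose(n):
    """Trial division that also reports the largest divisor actually tested."""
    factors, d, largest_tested = [], 2, 2
    while d * d <= n:
        largest_tested = d
        if n % d == 0:
            factors.append(d)
            n //= d          # the bound sqrt(n) shrinks with the cofactor
        else:
            d += 1
    if n > 1:
        factors.append(n)
    return factors, largest_tested

# 101 * (a prime near 10^8) is about 10^10, but after finding the factor
# near 10^2 the search stops near sqrt(cofactor) ≈ 10^4, not sqrt(n) ≈ 10^5.
print(trial_division_verbose(101 * 99999989))
```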
 
  • #9
CRGreathouse said:
Suppose you start with a number around 10^10. You start looking for factors, and you find one around 10^2, leaving you with a cofactor around 10^8. Now you search up to about 10^4 and then stop, having hit the square root of the cofactor. If there hadn't been a factor around 10^2, you would have searched up to 10^5 -- ten times higher.

I believe this is one reason why you would search for smaller factors first. But doesn't the Fermat search (looking for the difference of two squares) find the factors closest to the square root first?
 
  • #10
ramsey2879 said:
But doesn't the Fermat search (looking for the difference of two squares) find the factors closest to the square root first?

Yes. But that's a bad way to search -- you're much more likely to find a small factor than a large one. If your number is around 10^30 and you test a billion candidates near its square root, you have roughly a one-in-a-million chance of finding a factor. If instead you trial-divide by the first billion numbers, the chance is about 97.3%.

Of course there are improvements to Fermat's method that vastly improve it.
 

Related to The worst case for trial division factorisation of n?

1. What is trial division factorisation and how does it work?

Trial division factorisation is a method used to find the prime factors of a given number n. It involves dividing n by every prime number less than or equal to the square root of n; whenever the remainder is 0, that prime is a factor of n, and the process continues on the quotient until all prime factors are found.

2. What is the worst case scenario for trial division factorisation?

The worst case scenario for trial division factorisation is when n is the product of two primes that are both close to the square root of n (or when n is itself a large prime). In either case roughly sqrt(n) trial divisions are needed before the factorisation is complete, which makes the method very inefficient for large n.

3. How does the efficiency of trial division factorisation compare to other factorisation methods?

Trial division factorisation is one of the simplest and most basic methods of factorisation, but it is also one of the least efficient. Other methods, such as Pollard's rho algorithm and the quadratic sieve, are much faster and more efficient for larger numbers.

4. Can trial division factorisation be improved in any way?

Yes, there are some ways to improve the efficiency of trial division factorisation. For example, you only need to test divisors up to the square root of n, and you can skip candidates sharing a factor with a small wheel modulus (e.g. skip multiples of 2, 3, and 5). Additionally, using a list of pre-computed primes as trial divisors can also speed up the process.
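One such improvement, the mod-30 wheel mentioned earlier in the thread, can be sketched as follows (a Python sketch of mine; after stripping factors of 2, 3, and 5, only the eight residues coprime to 30 are tried):

```python
def trial_division_wheel(n):
    """Trial division skipping multiples of 2, 3, 5 (a mod-30 wheel)."""
    factors = []
    for p in (2, 3, 5):
        while n % p == 0:
            factors.append(p)
            n //= p
    # Gaps between successive numbers coprime to 30: 7, 11, 13, 17, 19, 23, 29, 31, ...
    gaps = (4, 2, 4, 2, 4, 6, 2, 6)
    d, i = 7, 0
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += gaps[i]
        i = (i + 1) % 8
    if n > 1:
        factors.append(n)
    return factors

print(trial_division_wheel(2 * 3 * 7 * 9991))   # → [2, 3, 7, 97, 103]
```

The wheel tests only 8 of every 30 candidates, cutting the number of divisions by nearly a factor of four compared with testing every integer.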

5. Are there any real-world applications of trial division factorisation?

Trial division factorisation is mostly used in mathematics and computer science for studying and understanding the properties of prime numbers. Its inefficiency on large numbers also illustrates why cryptographic schemes such as RSA, which rely on the difficulty of factorising large numbers, are considered secure.
