A Factorization Algorithm

  • Thread starter Playdo
  • Start date
  • #1
Playdo
88
0
Consider the following true statement. Given natural numbers n, m, s

(1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


iff

(2) [tex]n+1=(m+1)(s+1)[/tex]

Now suppose that [tex]N[/tex] is a natural number where

(3) [tex]N = \sum_{i=0}^n a^i [/tex]

Now restrict [tex]a[/tex] to the natural numbers. Clearly for certain special numbers N we have a one to one map of factor pairs of [tex]N[/tex] and factor pairs of [tex]n+1[/tex] which is much smaller.

In order for this thereom to give non-trivial factorizations [tex]n+1[/tex] must not be prime. If this is true and (3) holds then [tex]N[/tex] can be factored easily.

If one thinks carefully about the algorithm for polynomial multiplication you can see that numbers defined by (3) above are quite special with respect to the base [tex]a[/tex].

In order to grasp the full implications of this theorem we need to think about two common algorithms.

1) The first is the base [tex]a[/tex] expansion algorithm used to write a natural number as a sum of powers of a. For example the number [tex]14[/tex] can be written as a sum of powers of [tex]3[/tex] as [tex]14 = 3^2+3+2[/tex]. In general this algorithm can be used to write

(4) [tex] N_a = c_k a^k + c_{k-1} a^{k-1} + ... + c_1 a + c_0[/tex]

Where some of the [tex]c_j[/tex] may be zero and all are less than [tex]a[/tex]. I will denote this algorithm by the capital letter [tex]\beta[/tex] so that [tex]\beta (N,a)[/tex] is equivalent to (4) above.

2) The second algorithm is polynomial multiplication. Now this algorithm proceeds by steps. We first multiply every term by every other term, then we collect terms. Denote this by [tex]P[/tex]. The difference between the result of this algorithm and the base [tex]a[/tex] expansion algorithm is best understood by looking closely at a third related algorithm.

3) The multiplication algorithm for natural numbers is really equivalent to polynomial multiplication taken one step further by carrying so that none of the digits are greater than the base and then evaluation of the resulting polynomial at the base value [tex]a[/tex]. Denote this carry step by [tex]C[/tex].

It follows that if [tex]N = ST[/tex] then [tex]\beta (N,a) = C(P(\beta (S,a),\beta (T,a)))[/tex] and under certain special conditions [tex]\beta(N,a)=P(\beta(S,a),\beta(T,a))[/tex]. One of these special conditions is the thereom stated above. This happens because of two things. First all coefficients of the summation terms on the right are equal to one in the base and the powers that do appear do not cause any duplication of powers, thus there is no carrying.

In the posts that follow I will try to highlight the implications of this. One of the first is that though it would appear that we are looking at a very restricted class of factorizations, in reality a shift operator can often be applied to [tex]\beta(N,a)[/tex] to arrive at a new base where coefficients are all one and the theorem applies. This is typically possible when [tex]\beta(N,a)[/tex] is not the result of carrying, or more formally there exists a pair of factors of [tex]N[/tex] such that the base [tex]a[/tex] expansion and subsequent multiplication produces a polynomial in [tex]a[/tex] with precisely the same coefficients as the base [tex]a[/tex] expansion of [tex]N[/tex].

Clearly this all plays around in an uncomfortable way with our notions of variables and constants. However if you look at it carefully you will see that there is merit here. Furthermore there exist shift operators that can transform certain base [tex]a[/tex] expansions into the form given in (1) above. I will talk about those next.
 
Last edited:

Answers and Replies

  • #2
Playdo
88
0
Shift Operators

Given natural numbers n, m, s

(1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


iff

(2) [tex]n+1=(m+1)(s+1)[/tex]

Now suppose we look let [tex]n=3[/tex] then [tex]n+1=4[/tex] the only non-trivial solution of (2) above is [tex]m=s=1[/tex]. So (1) above becomes

(3) [tex]\sum_{i=0}^3 a^i = \sum_{i=0}^1 a^{2i} \sum_{i=0}^1 a^i[/tex]

The expansion gives.

(4) [tex]a^3+a^2+a+1 = (a^2+1)(a+1)[/tex]

Thinking of the left hand side as a polynomial in [tex]a[/tex], for every natural number value of [tex]a[/tex] there is a corresponding natural number [tex]N[/tex] which is easily factored using (4) above. The following table lists the first seven numbers and there factors.

[tex]\begin{array}{ c|c|c|c |} a&N&S&T\\
\hline0&1&1&1\\
\hline1&4&2&2\\
\hline2&15&5&3\\
\hline3&40&10&4\\
\hline4&85&17&5\\
\hline5&156&26&6\\
\hline6&259&37&7\\
\hline7&400&50&8\\
\end{array}[/tex]

The easiest related polynomial can be found using the values in the fourth column. Notice that the second differences of these values are all equal to [tex]2[/tex]. So if we wish a relationship where [tex]k=a-1[/tex] we must solve the system

[tex]f(k)=\int\int C_0 dk[/tex]
[tex]f(0)=2[/tex]
[tex]f(6)=50[/tex]

[Note: the choice of initial conditions here is somewhat arbitrary, we only require that the resulting coefficients be greater than or equal to zero.]

Solving this system is simple enough and yields

[tex]f(k)=k^2+C_1k+C_0[/tex]
[tex]f(0)=2[/tex]
[tex]f(6)=50[/tex]

[tex]C_0 = 2[/tex]
[tex]C_1=2[/tex]

Working this back into (4) above gives

(5) [tex]k^3+4k^2+6k+4=(k^2+2k+2)(k+2)[/tex]

Thinking about the base [tex]k[/tex] expansion algorithm suppose we have

(6) [tex]\beta(N,k)=k^3+4k^2+6k+4[/tex]

Then so long as [tex]k[/tex] is one of those special natural numbers for which [tex]\beta(N,k)=P(\beta(S,k),\beta(T,k))[/tex] for some factor pair of [tex]N[/tex] we can find a translation of the polynomial (5) which transforms it into the form (4). There is a vector space approach to this but we will not look at it now. In general we want to look at translations of (5) where [tex]k=a-s[/tex] for some integer [tex]s[/tex]. It should be clear that we can find desirable values for [tex]s[/tex] by solving [tex](a-s)^3+4(a-s)^2+6(a-s)+4=1[/tex] when [tex]a=0[/tex] that is

(7) [tex]-s^3+4s^2-6s+3=0[/tex]

I used Excel to see that [tex]s=1[/tex] is the solution but Newton's Method is a good first start in general.

So what does this suggest? Firstly there are deep and magnificent connections between the multiplicative structure of the natural numbers and that of the polynomials taken over the natural numbers. Important questions remain to be answered. I cannot claim to be able to answer them and perhaps some higher level of abstraction has already done so. One question that is important is this. Given a natural number [tex]N[/tex] does there always exist a base [tex]a[/tex] such that at least one factor pair of [tex]N[/tex] both have base [tex]a[/tex] expansions which multiply without the need to carry? If this is true then the ideas laid out here can factor any number. A second important question here is how fast are the algorithms determined by this construct. Clearly for (1) above we have the luxury of factoring a much smaller number than the one we started with. But in cases where we do not start with all of the coefficients equal to one, how long will it take us to find solutions to the translation equation (7)? Finally, if there exist composite natural numbers for which all factor pairs result in carrying for all possible bases is there an extension of the ideas here that describes efficient algorithms for undoing the carrying process? I will try to answer some of these questions as time goes on, right now I am hoping to find my first real job and looking for it is taking up most of my time.

Cheers

LAC
 
Last edited:
  • #3
Playdo
88
0
A Reducibility Theorem

For every composite natural number [tex]N[/tex] there must be a reducible polynomial [tex]p(x)[/tex] over the natural numbers and a natural number [tex]a[/tex] for which [tex]N=p(a)[/tex] and at least one factor pair [tex]s[/tex] and [tex]t[/tex] of [tex]p[/tex] evaluated at [tex]a[/tex] satisfy [tex]N=p(a)=s(a)t(a)[/tex].

Proof

Let [tex]\beta[/tex] be the base [tex]a[/tex] expansion algorithm, [tex]P[/tex] be polynomial multiplication, and [tex]C[/tex] be the carrying algorithm that is applied in standard multiplication of natural numbers over a base.

Since [tex]N[/tex] is composite it has at least one non-trivial factor pair. It can be shown that [tex]\beta (N,a) = C(P(\beta(S,a),\beta(T,a)))[/tex] in all cases. Therefore [tex]C^{-1}(\beta(N,a))[/tex] is a subset of the polynomials over the natural numbers evaluated at [tex]a[/tex] and contains only reducible polynomials. Clearly [tex]P(\beta(S,a),\beta(T,a))[/tex] must be an element of this set.
 
Last edited:
  • #4
Playdo
88
0
This last thereom is somewhat deceptive because the carrying algorithm [tex]C[/tex] actually has many steps and possible paths so that there is actually a graph that must be searched. Although we generally proceed by carrying from smallest power to largest, when we start with the base [tex]a[/tex] expansion of [tex]N[/tex] we do not know which of the many possible reductions corresponds to the original polynomial multiplication coefficients in general. The point of (1) in the first post is that it guarantees that no carrying happens at all so [tex]C[/tex] is and identify operator under those circumstances. The complexity of [tex]C[/tex] and [tex]C_{-1}[/tex] are both important generally. We can assume that a base [tex]a[/tex] expansion of any factor pair starts us off with cofficients that are less than [tex]a[/tex]. Consider the elementary example

[tex](b_1a+b_0)(c_1a+c_0)=b_1c_1a^2+(c_1b_0+b_1c_0)a+b_0c_0[/tex]

If we have [tex]\beta(N,a)=C_2a^2+C_1a+C_0[/tex] equating coefficients gives the system

[tex]C_2=b_1c_1[/tex]
[tex]C_1=c_1b_0+b_1c_0[/tex]
[tex]C_0=c_0b_0[/tex]

This system is indeterminate having three equations and four unknowns. However if [tex]C_0=1[/tex] then by the rules of the game the system becomes

[tex]C_2=b_1c_1[/tex]
[tex]C_1=c_1+b_1[/tex]

And is quite easily solvable. The shift operator sa mention above does precisely this. Also if [tex]C_0[/tex] is confirmed to be prime the system is easily solvable.

But suppose that

[tex]\beta(N,a)=C_3a^3+C_1a+C_0[/tex]

Now the carrying algorithm must be inverted and the system that must be solved looks like

[tex]C_3=r_2[/tex]
[tex]C_2=b_1c_1+r_1-ar_2[/tex]
[tex]C_1=c_1b_0+b_1c_0+r_0-ar_1[/tex]
[tex]C_0=c_0b_0-ar_0[/tex]

Really three equations in five unknowns leaving a very uncomfortable two dimensional search ahead of us. Some advance can be made by noting that under the rules of the game [tex]c_0b_0 \leq (a-1)^2[/tex] so that [tex]0 \leq r_0 \leq a-1[/tex]. Furthermore chosing [tex]a[/tex] relatively prime to [tex]N[/tex] offers some advantages if the reduced residue system of [tex]a[/tex] is easily obtainable. Coming from the other side factoring [tex]C_2+ar_2[/tex] can also break the forutne cookie wide open so to speak. These arguments are know by every high school pre-cal student. The more interesting set of questions centers around particular natural numbers.
 
Last edited:
  • #5
Playdo
88
0
The Power of One

The preceding arguments show that the number one is our ally when it appears in certain places in an expansion. We like to see the number one appear in the leading and/or the trailing coefficient. It is also good see primes appear in the leading and trailing coefficients. So when does this happen? The trailing coefficient is actually calculable as [tex]N modulo a[/tex]. We need to solve the equation [tex] N \equiv 1 mod a[/tex].

Clearly [tex] N \equiv 1 mod N-1[/tex] but no meaningful factors of [tex]N[/tex] can be expanded in this base. We need to find bases less than the square root of [tex]N[/tex]. But look suppose [tex]a_1[/tex] is such a base then [tex]N=a_1x+1[/tex] and is is quite clear that [tex]a_1|N-1[/tex]. So perhaps we should factor [tex]N-1[/tex]? Assuming [tex]N[/tex] was odd we now have to find [tex]a_2|(N-1)/2 -1[/tex]. In this fashion we can reduce the number to be factored until it is clearly trivial.

Example: Factor 201

I first try to factor 200 by removing powers of two to obtain [tex]200=2^825[/tex]. now supposing that twenty-five is still too large I try to factor 24 by removing powers of two and obtain [tex]24=2^33[/tex]. So now I can write twenty-five in terms of any of the cators of twenty-four as [tex]{2*12+1, 4*6+1, 6*4+1, 8*3+1, 12*2+1}[/tex]. Because I want all factor pairs of 25 to be grater than whatever base I choose I consider those significantly less than the square root of twenty-five. So I set up the equation

(1) [tex]25 = 2*12+1= 2z+1 = (2x+1)(2y+1)[/tex]

This can be simplified to obtain

(2) [tex] 12 =2xy+x+y[/tex]

(3) [tex] y = \frac {12-x} {2x+1}[/tex]

Starting with [tex]x=1[/tex] and working upward I find that [tex]x=2[/tex] gives a whole number solution and twenty five is factored as five times five.

So the factorization of 200 is [tex]200=2^35^2[/tex]. Applying this to the factorization of 201 I note that the square root of 201 is smaller than 15 but larger than 14. Set up the equation

(4) [tex] 201 = 2*100+1 = 2z+1= (2x+1)(2y+1)[/tex]

(5) [tex]y = \frac {100-x} {2x+1}[/tex]

Luckily for us w find that [tex]x=1[/tex] if a whole number solution. And 201 = 3*67. If we had gotten greedy and skipped the base two expansion we would have missed the factor three, or would we?

Suppose w had chosen the base five, giving

(6) [tex] 201 = 5*40+1 = 5z+1= (5x+s)(5y+t)[/tex]

Now there are several pairs (s,t) such that st is congruent to one mod five (four pairs to be precise) they are {(4,4),(2,3),(3,2) (1,1)}. Since x and y remain free until later we can consider (2,3) and (3,2) to be the same choice. So picking at random we have only one third chance of getting it right in which case we also need to start with x=0 or we miss the boat all together.

These difficulties suggest that we want to use bases that take care of the possible small prime factors right off or else simply divide by the first so many small primes. This is usually recommended even by fellows like Pollard.

How is this related to the formulation that started the thread?

Consider the following true statement. Given natural numbers n, m, s

(1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


iff

(2) [tex]n+1=(m+1)(s+1)[/tex]

If you can, extend this summation to cover cases where

[tex]\sum a^kb^i = \sum a^i \sum b^k[/tex]

Remember the solution must resolve to (1) above when a=b. Good luck!
 
Last edited:
  • #6
Playdo
88
0
Of course it's clear that

[tex]\sum_{i=0}^n (ab)^i = \sum_{i=0}^s (ab)^{(m+1)i} \sum_{i=0}^m (ab)^i[/tex]

And it follows that

[tex]\sum_{i=0}^n a^ib^i = \sum_{i=0}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=0}^m a^ib^i[/tex]

Implying that both a and b divide

[tex]\sum_{i=1}^n a^ib^i = \sum_{i=1}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=1}^m a^ib^i[/tex]

So that ab must be factor of

[tex]\sum_{i=0}^{n-1} a^ib^i[/tex]

If it is still true that n-1+1 is composite then we can write

[tex]\sum_{i=1}^{n-1} a^ib^i = \sum_{i=1}^{s_1} a^{(m_1+1)i}b^{(m_1+1)i} \sum_{i=1}^{m_1} a^ib^i[/tex]

iff

[tex]n =(m_1+1)(s_1+1)[/tex]

this reduction can continue to happen step by step as long as the necessary conditions exist. These things are very easy to see but I am looking for something more complicated I think. I could be wrong. That might seem a waste of time but it says we can build factorizations of larger numbers from factorizations of smaller numbers using (1) above so long as the upper limit of summation n and n+1 are both composite. This does not mean much for very small n but when n becomes larger the number of primes in any randomly chosen interval is much reduced.
 
Last edited:
  • #7
Playdo
88
0
"Now there are several pairs (s,t) such that st is congruent to one mod five (four pairs to be precise) they are {(4,4),(2,3),(3,2) (1,1)}. Since x and y remain free until later we can consider (2,3) and (3,2) to be the same choice. So picking at random we have only one third chance of getting it right in which case we also need to start with x=0 or we miss the boat all together.

These difficulties suggest that we want to use bases that take care of the possible small prime factors right off or else simply divide by the first so many small primes. This is usually recommended even by fellows like Pollard."

The sequence of bases that is formed as follows has special properties with regard to size of reduced residue systems locally in the natural numbers.

(1a) [tex]\{\Pi_{i=1}^k p_i | k = 1 ... L\}[/tex]

Here L is determined by the natural number N we wish to factor. This sequence has very special properties. Firstly the bases [tex]a[/tex] are products of the first k primes, so by first confirming that N and a are relatively prime we have cleared out those small factors. If it turns out other wise (the Euclidean Algorithm is fast by the way) we simply remove the common factor and start again.
Because the reduced residue system is closed under multiplication modulo a we can factor the trailing coefficient of any base a coefficient in a finite number of steps simply by looking at the tables for that base. Consider base 30, where thirty is the product of 2,3 and 5. Suppose N is congruent to 1 mod 30 then any two factor pairs of N must be congruent to one of the following ordered pairs modulo thirty.

(2a) {(1,1),(7,13),(11,11),(17,23),(19,19),(29,29)}

There would be eight except symmetry reduces this slightly. The size of the reduced residue system for elements of (1a) above is simply to calculate as Euler's totient

(3a) [tex]\phi(p_kp_{k-1}...3*2)=\Pi_{i=1}^k (p_i-1)[/tex]

Furthermore one can look at the set (2a) as the kernel of the map

(4a) [tex]mod 30 :rrs_{30} \otimes rrs_{30} \Rightarrow rrs_{30}[/tex]

Interestingly we only need to store the kernel of this map in it's full form ordered by first entries as

(5a) {(1,1),(7,13),(11,11),(13,7),(17,23),(19,19),(23,17),(29,29)}

Because from this we can obtain the relevant factorization of any reduced residue modulo thirty by simply multiplying the first element of each ordered pair by the residue. Suppose rather than being congruent to one modulo thirty our natural number was congruent to seven mod thirty. Then we obtain the possible factorizations as mentioned

(6a) {(7,1),(19,13),(17,11),(1,7),(29,23),(13,19),(11,17),(23,29)}

Reordering and reducing by symmetry we get

(7a) {(1,7),(13,19),(11,17),(23,29)}

These coorespond to factorization equations for example as

(8a) [tex]N=30z+7=(30x+13)(30y+19)[/tex]

The search area here is calculable so the number of steps in the whole algorithm can be accurately estimated.

Now the sequence in (3a) is also special because of the relationship between reduced residue systems of consecutive elements. Consider 6=2*3 and 30=2*3*5. It is clear that every element of the rrs mod thirty is relatively prime to six. So we can generate the rrs mod thirty from the rrs mod 6 as follows. Simply produce the table

[tex]\begin{array}{ 1|c|c|c|c |} x/r&1&2&3&4\\
\hline1&7&13&19&25\\
\hline5&11&17&23&29\\
\end{array}[/tex]

The elements are calculated as 6x+r. Clearly 25 is not in rrs mod thirty but we can quickly remove this in this case because it is not relatively prime to 5 which will be a factor of the new base (30). One must remain so we have quickly obtained the set rrs mod 30 ={1,7,11,13,17,19,23,29}. In general the elements of the array that must be removed can actually be generated from the rrs of the smaller base by simply multiplying the rrs by the new prime factor under standard multiplication. In the above example we want

(9a) [tex]5\{1, 5\} = \{5,25\}[/tex]

So in general using sequence (1a)

(10a) [tex]rrs_{k+1}=\{x \Pi_{i=1}^k p_i+rrs_k |x = 1...p_{k+1}-1\} - \{p_{k+1}rrs_k\}[/tex]

These arguments define a class of sieves that can approach the efficiency one would get if we knew all primes less than the square root of a natural number. The beauty of course is we do not need to know those primes. So we trade some of the work necessary to find those primes for a task we can do in a knowable number of steps.

Some theorems that can be proved as excercises.

1) Show that [tex]\phi(p_kp_{k-1}...3*2)<\phi(n) \forall n > p_kp_{k-1}...3*2[/tex]

2) Show that if k is a square free natural number [tex]\phi(k)\leq\phi(n) \forall n > k[/tex]

Notice that there are infinitely many 'ladders' of square free natural numbers that can be used to build sieves just as the sequence (1a) was used and that these sieves are 'locally' minimal as a choice of base for factoring natural numbers by these methods.

The final step in this whole process is finding a way to put the approach looked at in the earlier posts of this thread and this last approach into one comprehensive theory.
 
Last edited:
  • #8
AKG
Science Advisor
Homework Helper
2,566
4
Playdo said:
Clearly for certain special numbers N we have a one to one map of factor pairs of N and factor pairs of n+1 which is much smaller.
Why is that? If N is of the form 1 + ... + an, then how do we know all factor pairs of N contain one factor of the form 1 + ... + am? In other words, why is it clear that if a number has a base-a expansion 111...111 that any factor pair of that number has at least one of the factors also being in the form 111...111 (but with fewer 1's normally, of course)?

Also, for your interest, [itex]\beta[/itex] is not a capital letter. Capital Beta looks just like the captial roman B. When using LaTeX, if you write a greek letter in lower case, like so \pi, then you'll get the lower case [itex]\pi[/itex] and if you write it like \Pi then you'll get [itex]\Pi[/itex]. Also, when using LaTeX in the same line as your text, use "itex" tags instead of "tex". So you'd write [ itex ]\phi ^2 - \phi - 1 = 0[ /itex ] to have the markup right in line [itex]\phi ^2 - \phi - 1 = 0[/itex]. To have it bigger and on a separate line:

[ tex ]\phi ^2 - \phi - 1 = 0[ /tex ]

[tex]\phi ^2 - \phi - 1 = 0[/tex]
Playdo said:
For every composite natural number [tex]N[/tex] there must be a reducible polynomial [tex]p(x)[/tex] over the natural numbers and a natural number [tex]a[/tex] for which [tex]N=p(a)[/tex] and at least one factor pair [tex]s[/tex] and [tex]t[/tex] of [tex]p[/tex] evaluated at [tex]a[/tex] satisfy [tex]N=p(a)=s(a)t(a)[/tex].
The last part, that at least one factor pair s and t satisfies N = p(a) = s(a)t(a) seems redundant from the fact that p is reducible and N is composite. I don't quite understand the point of your proof, the result seems trivial.

Let N be composite, say N = nm for n, m > 1. If you define 0 to be a natural, let a be any natural less than or equal to min{n,m}. Otherwise, let a be any natural strictly less than min{n,m}. Let p(x) = (x + (n-a))(x + (m-a)).

I'm in a rush right now, but I'll look at posts 2, 4, and 5 later.
 
  • #9
Playdo
88
0
Thanks AKG for the Latex help. The thereom itself in the first post is correct. Whenever the conditions are met some subset of factor pairs of [itex]N[/itex] is mapped in a one to one and onto the full set of factor pairs of [itex]n+1[/itex]. I realize there is formatting that needs to be done to these posts. I will get to it as time goes on. But I do see your response being similar to others I have seen about the first theorem. It is not at all the way number theory teaches us to think about factorization. But in fact for special [itex]N[/itex] it accomplishes the desirable task of lettings us try to factor something much smaller. The reducibility theorem does not say that every composite [itex]N[/itex] can be put into a form satisfying the first theorem. Rather, is simply says we can know that there must exist a base for which the expansion when thought of as a polynomial in a is reducible. You'll see what I mean if you look at all the posts together, but I warn you I have never seen anyone take the time so far to think through these things carefully. The whole project is much more intricate than it first appears. Later I will start doing posts where I use these ideas to shed light on results in elementary number theory. It gets very interesting. You see until now numbe theory is mostly treated as collection of almost unrelated theories. Very few books try to identify unifying principles.

Oh here are some important and little understood questions. The distribution of primes is known to satisfy the Prime Number Theorem, this is a limit theorem. For any finite [itex]n[/itex], [itex]\pi(n)[/tex] can only be approximated at best by the real valued functions currently used. In reality the number of primes seems to jump around [itex]\frac {x} {log x}[/itex]. Distances between primes also seem to be randomly distributed but generally beomce larger as we go further out. The facts are that the distribution of primes is formed in complement with the multiplication tables of natural numbers, the are a discrete phenomenon that cannot be precisely modeled using anything save perhaps a Fourier series of some type. I know everyone is looking at Reimann and it is nice, but trying to stay within the descrete nature of the problem as much as possible is the avenue I am pursuing.

Are you a person who can visualize things? Some math folks think in euqations. But visualize this.

Every prime is the seed of a wave form that crosses the positive axis (n) only when p divides n in the strict natural number sense. Now if n=30 we are looking at the place on the positive axis where the wave forms of 2 3 and 5 are all zero at the same time. There is a purpose to what I am saying here. The prime numbers can only exist in wholes in the pattern. that is once a prime is discovered think of it as this wave form. Every place it touches the axis no prime can exist there. As we go further out more and more primes are discovered each with a unique period. These periodic waves 'punch wholes' in the number line every place n is composite. The places where no wave form touched the axis are prime. Amazingly we can reach forward a little bit so that we can actually find larger primes by knowing primes.

Let me see if I can write this function down (Lagrange did something similar to this I think).

(A) [tex]\Psi(n)= K \Pi_{i=1}^{k| p_k<\sqrt n} \sin^2\frac {n\pi}{p_i}[/tex]

Now clearly only those terms formed by prime divisors of n will be zero. K is a scaling factor. The function is positive definite. If you pick the set of primes ahead of time then you see that for instance if we use 2 3 and 5

(B) [tex]\Psi_{30}(n)=K\sin^2\frac {n\pi}{2} \sin^2\frac {n\pi}{3} \sin^2\frac {n\pi}{5}[/tex]

[Note: Plot this for natural numbers n using Excel and notice the pattern that arises.]

If you plot (B) for n between 7 and 49, every non-zero result is prime (Why is this true? it is true, but why?). if we add a term for eleven and plot for n between 11 and 121 same thing every non-zero value corresponds to prime n. We can do this but it saves us no time checking each one. What we want to do is start at seven, then use it to find every prime up to 49, now add terms for each of these. Now the interval of interst is 47 to 47^2 etc. you see that if we are good at this it gets faster. Still we are talking about checking every number apparently. However in a future post I will show how to use the modular concepts mentioned above to speed this puppy up massively, to aobtain a 'small' set of candidates fomr the set {p_k, ..., p_k^2} and to determine just how many numbers we need to look at to factor a number N.

Here is how it works, for a number with only two large prime divisors, the single divisor that is less than the square root of N is relatively speaking quite close to root N. The class of functions defined in (A) above will require that I know all of the primes up to the fourth root of N or the root of root N. For your typical RSA numbers N may have 1000 digits. Root N will have something like 500 ish digits, and root root N will have about 250 digits. So to even try to attack an RSA we would need to know all the primes up to say 10^250, write out the rrs modulo the product of these primes in the neighbor hood of root N. (Of course this is impossible.) Then once we have applied the modular sieve, we test each of those numbers using the euclidean algorithm compared to reasonable sized products of those primes. We might even like to categorize intervals near root N using some of these probability methods which tell us likelihoods of a factor in those areas etc. In the posts that follow I will develop this algorithm. The beauty of it of course is that we can give upper bounds on time for the algorithm to complete. I am not at all claiming it is polynomial time but all that will have to be looked at. Using a distributed programming model over the internet I think time wise we might be quite surprised.

NOTE: This thread is work in progress, I am using it as a scratchpad to investigate how a suite of observations and ideas I have had over the last ten years might relate to one another to produce factorization algorithms, and the usefulness of those algorithms. The over arching goal is to be true to the traditional confines of elementary number theory as much as possible, this is an ethic I am holding myself to mostly because I do not believe that the methods have been fully appreciated or developed in our mad rush to conquer the universe using ever more powerful forms of abstraction and functional reasoning.
 
Last edited:
  • #10
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,967
19
My attention span isn't what it's used to be. It's hard for me to follow your posts, Playdo, without some sort of summary or outline that describes the plot. :frown:

P.S. if we write N=PQ in base P, then we don't have to do any carrying when we multiply P and Q.
 
  • #11
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,967
19
Somehow, I missed your latest post when I wrote my previous reply!


If you're interested in the general problem if finding all the primes up to a given size, you might consider looking for info on wheel sieves -- I think they're the current state of the art.


If, to factor N, your method requires you to know all of the primes up to N^(1/4), then I'm sorry to say that in its current form, it is not useful for factoring a very large number. Any of the sieving algorithms will have factored N in much less time than it takes to simply write down that list of primes... let alone compute it! (Of course, I have that sneaky feeling that I've just delivered a set-up line! :smile:)
 
Last edited:
  • #12
shmoe
Science Advisor
Homework Helper
1,994
1
A few random comments:

Playdo said:
Implying that both a and b divide

[tex]\sum_{i=1}^n a^ib^i = \sum_{i=1}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=1}^m a^ib^i[/tex]

This is not an equality.

Playdo said:
So that ab must be factor of

[tex]\sum_{i=0}^{n-1} a^ib^i[/tex]

only if ab divides 1.

Playdo said:
These arguments define a class of sieves that can approach the efficiency one would get if we knew all primes less than the square root of a natural number. The beauty of course is we do not need to know those primes. So we trade some of the work necessary to find those primes for a task we can do in a knowable number of steps.

Theose arguments 1a-10a don't appear to be anything other than trial division by every number less than sqrt(N) that is relatively prime with 2*3*5*...pk or however far you went. Convince me this is better than making a list of primes up to sqrt(N) and dividing by these.

Playdo said:
2) Show that if k is a square free natural number [tex]\phi(k)\leq\phi(n) \forall n > k[/tex]

This is false. 5 is square free but phi(5)=4>2=phi(6). k doesn't even have to be prime to be a counterexample, phi(21)=12>10=phi(22).

Hurkyl said:
My attention span isn't what it's used to be. It's hard for me to follow your posts, Playdo, without some sort of summary or outline that describes the plot.

Agreed.

Why not implement this in some way to test it out? You'll have a much easier time convvincing people it's worthwhile if you have a working program. Or just write down the steps you'd use to factor a number completely in one cleanly formatted post. You don't have to explain the "why" of the steps, just give the bare bones algorithm. An estimate for the runtime would go a long way as well (you've mentioned a few places where you can count the steps- please do it)
 
  • #13
Playdo
88
0
shmoe said:
This is not an equality[\quote]

The caveat is [itex]n+1=(m+1)(s+1)[/itex] in which case it definitely is an equality as [itex]a^ib^i = (ab)^i[/itex] so technically it simply starts to ask the implications of the base in (1) of the first post being composite. A general useful goal in factoring is finding a way to exchange factorization of a large number for the factorization of a smaller one. If you keep that in mind you can chain together any number of apparently unrelated algorithms and in whatever order, so long as at the link you can get back and forth (think in pictures). I admit its a bunch of scratch paper so far. It will improve over time.

Your issue with the totient, your are right I made a mistake. I will fix that shortly. In fact primes have maximal totients compared to all numbers less than them. The result applies when [itex]\tau(k)>1[/itex]. That will be corrected. In fact it can be easily seen to be true by examining the formula for the totient function (You have to rewrite it, most books have it in a form where the terms are [itex](1-\frac{1}{p_k})[/itex]. I will go back and clarify that when I get a chance.


Over time I will work out examples. My understanding is that the two most popular sieve forms are modular sieves and quadratic sieves, the last two RSA numbers to be had fell to these and three months worth of massive parallel computing.

The story line here is simple. It is an adventure, nothing more. But there is a specific problem that is being looked at using a few basic concepts. The issue is how are composites and primes related? Clearly no number can be prime unless the primes below its root fail to divide it. Why there would exist algorithms that are faster than this is of interest. Behind all of this is a question of complexity as Chaitin defines it. Eventually I will flesh this thread out, but you may have to wait for a while. I don't get paid to do this, it is a hobby that I started enjoying when I first became intersted in mathematics. Although I do know some of the latest in cryptography and have a good backgorund in number theory and math in general, I am not pursuing this in the interst of learning what 'they' say is possible. This is experimental mathematics done in the spirit of the early number theorists. We have a toolbox of mathematical tricks and a set of objects to be investigated. I will take pleasure in revealing a picture that is eventually understandable to people outside the ivory tower. The important thing for me is that this is experimental mathematics and it is recreational. I do not expect to win a fields medal here. I do it because I like it, and therefore I do it the way I want to do it. That said I know the difference between a meaningful result and triviality. Sometimes even things that appear trivial require something more then 'it's obvious that' which is so popular among mathematicians and have started to pervade our textbooks to the point that many proofs cannot be followed confidently even by those with great skill. IMHO you might as well not even include a proof if you are only going to give half of it.
 
Last edited:
  • #14
Playdo
88
0
Hurkyl said:
My attention span isn't what it's used to be. It's hard for me to follow your posts, Playdo, without some sort of summary or outline that describes the plot. :frown:

P.S. if we write N=PQ in base P, then we don't have to do any carrying when we multiply P and Q.

Yes but to write N=PQ in base P or Q you have to know P or Q in advance. This defeats the whole purpose of a factorization algorithm. But I agree as something worth noting, P base P is 10 so if Q<P then the base P expansion of N is Q0. If Q is greater than P then the base P expansion of N is 10*Q. Of course it says we can simply divide N by P to obtain Q.
 
  • #15
Playdo
88
0
Hurkyl said:
Somehow, I missed your latest post when I wrote my previous reply!


If you're interested in the general problem if finding all the primes up to a given size, you might consider looking for info on wheel sieves -- I think they're the current state of the art.


If, to factor N, your method requires you to know all of the primes up to N^(1/4), then I'm sorry to say that in its current form, it is not useful for factoring a very large number. Any of the sieving algorithms will have factored N in much less time than it takes to simply write down that list of primes... let alone compute it! (Of course, I have that sneaky feeling that I've just delivered a set-up line! :smile:)


Certain classes of sieves use mappings that for special numbers are faster than if we just knew the number of primes less than root N. But in general sieves can only hope to approach the speed of knowing all the primes less than root N apriori. The point is that we do not know those primes less than root N for large N because they are very large. If we did knwo them we simply do the euclidean algorithm on N and the product of the primes less than root N. It takes much memory but the time involved can be stimated and we are done. But since for very large N we cannot know all the primes less than root N, we must find ways of either cleverly using more abstract spaces than the natural numbers under multiplication or we have to find methods for developing proxy sets for the set of primes less than N. Modular sieves do precisely this last. Notice that for the sequence of products of primes b[k] = 2, 6, 30, 210 ,2310 etc. [itex]\phi(b[k])-k[/itex] is a decent approximation of the number of primes less than b[k] not already included as factors of b[k]. Now the theorem that starts this thread highlights certain very special natural numbers N for which a subset of divisors base a can be mapped onto the set of divisors of n+1 base 10. It would stand to reason that we might want to generalize this by thinking of the conditional [itex]n+1=(m+1)(s+1)[/itex] not simply in base ten but in any base whatever. So suppose we are looking at base a and base b, where in base b the condition holds. But how does summation work using indexes from other bases than ten?
 
Last edited:
  • #16
shmoe
Science Advisor
Homework Helper
1,994
1
Playdo said:
shmoe said:
This is not an equality

The caveat is [itex]n+1=(m+1)(s+1)[/itex] in which case it definitely is an equality as [itex]a^ib^i = (ab)^i[/itex] so technically it simply starts to ask the implications of the base in (1) of the first post being composite.

I was aware of the condition n+1=(m+1)(s+1). what I quoted:

[tex]\sum_{i=1}^n a^ib^i = \sum_{i=1}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=1}^m a^ib^i[/tex]

is still not an equality. If you meant the indicies to start at 0 and not 1, then it's an equality, but then your statement that a and b divide the above is not in general true.

Playdo said:
Your issue with the totient, your are right I made a mistake. I will fix that shortly. In fact primes have maximal totients compared to all numbers less than them. The result applies when [itex]\tau(n)>1[/itex]

What do you mean tau to be here? tau(k) is usually the number of positive divisors of k, but this is surely not what you mean.

Playdo said:
I will go back and clarify that when I get a chance.

Whatever you do, don't go back and start editing your posts long after they've been put up. I can't speak for anyone else, but I won't bother going back to look for what you've changed days after the fact.


Playdo said:
Over time I will work out examples. My understanding is that the two most popular sieve forms are modular sieves and quadratic sieves, the last two RSA numbers to be had fell to these and three months worth of massive parallel computing.

The general number field sieve is the current champ I believe. I don't know what you mean by a "modular sieve" for factoring integers.

Playdo said:
Sometimes even things that appear trivial require something more then 'it's obvious that' which is so popular among mathematicians and have started to pervade our textbooks to the point that many proofs cannot be followed confidently even by those with great skill. IMHO you might as well not even include a proof if you are only going to give half of it.

Every author has to walk the line between insulting the reader with simple details (hence needlessly bogging down the text) and leaving out so much that it's too hard to follow for their intended audience. There's no level of detail that will satisfy everyone.
 
  • #17
shmoe
Science Advisor
Homework Helper
1,994
1
Playdo said:
Notice that for the sequence of products of primes b[k] = 2, 6, 30, 210 ,2310 etc. [itex]\phi(b[k])-k[/itex] is a decent approximation of the number of primes less than b[k].

This is actually an extremely poor approximation. The usual notation for your b[k] is p#, where p would be the kth prime (p# is known as the primorial) . That is 2#=2, 3#=6, 5#=30, etc. p# is asymptotic to e^p (follows from the prime number theorem). pi(e^p) is asymptotic to e^p/p (prime number theorem again). phi(p#) is asymptotic to e^p * e^(-gamma)/log(p), where gamma is euler's constant (subtracting off the k has no effect on this asymptotic, e^p/log(p) is much much larger than p, let alone k). This last one comes from Mertens asymptotic for the product of (1-1/p) ranging over primes up to x.

The result is pi(p#)/phi(p#)->0 as p-> infinity, not a good approximation at all. In vague words, most of the numbers less than p# are not prime because they have a prime divisor greater than p, these aren't removed when you sieve out by primes less than p. This is just a consequence of the exponential rate of growth of p# that makes the composites removed by sieving out by primes less than p somewhat inconsequential.

You don't have to go far to see this approximation get bad.
pi(11#)=pi(2310)=343 but phi(11#)-5=475
pi(13#)=pi(30030)=3248 but phi(13#)-6=5754
 
  • #18
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,967
19
But in general sieves can only hope to approach the speed of knowing all the primes less than root N apriori.
That's not true at all.

If we did knwo them we simply do the euclidean algorithm on N and the product of the primes less than root N. It takes much memory but the time involved can be stimated and we are done.
Yes, we can. An extremely generous lower bound on the effort required is that it this approach would requires O(N / lg N) iterations of the Euclidean algorithm:

There are O(N / lg N) primes less than the square root of N. Each of them is at least as big as 2. The product of those primes is 2^O(N / lg N). The number of iterations required by the Euclidean algorithm is on the order of the logarithm of the arguments. Thus, we require O(N / lg N) iterations.


So, even if you were handed on a silver platter the product of the primes less than the root of N, and you had a special computer that was capable of performing the necessary arithmetic, you still can't beat the sieving algorithms when looking at very large numbers.

(For reference, see Wikipedia for the running time of the GNFS)


Incidentally, performing one giant GCD is not the most efficient way to trial divide by all of your primes -- even if you got their product for free. I think you get the best performance when you partition your primes so that the product within each partition is roughly the same size as N.


None of the high powered factoring algorithms are based on division at all:

The Pollard rho method iteratively applies a polynomial map. When that map, viewed modulo some prime factor of N, starts cycling, that factor can be extracted through a gcd computation. (The elliptic curve method is better, and I think it does essentially the same thing on an elliptic curve, but I don't have reference material handy to make sure)

The sieves are based on the fact that numbers that squaring is a 4-1 (or more) function modulo N. Given a pair of numbers that square to the same thing (and aren't negations of each other), you can generally extract a prime factor of N through a gcd computation.
 
Last edited:
  • #19
shmoe
Science Advisor
Homework Helper
1,994
1
Hurkyl said:
Yes, we can. An extremely generous lower bound on the effort required is that it this approach would requires O(N / lg N) iterations of the Euclidean algorithm:

There are O(N / lg N) primes less than the square root of N. Each of them is at least as big as 2. The product of those primes is 2^O(N / lg N). The number of iterations required by the Euclidean algorithm is on the order of the logarithm of the arguments. Thus, we require O(N / lg N) iterations.

That's not very generous! You missed out on a square root.

The product of primes less than sqrt(N) is asymptotic to e^sqrt(N), so you're looking at sqrt(N) operations. However this requires the ability to do computations with an obscenely large number. For a sanity check, how large can we take p before p# gets beyond the storage space of a typical desktop computer?

(just a note for comparisons with the gnfs, the bound given in wiki has n the number of digits of your number, the N above is the number itself)
 
  • #20
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,967
19
Oh hah, good catch. I mentally applied the logarithm too soon, and was thinking about N/2 instead of the square root of N. :frown:


Do you know the constant out front? I'll assume it's 1. I would say 4 gigabytes is a good estimate for the physical memory of a desktop. It's higher than most people have now, but not all that much... and it will probably be the norm soon anyways.

So, we have 2^35 bits to play with. So, solving 2^(2^35) = e^sqrt(N)...

2^(2^35) = e^sqrt(N) = 2^(sqrt(N) lg e)
2^35 = sqrt(N) lg e
N = 2^70 / (lg e)^2
N ~ 2^69 ~ 10^21

So, N could be 21 decimal digits, at most.


If we use hard drive space -- let's be generous and assume a terabyte of storage. With 2^43 bits to play with, that works out to

N ~ 2^85 ~ 10^26

So 26 decimal digits, at most.


An RSA number of that size can be factored fairly instantly, right?
 
Last edited:
  • #21
shmoe
Science Advisor
Homework Helper
1,994
1
shmoe said:
For a sanity check, how large can we take p before p# gets beyond the storage space of a typical desktop computer?

To answer myself, p# is about e^p, so has p/log(2) digits in base-2. Divide by 2^33 to get the number of gigabytes. If p is about 10^12, this is about 168 gigs.

edit- hah- Hurkyl's edit beat my crude post! End result- if you had to store the product of primes less than sqrt(N) to factor N, N can't be very large.

The constant in front is 1, if you mean in p#~e^p. It's a logarithm away from the prime number theorem.
 
Last edited:
  • #22
shmoe
Science Advisor
Homework Helper
1,994
1
Hurkyl said:
An RSA number of that size can be factored fairly instantly, right?

Yep. Using the built in factor function of PARI/GP it takes about 3.3 seconds to factor one hundred 26-digit numbers made up of 13 digit primes on an Athlon64 3000+. The built in algorithm uses a combination of methods, and I admit I didn't build my 26-digit numbers to specifically try to foil these like you would normally do with RSA numbers (they were made from random 13 digits primes with no special considerations), but it gives a rough idea of what can be done.

For comparison, the same test took 75 seconds on one hundred 42 digit numbers made from two 21 digit primes.
 
  • #23
shmoe
Science Advisor
Homework Helper
1,994
1
shmoe said:
The constant in front is 1, if you mean in p#~e^p. It's a logarithm away from the prime number theorem.

Sorry, this is wrong, a silly mistake too. logp#~p does not imply the asymptotic I wrote above. This won't affect the number of digit computations above since we were interested in log(p#) anyways.

It does change my post 17, though not the conclusion. We can still say pi(p#)~p#/log(p#)~p#/p and phi(p#)~(p#)*e^(-gamma)/log(p). Hence pi(p#)/phi(p#)~e^(-gamma)*log(p)/p->0 as p->0.

The moral- you can still get the correct answer if you bugger up, and knowing the answer is correct is no excuse for such sloppiness.
 
  • #24
Playdo
88
0
Alphabets and Ordinal Numbers

Sorry Schmoe, I forgot that I had the index greater than 0. That's just lack of attention to detail leading to a bad argument on my part. after you divide out the ab you get a statement that may be true if the new smaller upper index is one less than a composite.

So we all agree thee are a known grab bag of algorithms and techniques and preferred notations. Tell me what you think of the following perspective based upon the clear implication of the theorem in post 1.

!

Think of a natural number, say two for example, not as a symbol in a particular base but as an abstract idea. Two means a set of order two, in base ten we write this as 2, in base two we write it as 10, for every base greater than base 2 we can write 2. So we could think of all of the base a arithmetics as collections of word built upon alphabets. The alphabet base 2 is {0,1}, the alphabet base 10 is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

Now given the set of finite ordinal numbers we can use the base a alphabets to assign words (symbols). The thing to observe is that the symbol 99, does not exist in the dictionary for base 2. In fact it does not exist in the dictionary for any base less than 10.

Realizing that we cannot come up with infinitely many symbols for the natural numbers I have to try to show this truth with the symbols we already have.

Now look, the first theorem in the first post says that the symbol 1111 appears in every dictionary for every base and it gives us a way of mapping the ordinal position of all of them into base ten. How is this useful?

As I pointed out the formula allows factorization of larger numbers by finding similar ones with similar multiplicative properties. But look what else it does. Let S[1111] be the set of all base a symbols 1111 mapped into base ten. This set looks like (a=1 is allowed even though it is not a base) {4,15,40,297,...}. Suppose I wanted to factor 15. Simple enough to do normally, but notice I could use S[1111] to simply notice that 40 is divisible by ten. Having the base a connector function for S[1111] I could go up, factor 40 in base 10. Having done this some factor pair of 40 is mapped directly onto a desirable factor pair of 15. This seems wrong but think of this. A large RSA number is going ot have many digits that are nines, so I cannot find a word in a base lower than ten that will be the same. But in bases larger than 10 there are infinitely many words whose digits are precisely those of the RSA number base 10 and we can generate them and map them back into base ten to obtain S[RSA]= {RSA, ,,,}

The reason we have a chance here is that maps like I depicted in the first theorem do not preserve the entire multiplicative structure. Meaning elements of S[RSA] can have many more than one factor pair containing two large primes. If we find [itex]s \in S[RSA][/itex] where [itex]\tau(s)[/itex] is sufficiently large, we can factor this using only small primes and then by reconstructing every possible factor pair find the one that maps precisely back to the factor pairs of RSA. There are some complicating details here's (some of which I have laid out in the earlier posts) and some things to be proved, like how much larger in general do we need to go to factor RSA using this method. And, what is the limit on tau that makes using this method a waste of time. For instance, if we found s to have only a few small prime factors with very large multiplicity, constructing the possible factor pairs might have too many steps. However if we found s to have small multiplicites and a relatively small number of unique prime divisors this method might save a lot of time.

Q1 Is it always possible to factor every integer using this method?
Q2 What is the expected value of the number of steps involved?

Notice also that this method requires knowledge of the number of ways that factor pairs can be formed from a set of prime divisors. In this way it is related to the partition problem, which I believe Ramanujan and Hardy solved around WWI. So there you go. We have our little piece of real estate in the universe of mathematical truth. Some of the property is very expensive algorithmically speaking and some of it is very cheap. Making better algorithms can help cut costs and provide better encryption. In fact I would think that a question of primary interest to cryptographers would be proving that the time required for any factorization algorithm lies between bounds for a given natural number.
 
  • #25
shmoe
Science Advisor
Homework Helper
1,994
1
Playdo said:
So we all agree thee are a known grab bag of algorithms and techniques and preferred notations.

Yes, and when you use these notations to mean something else it gets confusing. So I ask again since you use it again in the above post, what do you mean by [tex]\tau(n)[/tex]? It didn't look like the usual usage before (or the statement you made about phi(k) was wrong).

Playdo said:
A large RSA number is going ot have many digits that are nines, so I cannot find a word in a base lower than ten that will be the same.

You can always just write the RSA number in base 2.

Your sequence S[1111], etc. are growing very fast when you have a lot of digits (as in RSA moduli). I wouldn't hold my breath hoping any of these are composed entirely of small factors. You might want to look up "Dickman's function" and how it relates to "smooth numbers".

Playdo said:
Notice also that this method requires knowledge of the number of ways that factor pairs can be formed from a set of prime divisors. In this way it is related to the partition problem, which I believe Ramanujan and Hardy solved around WWI

The number of 'factor pairs' is just the number of divisors, which is dead easy if you have the prime factorization. How is this related to the partitions of a number?

Playdo said:
Making better algorithms can help cut costs and provide better encryption.

What encryption scheme benefits from improved factoring algorithms? Certainly not RSA, whose security needs factoring to be difficult.


Am I expecting too much that you will actually try to implement this algorithm? I'd be happy to give you some numbers that are products of two large primes (whatever size you like) for you to test it out on. The next best thing to proving an algorithm is correct combined with a detailed runtime analysis would be a pile of sample calculations.
 
  • #26
Playdo
88
0
shmoe said:
Yes, and when you use these notations to mean something else it gets confusing. So I ask again since you use it again in the above post, what do you mean by [tex]\tau(n)[/tex]? It didn't look like the usual usage before (or the statement you made about phi(k) was wrong).



You can always just write the RSA number in base 2.

Your sequence S[1111], etc. are growing very fast when you have a lot of digits (as in RSA moduli). I wouldn't hold my breath hoping any of these are composed entirely of small factors. You might want to look up "Dickman's function" and how it relates to "smooth numbers".



The number of 'factor pairs' is just the number of divisors, which is dead easy if you have the prime factorization. How is this related to the partitions of a number?



What encryption scheme benefits from improved factoring algorithms? Certainly not RSA, whose security needs factoring to be difficult.


Am I expecting too much that you will actually try to implement this algorithm? I'd be happy to give you some numbers that are products of two large primes (whatever size you like) for you to test it out on. The next best thing to proving an algorithm is correct combined with a detailed runtime analysis would be a pile of sample calculations.

tau is the number of divisors of N, so yes I am using it incorrectly. RSA does not benefit from faster factoring algorithms, but finding alerts us that current RSA numbers may not be as safe as we suspect. So there is a benefit.

Really, you can write those base 2? Clearly, obviously, without any shadow of doubt, I also know that we can write any natural number in base two and that it will have only zeros and ones in its expansion.

The point which you have clearly not seen is that standard multiplication is polynomial multiplication evaluated at a base a and simplified so that no coefficients are greater than the base. This means that there is probably something we can learn about factoring natural numbers by studying very careful the base a expansion algorithm and polynomial multiplication and the relationship between symbols used for the various bases.

Also let us admit that having larger numbers we still have the same relative proportion of numbers divisible by small primes. Take any interval n>0, n,...n+5, four of them are divisible by 2 or 3. Unless S[1111] has some special relationship to the primes themselves it will most likely contain many numbers that are completely factorable by small primes. the number of factor pairs is not the number of divisors. For example suppose I have

2*2*3*5 = 60

The number of divisors is the order of the set {1,2,3,4,5,6,10,12,15,20,30,60} which is twelve. The number of factor pairs is the order of the set {(1,60),(2,30),(3,20),(4,15),(5,12),(6,10)} which is six. Now given a prime factorization I must point out to you that arriving at the specific divisors and the number of divisors is not dead easy. In just this example it is calculated as the sum of the binomial coefficients of degree four minus the number of duplicates we get from having a multiplicity of two for the prime divisor two, this then is divided by two because for example 2*3=3*2=6 it is redundant to count both orderings of 3*2.

But if you think it is so easy, please write of the complete factorization of say ten consecutive natural numbers and show a formula that calculates the total number of divisors (tau(n)) from it. If I have learned anything about open questions in number theory is that they are almost never dead simple even if they can be written down simply.

Consider this: given any polynomial equation taken over the natural numbers say [itex]x^2+8x+11=59[/itex] base ten. The equation must be true if a solution exists in every base a. So change bases to base eleven to obtain [itex]x^2+8x=44[/itex]. Immediately we see that x even is a strong possibility. If x <10<11 it will have the same symbol in both bases. So as simple as that I have a sieve.

[tex]2^2+8*2=4+15=21\neq44 base 11[/itex]
[tex]2^4+8*4=15+2;10=3;15=44 base 11[/itex]

So simple so easy, yet it shows that the real relationship we should want to understand looking at the natural numbers and diophantine equations is the relationship between the base a notations.

Consider Fermat

[tex]z^n=y^n+x^n[/itex] It is claimed not n, z,x, y exist in the natural numbers (at least) to satisfy this if n>2. Weils solved this os we have the benefit of that, but let's suppose there is a solution. Clearly z>y>x WLOG so try writing that equation in base x.

[tex]\sum_{i=0}^{n_z}\alpha_ix^i = \sum_{i=0}^{n_y}\beta_ix^i+x^n[/itex]

Now if you understand the base x algorithm you see why I wrot eit this way. But here look now we have gotten rid of z and y really, we can just try to answer the question using properties of base x expansions. And we get

[tex]\sum_{i=0}^{n_z}\alpha_ix^i - \sum_{i=0}^{n_y}\beta_ix^i = x^n[/itex]

Or putting this into the coefficient notation we usually use (I let ; mean concatentation of strings)

[itex]\alpha_{n_z};...;\alpha_{n_y} - \beta_{n_y};...;\beta_0-\alpha_0 = 10000;...;0000[/itex] n-1 zeros are found on the right side.

From another perspective this is

[itex]\alpha_{n_z};...;\alpha_0=\beta_{n_y};...;\beta_n+1;...;\beta_0[/itex]

if [itex] n_z \geq n_y \geq n[/itex] (which is true) then we cannot get to the left hand side of that equality without carrying, but the only opportunity the addition on the right gives us is [itex]\beta_n+1=x[/itex] and that which is carried is one so it must be that we keep carrying until we end up wit the same number of digits, that means [itex]\beta_j+1=x \forall j \geq n[/itex]. But that means [itex]n_z=n_y+1[/itex] and this turns out to be a strong limitation on the numbers that could participate in a solution to Fermat.

Now this is not he whole story but there are consequences you can find here related entirely to the fact that the equation must be true in all base a expansions if it is true base ten (ie find just one base for which the equation cannot hold, that is for which you can easily show it cannot hold and there is your contradiction theorem proved). Furthermore rather than putting z^n and y^n base x more possible knowledge is gained by using z and y base x and raising them to the nth power. These things are very difficult to study with general n and look like a forest to most people, but if you look hard enough you start to see patterns that distinguish classes and may allow interesting results. The problem we face to day is that unless someone important says it is a goo dline of inquiry you get zero support. not little support, no absolutely no support. So trying to do anything that goes against the status quo in thought or requires you to invent knew notation and concepts contrary to that status quo is doomed to academic failure. Therefore you should give up the study of number theory immediately.
 
Last edited:
  • #27
shmoe
Science Advisor
Homework Helper
1,994
1
Playdo said:
tau is the number of divisors of N, so yes I am using it incorrectly.

This is what I said the usual usage was. My objection is you seemed to claim if k has tau(k)>1 (and maybe k squarefree as well) then you have phi(n)>phi(k) for all n>k, but this isn't true as my example showed. Maybe I'm not clear on what you were saying as it spanned a couple of posts.

Playdo said:
RSA does not benefit from faster factoring algorithms, but finding alerts us that current RSA numbers may not be as safe as we suspect. So there is a benefit.

Oh, absolutely. That's not the meaning I was reading from what you said earlier, no problem.

Playdo said:
Really, you can write those base 2? Clearly, obviously, without any shadow of doubt, I also know that we can write any natural number in base two and that it will have only zeros and ones in its expansion.

*You* raised the issue that there were 9's appearing in an rsa number, not me. I'm just pointing out that this is from your arbitrary choice of base 10.

Playdo said:
The point which you have clearly not seen is that standard multiplication is polynomial multiplication evaluated at a base a and simplified so that no coefficients are greater than the base. This means that there is probably something we can learn about factoring natural numbers by studying very careful the base a expansion algorithm and polynomial multiplication and the relationship between symbols used for the various bases.

Sorry, why do you say I am missing this? I have not in anyway condemned what you're trying to do, otherwise I wouldn't keep asking you to provide an implementation of your algorithm, or at least clearly outline it for us (any chance of this happening?)

Playdo said:
Also let us admit that having larger numbers we still have the same relative proportion of numbers divisible by small primes. Take any interval n>0, n,...n+5, four of them are divisible by 2 or 3. Unless S[1111] has some special relationship to the primes themselves it will most likely contain many numbers that are completely factorable by small primes.

Look into the probability a number of a given size is smooth (to whatever bounds) and provide some justification for your loose claims please. They will likely have *some* small prime factors, but the probability of being composed *entirely* of these small factors is rather low. The probability that a 100 digit number is 10^10-smooth (that is all it's prime divisors are less than 10^10) will be in the neighbourhood of 10^(-10) iirc.* In short- removing the small primes will help factor a random big number, but you probably still have *lots* of work to do.

*my memory tells me that the probability x will be x^(1/a) smooth approaches a^(-a) as x gets large, which is an approximation of the Dickman function. Note that 10^10 is probably not what you'd consider a small set of primess to do trial division with. I can't recall how good an approximation the a^(-a) was to Dickman's, I do know there are ways to get more accurate probabilities (more work is involved of course)

Playdo said:
the number of factor pairs is not the number of divisors. For example suppose I have

2*2*3*5 = 60

The number of divisors is the order of the set {1,2,3,4,5,6,10,12,15,20,30,60} which is twelve. The number of factor pairs is the order of the set {(1,60),(2,30),(3,20),(4,15),(5,12),(6,10)} which is six. Now given a prime factorization I must point out to you that arriving at the specific divisors and the number of divisors is not dead easy. In just this example it is calculated as the sum of the binomial coefficients of degree four minus the number of duplicates we get from having a multiplicity of two for the prime divisor two, this then is divided by two because for example 2*3=3*2=6 it is redundant to count both orderings of 3*2.

But if you think it is so easy, please write of the complete factorization of say ten consecutive natural numbers and show a formula that calculates the total number of divisors (tau(n)) from it. If I have learned anything about open questions in number theory is that they are almost never dead simple even if they can be written down simply.

If [tex]n=p_1^{n_1}p_2^{n_2}\ldots p_k^{n_k}[/tex] is the prime factorization of n (the p_i's distinct) then [tex]\tau(n)=(n_1+1)\times(n_2+1)\times\ldots\times (n_k+1)[/tex]. The divisors are in 1-1 correspondance with k tuples [tex](a_1,\ldots,a_k)[/tex] and [tex]0\leq a_i \leq n_i[/tex], just write these out in lexigraphical order say and you can write out the divisors or factor pairs no problem. (ps. going between 'factor pairs' and divisors is trivial, please excuse my earlier abuse of the terminology)

It's late, so I'll be leaving the rest for when I have time. I will ask once more- can I expect you to actually implement your proposed algorithm in some way? A sure fire way to make people notice would be a practical test. This doesn't mean that you would have to knock off a current RSA number (as the computing power is prohibitive).
 
Last edited:
  • #28
Playdo
88
0
Ok Schmoe,

I will look up smooth, and I will try to talk within the bounds of the notation. It's not the notation I despise but I am a bit sensitive because I have met a lot of arrogant *******s in mathematics, a surprising number of whom have Phd's. I will work out some implimentations and some bounds on time. It sounds like you could help with the proper algorithmic notation, I am not very familiar with little o , big o aside from knowing that x+o(p) is insignificantly larger than x. Sound right? I will have to redo many of the posts later in the thread to make sure the notation and usages are correct.
 
  • #29
Playdo
88
0
shmoe said:
To answer myself, p# is about e^p, so has p/log(2) digits in base-2. Divide by 2^33 to get the number of gigabytes. If p is about 10^12, this is about 168 gigs.

edit- hah- Hurkyl's edit beat my crude post! End result- if you had to store the product of primes less than sqrt(N) to factor N, N can't be very large.

The constant in front is 1, if you mean in p#~e^p. It's a logarithm away from the prime number theorem.

Mathematica works with numbers up to several hundred digits on a desktop. Unless of course I have assumed that the large numbers coming out have been rounded somehow without Wolfram telling me about it in the function's description. That doesn't seem like his style though.
 
  • #30
Playdo
88
0
shmoe said:
This is what I said the usual usage was. My objection is you seemed to claim if k has tau(k)>1 (and maybe k squarefree as well) then you have phi(n)>phi(k) for all n>k, but this isn't true as my example showed. Maybe I'm not clear on what you were saying as it spanned a couple of posts.



Oh, absolutely. That's not the meaning I was reading from what you said earlier, no problem.



*You* raised the issue that there were 9's appearing in an rsa number, not me. I'm just pointing out that this is from your arbitrary choice of base 10.



Sorry, why do you say I am missing this? I have not in anyway condemned what you're trying to do, otherwise I wouldn't keep asking you to provide an implementation of your algorithm, or at least clearly outline it for us (any chance of this happening?)



Look into the probability a number of a given size is smooth (to whatever bounds) and provide some justification for your loose claims please. They will likely have *some* small prime factors, but the probability of being composed *entirely* of these small factors is rather low. The probability that a 100 digit number is 10^10-smooth (that is all it's prime divisors are less than 10^10) will be in the neighbourhood of 10^(-10) iirc.* In short- removing the small primes will help factor a random big number, but you probably still have *lots* of work to do.

*my memory tells me that the probability x will be x^(1/a) smooth approaches a^(-a) as x gets large, which is an approximation of the Dickman function. Note that 10^10 is probably not what you'd consider a small set of primess to do trial division with. I can't recall how good an approximation the a^(-a) was to Dickman's, I do know there are ways to get more accurate probabilities (more work is involved of course)



If [tex]n=p_1^{n_1}p_2^{n_2}\ldots p_k^{n_k}[/tex] is the prime factorization of n (the p_i's distinct) then [tex]\tau(n)=(n_1+1)\times(n_2+1)\times\ldots\times (n_k+1)[/tex]. The divisors are in 1-1 correspondance with k tuples [tex](a_1,\ldots,a_k)[/tex] and [tex]0\leq a_i \leq n_i[/tex], just write these out in lexigraphical order say and you can write out the divisors or factor pairs no problem. (ps. going between 'factor pairs' and divisors is trivial, please excuse my earlier abuse of the terminology)

It's late, so I'll be leaving the rest for when I have time. I will ask once more- can I expect you to actually implement your proposed algorithm in some way? A sure fire way to make people notice would be a practical test. This doesn't mean that you would have to knock off a current RSA number (as the computing power is prohibitive).

Yes I agree with the tau formula for sure, the fact that it is multiplicative is interesting too. I get the k-tuples now. You first list them and then order them and then wrap the list over upon itself so that [1,1,1...,1]X[p_k,p_k,...,p_k]=1*N (my notation there is crap but you get the point). So my apologies on that one, that is dead easy(trivial).

Are you familiar with the function [itex]\sigma(n)[/itex]?
 
  • #31
shmoe
Science Advisor
Homework Helper
1,994
1
Playdo said:
Are you familiar with the function [itex]\sigma(n)[/itex]?

It's usually the sum of the divisors. more generally, [tex]\sigma_m(n)[/tex] is the sum of the mth powers of the divisors, and you'd have [tex]\sigma_0(n)=\tau(n)[/tex]. [tex]\sigma_m[/tex] is multiplicative as well, and also has simple formula given the prime factorization of n (when m>0):

[tex]\sigma_m(n)\prod_{j=1}^{k}\frac{p_j^{(n_j+1)m}-1}{p_j^m-1}[/tex]

This will be in any half decent elementary number theory text, Hardy and Wright's is nice, Leo Mosers "an introduction to the theory of numbers" is good and is available online:
http://www.trillia.com/moser-number.html
 
  • #32
shmoe
Science Advisor
Homework Helper
1,994
1
Missed this earlier today:

Playdo said:
I will look up smooth, and I will try to talk within the bounds of the notation.

The notations that have survived over the years generally capture useful concepts that you've found yourself talking about anyways, like smooth numbers. It's well worth your time to learn the basic jargon when applicable (make up your own is fine where none previously exists), communication goes much more smoothly if we don't need to constantly translate things. It also makes it much simpler for you to look up previous results if you know the terminology to search for.

Playdo said:
It's not the notation I despise but I am a bit sensitive because I have met a lot of arrogant *******s in mathematics, a surprising number of whom have Phd's.

Some are surely arrogant, some are probably just exasperated. You have no idea how mind numbing it is when someone comes in claiming they can revolutionize number theory with their observation that all primes>3 are congruent to 1 or 5 mod 6 (course they don't know what a 'mod' is, and sometimes talk about 'bowling pins' instead). While this is a swell observation, much banging of the head on keyboard ensues as you try to explain to them that this is in fact a rather well known and if they took a little bit of time to become familiar with even the most basic literature, they would see they've got nothing new at all. How to still encourage someone to keep an interest in number theory while telling them what they have correct isn't new and what they have new isn't correct? Much as you try to not let this sort of thing influence your reaction to the next person who comes along, some of it builds up.

sorry, went off on a bit of a rant there, this isn't directed at you.

Playdo said:
It sounds like you could help with the proper algorithmic notation, I am not very familiar with little o , big o aside from knowing that x+o(p) is insignificantly larger than x. Sound right? I will have to redo many of the posts later in the thread to make sure the notation and usages are correct.

x+o(p) doesn't really mean much as written, how is p related to x? f(x)=o(g(x)) means g(x)/f(x)->0 as x->infinity (or zero, or whatever other point you were looking at). Roughly g(x) gets crushed by f(x).

These asymptotic notations will live in any number theory text with "analytic" in the title (and many others), but are rare in the "elementary" texts. Hardy and Wright's being one of the exceptions, see:

http://mathworld.wolfram.com/LandauSymbols.html
http://mathworld.wolfram.com/AsymptoticNotation.html

for basic definitions. You might want to try to find the "Handbook of Applied Cryptography" by Menzies, et al. I mention it because it's available online as well and has basic definitions and results of many things number theory related (few proofs though), and has piles and piles of algorithms you can look at to get an idea of how to write these sort of things in readable ways.

Playdo said:
Mathematica works with numbers up to several hundred digits on a desktop. Unless of course I have assumed that the large numbers coming out have been rounded somehow without Wolfram telling me about it in the function's description. That doesn't seem like his style though.

Yes it can, but the point was p# gets just enormous very fast, and your computer won't be able to even store p# in your ram when p is about 10^10, much less do any computations with it. For example, the product of all primes less than 10^5 has some 43,293 digits, less than 10^6 has 433,637 digits, roughly multiply by 10 for each 1 you add to the exponent. This is decimal digits, it gets impossible to work with quickly.

Since you have mathematica, you might want to take a monte carlo shot at the probability a number in a given range is smooth. Pick your smoothness bound, randomly generate a large number and divide out by all primes less than your bound bound. Run a primality test on what's left to see if any more work needs to be done to factor it (a pseudo primality test will be pretty quick up to a few hundred digits, trying to factor what's left might cause mathematica to explode). Repeat this for a few thousand large numbers,. What ratio had something non-prime left over after removing small primes? Repeat for different smoothness bounds, and different ranges of 'large" random numbers. I think you will be suprised at the results given what your intuition was pointing towards. (this sort of experimentation is what idle computer cycles were made for)
 
Last edited:
  • #33
Playdo
88
0
Well, sure it gets annoying (but recognizing base 6 as special can be start of along love affair with the natural numbers if we just have a little psychological and social savvy in our responses to sychophants and neophytes). And I am like anyone, there are times that I lose my patience. But every new novice is a different person that has the right to expect to be treated with the same consideration as everyone else. I know that patience is the better way because I have been on both sides of the issue. Perhaps if we had a more proactive public image it might help, and I think boards like this can go a long way either to help or hurt our cause.

So anyways, smooth numbers, numbers which have only prime divisors less than n are n-smooth, this corresponds to which types of FFT's can be used to factor them. Ok, nice, I'm guessing then there is some analytic advantage to thinking of everything in the real base e? DFT's I do know are convolutes allowing the writing of a product in terms of a sum. Riemann's famous function is the result of an observation that an infinite product including the primes can be expanded and reqorked into the zeta function. I am no expert in those things and I do not really want to be, but I don't want to say wrong things about them either.

The smooth numbers, yes that is like what I am saying with the one algorithm idea, that we can produce sequences that contain easily factorable numbers and then just map them back to the number we are intersted in. Really my interest in all of these things centers on a few ideas.

1) Understanding as clearly as possible the relationship between finite ordinals (natural numbers) and the base a representations of that set.

2) Understanding how polynomial multiplication proceeds.

3) Understanding how to combine those notions into a basic understanding that facilitates further study in number theory.

4) To pursue these goals within the confines of elementary number theory whenever possible.

I know that it has been said that there exists no general method for solving all diophantine equations, in fact I thought Dirichlet proved that. However, I have never seen anyone take the time to painstakingly investigate the implications of base a arithmetics in our quest for solutions. So what I am reaching for here is better understanding for myself, but also novel perspectives on the issue because I suspect that yes we do not have general methods but we can arrive at a set of tools that allows us to say when solutions exist and/or how long it will take to find out using our best methods. That's number theory.

Related to my Fermat comment (which is a classic problem you don't want to talk about if you intend to have a career) you can imagine how he may have starting thinking about it. The pythagorean theorem is a jewel of number theory and really important much beyond that. it's not quite quadratic reciprocity cool but pretty damn close when you start asking why because it is a consequence of euclidean gemoetry. But why is it suddenly not true for any integer greater than two? Trying to look at it using base a transformations is simply a search for a mechanism of inquiry, not an attempt to insult the many gifted people who have said things about it in the past.

Goldbach - every even number N can be written as the sum of two primes. Well, we have the set of numbers that add to N as ordered pairs. The reduced residue system must be closed in this sense since a+b=N with gcd(b,N)>1 implies gcd(a,N)>1 So we have pairs (1, N-1),... And we can estimate how many primes must be in the reduced residue system, but the argument itself suggests that at least one of those pairs contains only primes (unless the two primes are both factors of N, which I think we can handle a proof of when that happens, 'never' is my guess.). There is something there ot be understood as this is linked to the distribution of primes and rrs, but I have never seen a thorough investigation of it. All of these are my liberty in this thread, I won't go out of my way to offend anyone.

Thanks for the help,

Playdo
 
Last edited:
  • #34
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
14,967
19
Playdo said:
I know that it has been said that there exists no general method for solving all diophantine equations
As you've said, this incompleteness theorem has been proven. The only assumption that is made is that your method is rigorous -- if it claims that there is no solution, that can actually be proven.

So, the only way a "novel perspective" can get around this incompleteness theorem is to give up hope that your perspective will actually be able to provide a proof when there aren't any solutions.


New methods that can give answers in new situations, and better answers in old situations, are valuable of course... but we should still be aware that a complete solution is not attainable.
 
  • #35
Playdo
88
0
Hurkyl said:
As you've said, this incompleteness theorem has been proven. The only assumption that is made is that your method is rigorous -- if it claims that there is no solution, that can actually be proven.

So, the only way a "novel perspective" can get around this incompleteness theorem is to give up hope that your perspective will actually be able to provide a proof when there aren't any solutions.


New methods that can give answers in new situations, and better answers in old situations, are valuable of course... but we should still be aware that a complete solution is not attainable.

Yup. Although if we allow ourselves the novelty of recognizing that in order to calculate any function we must use some type of algorithm, except perhaps an identity function (in which case there is one mental step performed) then the distinction between so called closed and open solutions is not so sharp. For instance we know the quadratic formula and we call it a solution to the zeros problem and it is with respect to the notation, but to actually calculate it for any given values of a,b and c requires several steps and multiplications. Am I really saving time in a quantitative sense by evaluating the formula when I might be able to simply take the modulus of the quadratic equation once the coefficients are known? We need to let off of the pride that has surrounded certian methods. Not so as to disrespect the advances made but to make sure that there are new understandings waiting just beyond the ring of our fire light. Chaitin, Godel, Liebnitz,... how many more names can you add, and how many of these hid important results all of their lives for fear of criticism that did not need to happen?

I like the atmoshpere we have going here now, hopefully we can keep it up.

More on that Goldbach's idea. Consider that sequence you guys seem to use {itex}p#{/itex}, the sequence of products of consecutive primes. Given even n golbach says we can write it as the sum of two odd primes. this means the primes we seek must be elements of the rrs modulo p#_k. so think of those which sum a+b=p#_k as ordered pairs (a,b) for the sequence of interest we can write all such ordered pairs.

I need to grab some things fomr an earlier post. BRB
 

Suggested for: A Factorization Algorithm

Replies
3
Views
391
Replies
2
Views
583
  • Last Post
Replies
1
Views
463
  • Last Post
Replies
5
Views
1K
  • Last Post
Replies
6
Views
1K
  • Last Post
Replies
1
Views
931
Replies
4
Views
992
Replies
13
Views
2K
Top