Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

A Factorization Algorithm

  1. Jun 30, 2006 #1
    Consider the following true statement. Given natural numbers n, m, s

    (1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


    (2) [tex]n+1=(m+1)(s+1)[/tex]

    Now suppose that [tex]N[/tex] is a natural number where

    (3) [tex]N = \sum_{i=0}^n a^i [/tex]

    Now restrict [tex]a[/tex] to the natural numbers. Clearly for certain special numbers N we have a one to one map of factor pairs of [tex]N[/tex] and factor pairs of [tex]n+1[/tex] which is much smaller.

    In order for this thereom to give non-trivial factorizations [tex]n+1[/tex] must not be prime. If this is true and (3) holds then [tex]N[/tex] can be factored easily.

    If one thinks carefully about the algorithm for polynomial multiplication you can see that numbers defined by (3) above are quite special with respect to the base [tex]a[/tex].

    In order to grasp the full implications of this theorem we need to think about two common algorithms.

    1) The first is the base [tex]a[/tex] expansion algorithm used to write a natural number as a sum of powers of a. For example the number [tex]14[/tex] can be written as a sum of powers of [tex]3[/tex] as [tex]14 = 3^2+3+2[/tex]. In general this algorithm can be used to write

    (4) [tex] N_a = c_k a^k + c_{k-1} a^{k-1} + ... + c_1 a + c_0[/tex]

    Where some of the [tex]c_j[/tex] may be zero and all are less than [tex]a[/tex]. I will denote this algorithm by the capital letter [tex]\beta[/tex] so that [tex]\beta (N,a)[/tex] is equivalent to (4) above.

    2) The second algorithm is polynomial multiplication. Now this algorithm proceeds by steps. We first multiply every term by every other term, then we collect terms. Denote this by [tex]P[/tex]. The difference between the result of this algorithm and the base [tex]a[/tex] expansion algorithm is best understood by looking closely at a third related algorithm.

    3) The multiplication algorithm for natural numbers is really equivalent to polynomial multiplication taken one step further by carrying so that none of the digits are greater than the base and then evaluation of the resulting polynomial at the base value [tex]a[/tex]. Denote this carry step by [tex]C[/tex].

    It follows that if [tex]N = ST[/tex] then [tex]\beta (N,a) = C(P(\beta (S,a),\beta (T,a)))[/tex] and under certain special conditions [tex]\beta(N,a)=P(\beta(S,a),\beta(T,a))[/tex]. One of these special conditions is the thereom stated above. This happens because of two things. First all coefficients of the summation terms on the right are equal to one in the base and the powers that do appear do not cause any duplication of powers, thus there is no carrying.

    In the posts that follow I will try to highlight the implications of this. One of the first is that though it would appear that we are looking at a very restricted class of factorizations, in reality a shift operator can often be applied to [tex]\beta(N,a)[/tex] to arrive at a new base where coefficients are all one and the theorem applies. This is typically possible when [tex]\beta(N,a)[/tex] is not the result of carrying, or more formally there exists a pair of factors of [tex]N[/tex] such that the base [tex]a[/tex] expansion and subsequent multiplication produces a polynomial in [tex]a[/tex] with precisely the same coefficients as the base [tex]a[/tex] expansion of [tex]N[/tex].

    Clearly this all plays around in an uncomfortable way with our notions of variables and constants. However if you look at it carefully you will see that there is merit here. Furthermore there exist shift operators that can transform certain base [tex]a[/tex] expansions into the form given in (1) above. I will talk about those next.
    Last edited: Jun 30, 2006
  2. jcsd
  3. Jun 30, 2006 #2
    Shift Operators

    Given natural numbers n, m, s

    (1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


    (2) [tex]n+1=(m+1)(s+1)[/tex]

    Now suppose we look let [tex]n=3[/tex] then [tex]n+1=4[/tex] the only non-trivial solution of (2) above is [tex]m=s=1[/tex]. So (1) above becomes

    (3) [tex]\sum_{i=0}^3 a^i = \sum_{i=0}^1 a^{2i} \sum_{i=0}^1 a^i[/tex]

    The expansion gives.

    (4) [tex]a^3+a^2+a+1 = (a^2+1)(a+1)[/tex]

    Thinking of the left hand side as a polynomial in [tex]a[/tex], for every natural number value of [tex]a[/tex] there is a corresponding natural number [tex]N[/tex] which is easily factored using (4) above. The following table lists the first seven numbers and there factors.

    [tex]\begin{array}{ c|c|c|c |} a&N&S&T\\

    The easiest related polynomial can be found using the values in the fourth column. Notice that the second differences of these values are all equal to [tex]2[/tex]. So if we wish a relationship where [tex]k=a-1[/tex] we must solve the system

    [tex]f(k)=\int\int C_0 dk[/tex]

    [Note: the choice of initial conditions here is somewhat arbitrary, we only require that the resulting coefficients be greater than or equal to zero.]

    Solving this system is simple enough and yields


    [tex]C_0 = 2[/tex]

    Working this back into (4) above gives

    (5) [tex]k^3+4k^2+6k+4=(k^2+2k+2)(k+2)[/tex]

    Thinking about the base [tex]k[/tex] expansion algorithm suppose we have

    (6) [tex]\beta(N,k)=k^3+4k^2+6k+4[/tex]

    Then so long as [tex]k[/tex] is one of those special natural numbers for which [tex]\beta(N,k)=P(\beta(S,k),\beta(T,k))[/tex] for some factor pair of [tex]N[/tex] we can find a translation of the polynomial (5) which transforms it into the form (4). There is a vector space approach to this but we will not look at it now. In general we want to look at translations of (5) where [tex]k=a-s[/tex] for some integer [tex]s[/tex]. It should be clear that we can find desirable values for [tex]s[/tex] by solving [tex](a-s)^3+4(a-s)^2+6(a-s)+4=1[/tex] when [tex]a=0[/tex] that is

    (7) [tex]-s^3+4s^2-6s+3=0[/tex]

    I used Excel to see that [tex]s=1[/tex] is the solution but Newton's Method is a good first start in general.

    So what does this suggest? Firstly there are deep and magnificent connections between the multiplicative structure of the natural numbers and that of the polynomials taken over the natural numbers. Important questions remain to be answered. I cannot claim to be able to answer them and perhaps some higher level of abstraction has already done so. One question that is important is this. Given a natural number [tex]N[/tex] does there always exist a base [tex]a[/tex] such that at least one factor pair of [tex]N[/tex] both have base [tex]a[/tex] expansions which multiply without the need to carry? If this is true then the ideas laid out here can factor any number. A second important question here is how fast are the algorithms determined by this construct. Clearly for (1) above we have the luxury of factoring a much smaller number than the one we started with. But in cases where we do not start with all of the coefficients equal to one, how long will it take us to find solutions to the translation equation (7)? Finally, if there exist composite natural numbers for which all factor pairs result in carrying for all possible bases is there an extension of the ideas here that describes efficient algorithms for undoing the carrying process? I will try to answer some of these questions as time goes on, right now I am hoping to find my first real job and looking for it is taking up most of my time.


    Last edited: Jul 1, 2006
  4. Jul 1, 2006 #3
    A Reducibility Theorem

    For every composite natural number [tex]N[/tex] there must be a reducible polynomial [tex]p(x)[/tex] over the natural numbers and a natural number [tex]a[/tex] for which [tex]N=p(a)[/tex] and at least one factor pair [tex]s[/tex] and [tex]t[/tex] of [tex]p[/tex] evaluated at [tex]a[/tex] satisfy [tex]N=p(a)=s(a)t(a)[/tex].


    Let [tex]\beta[/tex] be the base [tex]a[/tex] expansion algorithm, [tex]P[/tex] be polynomial multiplication, and [tex]C[/tex] be the carrying algorithm that is applied in standard multiplication of natural numbers over a base.

    Since [tex]N[/tex] is composite it has at least one non-trivial factor pair. It can be shown that [tex]\beta (N,a) = C(P(\beta(S,a),\beta(T,a)))[/tex] in all cases. Therefore [tex]C^{-1}(\beta(N,a))[/tex] is a subset of the polynomials over the natural numbers evaluated at [tex]a[/tex] and contains only reducible polynomials. Clearly [tex]P(\beta(S,a),\beta(T,a))[/tex] must be an element of this set.
    Last edited: Jul 1, 2006
  5. Jul 1, 2006 #4
    This last thereom is somewhat deceptive because the carrying algorithm [tex]C[/tex] actually has many steps and possible paths so that there is actually a graph that must be searched. Although we generally proceed by carrying from smallest power to largest, when we start with the base [tex]a[/tex] expansion of [tex]N[/tex] we do not know which of the many possible reductions corresponds to the original polynomial multiplication coefficients in general. The point of (1) in the first post is that it guarantees that no carrying happens at all so [tex]C[/tex] is and identify operator under those circumstances. The complexity of [tex]C[/tex] and [tex]C_{-1}[/tex] are both important generally. We can assume that a base [tex]a[/tex] expansion of any factor pair starts us off with cofficients that are less than [tex]a[/tex]. Consider the elementary example


    If we have [tex]\beta(N,a)=C_2a^2+C_1a+C_0[/tex] equating coefficients gives the system


    This system is indeterminate having three equations and four unknowns. However if [tex]C_0=1[/tex] then by the rules of the game the system becomes


    And is quite easily solvable. The shift operator sa mention above does precisely this. Also if [tex]C_0[/tex] is confirmed to be prime the system is easily solvable.

    But suppose that


    Now the carrying algorithm must be inverted and the system that must be solved looks like


    Really three equations in five unknowns leaving a very uncomfortable two dimensional search ahead of us. Some advance can be made by noting that under the rules of the game [tex]c_0b_0 \leq (a-1)^2[/tex] so that [tex]0 \leq r_0 \leq a-1[/tex]. Furthermore chosing [tex]a[/tex] relatively prime to [tex]N[/tex] offers some advantages if the reduced residue system of [tex]a[/tex] is easily obtainable. Coming from the other side factoring [tex]C_2+ar_2[/tex] can also break the forutne cookie wide open so to speak. These arguments are know by every high school pre-cal student. The more interesting set of questions centers around particular natural numbers.
    Last edited: Jul 1, 2006
  6. Jul 1, 2006 #5
    The Power of One

    The preceding arguments show that the number one is our ally when it appears in certain places in an expansion. We like to see the number one appear in the leading and/or the trailing coefficient. It is also good see primes appear in the leading and trailing coefficients. So when does this happen? The trailing coefficient is actually calculable as [tex]N modulo a[/tex]. We need to solve the equation [tex] N \equiv 1 mod a[/tex].

    Clearly [tex] N \equiv 1 mod N-1[/tex] but no meaningful factors of [tex]N[/tex] can be expanded in this base. We need to find bases less than the square root of [tex]N[/tex]. But look suppose [tex]a_1[/tex] is such a base then [tex]N=a_1x+1[/tex] and is is quite clear that [tex]a_1|N-1[/tex]. So perhaps we should factor [tex]N-1[/tex]? Assuming [tex]N[/tex] was odd we now have to find [tex]a_2|(N-1)/2 -1[/tex]. In this fashion we can reduce the number to be factored until it is clearly trivial.

    Example: Factor 201

    I first try to factor 200 by removing powers of two to obtain [tex]200=2^825[/tex]. now supposing that twenty-five is still too large I try to factor 24 by removing powers of two and obtain [tex]24=2^33[/tex]. So now I can write twenty-five in terms of any of the cators of twenty-four as [tex]{2*12+1, 4*6+1, 6*4+1, 8*3+1, 12*2+1}[/tex]. Because I want all factor pairs of 25 to be grater than whatever base I choose I consider those significantly less than the square root of twenty-five. So I set up the equation

    (1) [tex]25 = 2*12+1= 2z+1 = (2x+1)(2y+1)[/tex]

    This can be simplified to obtain

    (2) [tex] 12 =2xy+x+y[/tex]

    (3) [tex] y = \frac {12-x} {2x+1}[/tex]

    Starting with [tex]x=1[/tex] and working upward I find that [tex]x=2[/tex] gives a whole number solution and twenty five is factored as five times five.

    So the factorization of 200 is [tex]200=2^35^2[/tex]. Applying this to the factorization of 201 I note that the square root of 201 is smaller than 15 but larger than 14. Set up the equation

    (4) [tex] 201 = 2*100+1 = 2z+1= (2x+1)(2y+1)[/tex]

    (5) [tex]y = \frac {100-x} {2x+1}[/tex]

    Luckily for us w find that [tex]x=1[/tex] if a whole number solution. And 201 = 3*67. If we had gotten greedy and skipped the base two expansion we would have missed the factor three, or would we?

    Suppose w had chosen the base five, giving

    (6) [tex] 201 = 5*40+1 = 5z+1= (5x+s)(5y+t)[/tex]

    Now there are several pairs (s,t) such that st is congruent to one mod five (four pairs to be precise) they are {(4,4),(2,3),(3,2) (1,1)}. Since x and y remain free until later we can consider (2,3) and (3,2) to be the same choice. So picking at random we have only one third chance of getting it right in which case we also need to start with x=0 or we miss the boat all together.

    These difficulties suggest that we want to use bases that take care of the possible small prime factors right off or else simply divide by the first so many small primes. This is usually recommended even by fellows like Pollard.

    How is this related to the formulation that started the thread?

    Consider the following true statement. Given natural numbers n, m, s

    (1) [tex]\sum_{i=0}^n a^i = \sum_{i=0}^s a^{(m+1)i} \sum_{i=0}^m a^i[/tex]


    (2) [tex]n+1=(m+1)(s+1)[/tex]

    If you can, extend this summation to cover cases where

    [tex]\sum a^kb^i = \sum a^i \sum b^k[/tex]

    Remember the solution must resolve to (1) above when a=b. Good luck!
    Last edited: Jul 1, 2006
  7. Jul 1, 2006 #6
    Of course it's clear that

    [tex]\sum_{i=0}^n (ab)^i = \sum_{i=0}^s (ab)^{(m+1)i} \sum_{i=0}^m (ab)^i[/tex]

    And it follows that

    [tex]\sum_{i=0}^n a^ib^i = \sum_{i=0}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=0}^m a^ib^i[/tex]

    Implying that both a and b divide

    [tex]\sum_{i=1}^n a^ib^i = \sum_{i=1}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=1}^m a^ib^i[/tex]

    So that ab must be factor of

    [tex]\sum_{i=0}^{n-1} a^ib^i[/tex]

    If it is still true that n-1+1 is composite then we can write

    [tex]\sum_{i=1}^{n-1} a^ib^i = \sum_{i=1}^{s_1} a^{(m_1+1)i}b^{(m_1+1)i} \sum_{i=1}^{m_1} a^ib^i[/tex]


    [tex]n =(m_1+1)(s_1+1)[/tex]

    this reduction can continue to happen step by step as long as the necessary conditions exist. These things are very easy to see but I am looking for something more complicated I think. I could be wrong. That might seem a waste of time but it says we can build factorizations of larger numbers from factorizations of smaller numbers using (1) above so long as the upper limit of summation n and n+1 are both composite. This does not mean much for very small n but when n becomes larger the number of primes in any randomly chosen interval is much reduced.
    Last edited: Jul 1, 2006
  8. Jul 1, 2006 #7
    "Now there are several pairs (s,t) such that st is congruent to one mod five (four pairs to be precise) they are {(4,4),(2,3),(3,2) (1,1)}. Since x and y remain free until later we can consider (2,3) and (3,2) to be the same choice. So picking at random we have only one third chance of getting it right in which case we also need to start with x=0 or we miss the boat all together.

    These difficulties suggest that we want to use bases that take care of the possible small prime factors right off or else simply divide by the first so many small primes. This is usually recommended even by fellows like Pollard."

    The sequence of bases that is formed as follows has special properties with regard to size of reduced residue systems locally in the natural numbers.

    (1a) [tex]\{\Pi_{i=1}^k p_i | k = 1 ... L\}[/tex]

    Here L is determined by the natural number N we wish to factor. This sequence has very special properties. Firstly the bases [tex]a[/tex] are products of the first k primes, so by first confirming that N and a are relatively prime we have cleared out those small factors. If it turns out other wise (the Euclidean Algorithm is fast by the way) we simply remove the common factor and start again.
    Because the reduced residue system is closed under multiplication modulo a we can factor the trailing coefficient of any base a coefficient in a finite number of steps simply by looking at the tables for that base. Consider base 30, where thirty is the product of 2,3 and 5. Suppose N is congruent to 1 mod 30 then any two factor pairs of N must be congruent to one of the following ordered pairs modulo thirty.

    (2a) {(1,1),(7,13),(11,11),(17,23),(19,19),(29,29)}

    There would be eight except symmetry reduces this slightly. The size of the reduced residue system for elements of (1a) above is simply to calculate as Euler's totient

    (3a) [tex]\phi(p_kp_{k-1}...3*2)=\Pi_{i=1}^k (p_i-1)[/tex]

    Furthermore one can look at the set (2a) as the kernel of the map

    (4a) [tex]mod 30 :rrs_{30} \otimes rrs_{30} \Rightarrow rrs_{30}[/tex]

    Interestingly we only need to store the kernel of this map in it's full form ordered by first entries as

    (5a) {(1,1),(7,13),(11,11),(13,7),(17,23),(19,19),(23,17),(29,29)}

    Because from this we can obtain the relevant factorization of any reduced residue modulo thirty by simply multiplying the first element of each ordered pair by the residue. Suppose rather than being congruent to one modulo thirty our natural number was congruent to seven mod thirty. Then we obtain the possible factorizations as mentioned

    (6a) {(7,1),(19,13),(17,11),(1,7),(29,23),(13,19),(11,17),(23,29)}

    Reordering and reducing by symmetry we get

    (7a) {(1,7),(13,19),(11,17),(23,29)}

    These coorespond to factorization equations for example as

    (8a) [tex]N=30z+7=(30x+13)(30y+19)[/tex]

    The search area here is calculable so the number of steps in the whole algorithm can be accurately estimated.

    Now the sequence in (3a) is also special because of the relationship between reduced residue systems of consecutive elements. Consider 6=2*3 and 30=2*3*5. It is clear that every element of the rrs mod thirty is relatively prime to six. So we can generate the rrs mod thirty from the rrs mod 6 as follows. Simply produce the table

    [tex]\begin{array}{ 1|c|c|c|c |} x/r&1&2&3&4\\

    The elements are calculated as 6x+r. Clearly 25 is not in rrs mod thirty but we can quickly remove this in this case because it is not relatively prime to 5 which will be a factor of the new base (30). One must remain so we have quickly obtained the set rrs mod 30 ={1,7,11,13,17,19,23,29}. In general the elements of the array that must be removed can actually be generated from the rrs of the smaller base by simply multiplying the rrs by the new prime factor under standard multiplication. In the above example we want

    (9a) [tex]5\{1, 5\} = \{5,25\}[/tex]

    So in general using sequence (1a)

    (10a) [tex]rrs_{k+1}=\{x \Pi_{i=1}^k p_i+rrs_k |x = 1...p_{k+1}-1\} - \{p_{k+1}rrs_k\}[/tex]

    These arguments define a class of sieves that can approach the efficiency one would get if we knew all primes less than the square root of a natural number. The beauty of course is we do not need to know those primes. So we trade some of the work necessary to find those primes for a task we can do in a knowable number of steps.

    Some theorems that can be proved as excercises.

    1) Show that [tex]\phi(p_kp_{k-1}...3*2)<\phi(n) \forall n > p_kp_{k-1}...3*2[/tex]

    2) Show that if k is a square free natural number [tex]\phi(k)\leq\phi(n) \forall n > k[/tex]

    Notice that there are infinitely many 'ladders' of square free natural numbers that can be used to build sieves just as the sequence (1a) was used and that these sieves are 'locally' minimal as a choice of base for factoring natural numbers by these methods.

    The final step in this whole process is finding a way to put the approach looked at in the earlier posts of this thread and this last approach into one comprehensive theory.
    Last edited: Jul 1, 2006
  9. Jul 1, 2006 #8


    User Avatar
    Science Advisor
    Homework Helper

    Why is that? If N is of the form 1 + ... + an, then how do we know all factor pairs of N contain one factor of the form 1 + ... + am? In other words, why is it clear that if a number has a base-a expansion 111....111 that any factor pair of that number has at least one of the factors also being in the form 111....111 (but with fewer 1's normally, of course)?

    Also, for your interest, [itex]\beta[/itex] is not a capital letter. Capital Beta looks just like the captial roman B. When using LaTeX, if you write a greek letter in lower case, like so \pi, then you'll get the lower case [itex]\pi[/itex] and if you write it like \Pi then you'll get [itex]\Pi[/itex]. Also, when using LaTeX in the same line as your text, use "itex" tags instead of "tex". So you'd write [ itex ]\phi ^2 - \phi - 1 = 0[ /itex ] to have the markup right in line [itex]\phi ^2 - \phi - 1 = 0[/itex]. To have it bigger and on a separate line:

    [ tex ]\phi ^2 - \phi - 1 = 0[ /tex ]

    [tex]\phi ^2 - \phi - 1 = 0[/tex]
    The last part, that at least one factor pair s and t satisfies N = p(a) = s(a)t(a) seems redundant from the fact that p is reducible and N is composite. I don't quite understand the point of your proof, the result seems trivial.

    Let N be composite, say N = nm for n, m > 1. If you define 0 to be a natural, let a be any natural less than or equal to min{n,m}. Otherwise, let a be any natural strictly less than min{n,m}. Let p(x) = (x + (n-a))(x + (m-a)).

    I'm in a rush right now, but I'll look at posts 2, 4, and 5 later.
  10. Jul 1, 2006 #9
    Thanks AKG for the Latex help. The thereom itself in the first post is correct. Whenever the conditions are met some subset of factor pairs of [itex]N[/itex] is mapped in a one to one and onto the full set of factor pairs of [itex]n+1[/itex]. I realize there is formatting that needs to be done to these posts. I will get to it as time goes on. But I do see your response being similar to others I have seen about the first theorem. It is not at all the way number theory teaches us to think about factorization. But in fact for special [itex]N[/itex] it accomplishes the desirable task of lettings us try to factor something much smaller. The reducibility theorem does not say that every composite [itex]N[/itex] can be put into a form satisfying the first theorem. Rather, is simply says we can know that there must exist a base for which the expansion when thought of as a polynomial in a is reducible. You'll see what I mean if you look at all the posts together, but I warn you I have never seen anyone take the time so far to think through these things carefully. The whole project is much more intricate than it first appears. Later I will start doing posts where I use these ideas to shed light on results in elementary number theory. It gets very interesting. You see until now numbe theory is mostly treated as collection of almost unrelated theories. Very few books try to identify unifying principles.

    Oh here are some important and little understood questions. The distribution of primes is known to satisfy the Prime Number Theorem, this is a limit theorem. For any finite [itex]n[/itex], [itex]\pi(n)[/tex] can only be approximated at best by the real valued functions currently used. In reality the number of primes seems to jump around [itex]\frac {x} {log x}[/itex]. Distances between primes also seem to be randomly distributed but generally beomce larger as we go further out. The facts are that the distribution of primes is formed in complement with the multiplication tables of natural numbers, the are a discrete phenomenon that cannot be precisely modeled using anything save perhaps a fourier series of some type. I know everyone is looking at Reimann and it is nice, but trying to stay within the descrete nature of the problem as much as possible is the avenue I am pursuing.

    Are you a person who can visualize things? Some math folks think in euqations. But visualize this.

    Every prime is the seed of a wave form that crosses the positive axis (n) only when p divides n in the strict natural number sense. Now if n=30 we are looking at the place on the positive axis where the wave forms of 2 3 and 5 are all zero at the same time. There is a purpose to what I am saying here. The prime numbers can only exist in wholes in the pattern. that is once a prime is discovered think of it as this wave form. Every place it touches the axis no prime can exist there. As we go further out more and more primes are discovered each with a unique period. These periodic waves 'punch wholes' in the number line every place n is composite. The places where no wave form touched the axis are prime. Amazingly we can reach forward a little bit so that we can actually find larger primes by knowing primes.

    Let me see if I can write this function down (Lagrange did something similar to this I think).

    (A) [tex]\Psi(n)= K \Pi_{i=1}^{k| p_k<\sqrt n} \sin^2\frac {n\pi}{p_i}[/tex]

    Now clearly only those terms formed by prime divisors of n will be zero. K is a scaling factor. The function is positive definite. If you pick the set of primes ahead of time then you see that for instance if we use 2 3 and 5

    (B) [tex]\Psi_{30}(n)=K\sin^2\frac {n\pi}{2} \sin^2\frac {n\pi}{3} \sin^2\frac {n\pi}{5}[/tex]

    [Note: Plot this for natural numbers n using Excel and notice the pattern that arises.]

    If you plot (B) for n between 7 and 49, every non-zero result is prime (Why is this true? it is true, but why?). if we add a term for eleven and plot for n between 11 and 121 same thing every non-zero value corresponds to prime n. We can do this but it saves us no time checking each one. What we want to do is start at seven, then use it to find every prime up to 49, now add terms for each of these. Now the interval of interst is 47 to 47^2 etc. you see that if we are good at this it gets faster. Still we are talking about checking every number apparently. However in a future post I will show how to use the modular concepts mentioned above to speed this puppy up massively, to aobtain a 'small' set of candidates fomr the set {p_k, ..., p_k^2} and to determine just how many numbers we need to look at to factor a number N.

    Here is how it works, for a number with only two large prime divisors, the single divisor that is less than the square root of N is relatively speaking quite close to root N. The class of functions defined in (A) above will require that I know all of the primes up to the fourth root of N or the root of root N. For your typical RSA numbers N may have 1000 digits. Root N will have something like 500 ish digits, and root root N will have about 250 digits. So to even try to attack an RSA we would need to know all the primes up to say 10^250, write out the rrs modulo the product of these primes in the neighbor hood of root N. (Of course this is impossible.) Then once we have applied the modular sieve, we test each of those numbers using the euclidean algorithm compared to reasonable sized products of those primes. We might even like to categorize intervals near root N using some of these probability methods which tell us likelihoods of a factor in those areas etc. In the posts that follow I will develop this algorithm. The beauty of it of course is that we can give upper bounds on time for the algorithm to complete. I am not at all claiming it is polynomial time but all that will have to be looked at. Using a distributed programming model over the internet I think time wise we might be quite surprised.

    NOTE: This thread is work in progress, I am using it as a scratchpad to investigate how a suite of observations and ideas I have had over the last ten years might relate to one another to produce factorization algorithms, and the usefulness of those algorithms. The over arching goal is to be true to the traditional confines of elementary number theory as much as possible, this is an ethic I am holding myself to mostly because I do not believe that the methods have been fully appreciated or developed in our mad rush to conquer the universe using ever more powerful forms of abstraction and functional reasoning.
    Last edited: Jul 1, 2006
  11. Jul 1, 2006 #10


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    My attention span isn't what it's used to be. It's hard for me to follow your posts, Playdo, without some sort of summary or outline that describes the plot. :frown:

    P.S. if we write N=PQ in base P, then we don't have to do any carrying when we multiply P and Q.
  12. Jul 1, 2006 #11


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    Somehow, I missed your latest post when I wrote my previous reply!

    If you're interested in the general problem if finding all the primes up to a given size, you might consider looking for info on wheel sieves -- I think they're the current state of the art.

    If, to factor N, your method requires you to know all of the primes up to N^(1/4), then I'm sorry to say that in its current form, it is not useful for factoring a very large number. Any of the sieving algorithms will have factored N in much less time than it takes to simply write down that list of primes... let alone compute it! (Of course, I have that sneaky feeling that I've just delivered a set-up line! :smile:)
    Last edited: Jul 1, 2006
  13. Jul 2, 2006 #12


    User Avatar
    Science Advisor
    Homework Helper

    A few random comments:

    This is not an equality.

    only if ab divides 1.

    Theose arguments 1a-10a don't appear to be anything other than trial division by every number less than sqrt(N) that is relatively prime with 2*3*5*...pk or however far you went. Convince me this is better than making a list of primes up to sqrt(N) and dividing by these.

    This is false. 5 is square free but phi(5)=4>2=phi(6). k doesn't even have to be prime to be a counterexample, phi(21)=12>10=phi(22).


    Why not implement this in some way to test it out? You'll have a much easier time convvincing people it's worthwhile if you have a working program. Or just write down the steps you'd use to factor a number completely in one cleanly formatted post. You don't have to explain the "why" of the steps, just give the bare bones algorithm. An estimate for the runtime would go a long way as well (you've mentioned a few places where you can count the steps- please do it)
  14. Jul 3, 2006 #13
    Last edited: Jul 3, 2006
  15. Jul 3, 2006 #14
    Yes but to write N=PQ in base P or Q you have to know P or Q in advance. This defeats the whole purpose of a factorization algorithm. But I agree as something worth noting, P base P is 10 so if Q<P then the base P expansion of N is Q0. If Q is greater than P then the base P expansion of N is 10*Q. Of course it says we can simply divide N by P to obtain Q.
  16. Jul 3, 2006 #15

    Certain classes of sieves use mappings that for special numbers are faster than if we just knew the number of primes less than root N. But in general sieves can only hope to approach the speed of knowing all the primes less than root N apriori. The point is that we do not know those primes less than root N for large N because they are very large. If we did knwo them we simply do the euclidean algorithm on N and the product of the primes less than root N. It takes much memory but the time involved can be stimated and we are done. But since for very large N we cannot know all the primes less than root N, we must find ways of either cleverly using more abstract spaces than the natural numbers under multiplication or we have to find methods for developing proxy sets for the set of primes less than N. Modular sieves do precisely this last. Notice that for the sequence of products of primes b[k] = 2, 6, 30, 210 ,2310 etc. [itex]\phi(b[k])-k[/itex] is a decent approximation of the number of primes less than b[k] not already included as factors of b[k]. Now the theorem that starts this thread highlights certain very special natural numbers N for which a subset of divisors base a can be mapped onto the set of divisors of n+1 base 10. It would stand to reason that we might want to generalize this by thinking of the conditional [itex]n+1=(m+1)(s+1)[/itex] not simply in base ten but in any base whatever. So suppose we are looking at base a and base b, where in base b the condition holds. But how does summation work using indexes from other bases than ten?
    Last edited: Jul 3, 2006
  17. Jul 3, 2006 #16


    User Avatar
    Science Advisor
    Homework Helper

    I was aware of the condition n+1=(m+1)(s+1). what I quoted:

    [tex]\sum_{i=1}^n a^ib^i = \sum_{i=1}^s a^{(m+1)i}b^{(m+1)i} \sum_{i=1}^m a^ib^i[/tex]

    is still not an equality. If you meant the indicies to start at 0 and not 1, then it's an equality, but then your statement that a and b divide the above is not in general true.

    What do you mean tau to be here? tau(k) is usually the number of positive divisors of k, but this is surely not what you mean.

    Whatever you do, don't go back and start editing your posts long after they've been put up. I can't speak for anyone else, but I won't bother going back to look for what you've changed days after the fact.

    The general number field sieve is the current champ I believe. I don't know what you mean by a "modular sieve" for factoring integers.

    Every author has to walk the line between insulting the reader with simple details (hence needlessly bogging down the text) and leaving out so much that it's too hard to follow for their intended audience. There's no level of detail that will satisfy everyone.
  18. Jul 3, 2006 #17


    User Avatar
    Science Advisor
    Homework Helper

    This is actually an extremely poor approximation. The usual notation for your b[k] is p#, where p would be the kth prime (p# is known as the primorial) . That is 2#=2, 3#=6, 5#=30, etc. p# is asymptotic to e^p (follows from the prime number theorem). pi(e^p) is asymptotic to e^p/p (prime number theorem again). phi(p#) is asymptotic to e^p * e^(-gamma)/log(p), where gamma is euler's constant (subtracting off the k has no effect on this asymptotic, e^p/log(p) is much much larger than p, let alone k). This last one comes from Mertens asymptotic for the product of (1-1/p) ranging over primes up to x.

    The result is pi(p#)/phi(p#)->0 as p-> infinity, not a good approximation at all. In vague words, most of the numbers less than p# are not prime because they have a prime divisor greater than p, these aren't removed when you sieve out by primes less than p. This is just a consequence of the exponential rate of growth of p# that makes the composites removed by sieving out by primes less than p somewhat inconsequential.

    You don't have to go far to see this approximation get bad.
    pi(11#)=pi(2310)=343 but phi(11#)-5=475
    pi(13#)=pi(30030)=3248 but phi(13#)-6=5754
  19. Jul 3, 2006 #18


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    That's not true at all.

    Yes, we can. An extremely generous lower bound on the effort required is that it this approach would requires O(N / lg N) iterations of the Euclidean algorithm:

    There are O(N / lg N) primes less than the square root of N. Each of them is at least as big as 2. The product of those primes is 2^O(N / lg N). The number of iterations required by the Euclidean algorithm is on the order of the logarithm of the arguments. Thus, we require O(N / lg N) iterations.

    So, even if you were handed on a silver platter the product of the primes less than the root of N, and you had a special computer that was capable of performing the necessary arithmetic, you still can't beat the sieving algorithms when looking at very large numbers.

    (For reference, see Wikipedia for the running time of the GNFS)

    Incidentally, performing one giant GCD is not the most efficient way to trial divide by all of your primes -- even if you got their product for free. I think you get the best performance when you partition your primes so that the product within each partition is roughly the same size as N.

    None of the high powered factoring algorithms are based on division at all:

    The Pollard rho method iteratively applies a polynomial map. When that map, viewed modulo some prime factor of N, starts cycling, that factor can be extracted through a gcd computation. (The elliptic curve method is better, and I think it does essentially the same thing on an elliptic curve, but I don't have reference material handy to make sure)

    The sieves are based on the fact that numbers that squaring is a 4-1 (or more) function modulo N. Given a pair of numbers that square to the same thing (and aren't negations of each other), you can generally extract a prime factor of N through a gcd computation.
    Last edited: Jul 3, 2006
  20. Jul 3, 2006 #19


    User Avatar
    Science Advisor
    Homework Helper

    That's not very generous! You missed out on a square root.

    The product of primes less than sqrt(N) is asymptotic to e^sqrt(N), so you're looking at sqrt(N) operations. However this requires the ability to do computations with an obscenely large number. For a sanity check, how large can we take p before p# gets beyond the storage space of a typical desktop computer?

    (just a note for comparisons with the gnfs, the bound given in wiki has n the number of digits of your number, the N above is the number itself)
  21. Jul 3, 2006 #20


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    Oh hah, good catch. I mentally applied the logarithm too soon, and was thinking about N/2 instead of the square root of N. :frown:

    Do you know the constant out front? I'll assume it's 1. I would say 4 gigabytes is a good estimate for the physical memory of a desktop. It's higher than most people have now, but not all that much... and it will probably be the norm soon anyways.

    So, we have 2^35 bits to play with. So, solving 2^(2^35) = e^sqrt(N)...

    2^(2^35) = e^sqrt(N) = 2^(sqrt(N) lg e)
    2^35 = sqrt(N) lg e
    N = 2^70 / (lg e)^2
    N ~ 2^69 ~ 10^21

    So, N could be 21 decimal digits, at most.

    If we use hard drive space -- let's be generous and assume a terabyte of storage. With 2^43 bits to play with, that works out to

    N ~ 2^85 ~ 10^26

    So 26 decimal digits, at most.

    An RSA number of that size can be factored fairly instantly, right?
    Last edited: Jul 3, 2006
  22. Jul 3, 2006 #21


    User Avatar
    Science Advisor
    Homework Helper

    To answer myself, p# is about e^p, so has p/log(2) digits in base-2. Divide by 2^33 to get the number of gigabytes. If p is about 10^12, this is about 168 gigs.

    edit- hah- Hurkyl's edit beat my crude post! End result- if you had to store the product of primes less than sqrt(N) to factor N, N can't be very large.

    The constant in front is 1, if you mean in p#~e^p. It's a logarithm away from the prime number theorem.
    Last edited: Jul 3, 2006
  23. Jul 3, 2006 #22


    User Avatar
    Science Advisor
    Homework Helper

    Yep. Using the built in factor function of PARI/GP it takes about 3.3 seconds to factor one hundred 26-digit numbers made up of 13 digit primes on an Athlon64 3000+. The built in algorithm uses a combination of methods, and I admit I didn't build my 26-digit numbers to specifically try to foil these like you would normally do with RSA numbers (they were made from random 13 digits primes with no special considerations), but it gives a rough idea of what can be done.

    For comparison, the same test took 75 seconds on one hundred 42 digit numbers made from two 21 digit primes.
  24. Jul 4, 2006 #23


    User Avatar
    Science Advisor
    Homework Helper

    Sorry, this is wrong, a silly mistake too. logp#~p does not imply the asymptotic I wrote above. This won't affect the number of digit computations above since we were interested in log(p#) anyways.

    It does change my post 17, though not the conclusion. We can still say pi(p#)~p#/log(p#)~p#/p and phi(p#)~(p#)*e^(-gamma)/log(p). Hence pi(p#)/phi(p#)~e^(-gamma)*log(p)/p->0 as p->0.

    The moral- you can still get the correct answer if you bugger up, and knowing the answer is correct is no excuse for such sloppiness.
  25. Jul 5, 2006 #24
    Alphabets and Ordinal Numbers

    Sorry Schmoe, I forgot that I had the index greater than 0. That's just lack of attention to detail leading to a bad argument on my part. after you divide out the ab you get a statement that may be true if the new smaller upper index is one less than a composite.

    So we all agree thee are a known grab bag of algorithms and techniques and preferred notations. Tell me what you think of the following perspective based upon the clear implication of the theorem in post 1.


    Think of a natural number, say two for example, not as a symbol in a particular base but as an abstract idea. Two means a set of order two, in base ten we write this as 2, in base two we write it as 10, for every base greater than base 2 we can write 2. So we could think of all of the base a arithmetics as collections of word built upon alphabets. The alphabet base 2 is {0,1}, the alphabet base 10 is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Now given the set of finite ordinal numbers we can use the base a alphabets to assign words (symbols). The thing to observe is that the symbol 99, does not exist in the dictionary for base 2. In fact it does not exist in the dictionary for any base less than 10.

    Realizing that we cannot come up with infinitely many symbols for the natural numbers I have to try to show this truth with the symbols we already have.

    Now look, the first theorem in the first post says that the symbol 1111 appears in every dictionary for every base and it gives us a way of mapping the ordinal position of all of them into base ten. How is this useful?

    As I pointed out the formula allows factorization of larger numbers by finding similar ones with similar multiplicative properties. But look what else it does. Let S[1111] be the set of all base a symbols 1111 mapped into base ten. This set looks like (a=1 is allowed even though it is not a base) {4,15,40,297,....}. Suppose I wanted to factor 15. Simple enough to do normally, but notice I could use S[1111] to simply notice that 40 is divisible by ten. Having the base a connector function for S[1111] I could go up, factor 40 in base 10. Having done this some factor pair of 40 is mapped directly onto a desirable factor pair of 15. This seems wrong but think of this. A large RSA number is going ot have many digits that are nines, so I cannot find a word in a base lower than ten that will be the same. But in bases larger than 10 there are infinitely many words whose digits are precisely those of the RSA number base 10 and we can generate them and map them back into base ten to obtain S[RSA]= {RSA, ,,,}

    The reason we have a chance here is that maps like I depicted in the first theorem do not preserve the entire multiplicative structure. Meaning elements of S[RSA] can have many more than one factor pair containing two large primes. If we find [itex]s \in S[RSA][/itex] where [itex]\tau(s)[/itex] is sufficiently large, we can factor this using only small primes and then by reconstructing every possible factor pair find the one that maps precisely back to the factor pairs of RSA. There are some complicating details heres (some of which I have laid out in the earlier posts) and some things to be proved, like how much larger in general do we need to go to factor RSA using this method. And, what is the limit on tau that makes using this method a waste of time. For instance, if we found s to have only a few small prime factors with very large multiplicity, constructing the possible factor pairs might have too many steps. However if we found s to have small multiplicites and a relatively small number of unique prime divisors this method might save alot of time.

    Q1 Is it always possible to factor every integer using this method?
    Q2 What is the expected value of the number of steps involved?

    Notice also that this method requires knowledge of the number of ways that factor pairs can be formed from a set of prime divisors. In this way it is related to the partition problem, which I believe Ramanujan and Hardy solved around WWI. So there you go. We have our little piece of real estate in the universe of mathematical truth. Some of the property is very expensive algorithmically speaking and some of it is very cheap. Making better algorithms can help cut costs and provide better encryption. In fact I would think that a question of primary interest to cryptographers would be proving that the time required for any factorization algorithm lies between bounds for a given natural number.
  26. Jul 5, 2006 #25


    User Avatar
    Science Advisor
    Homework Helper

    Yes, and when you use these notations to mean something else it gets confusing. So I ask again since you use it again in the above post, what do you mean by [tex]\tau(n)[/tex]? It didn't look like the usual usage before (or the statement you made about phi(k) was wrong).

    You can always just write the RSA number in base 2.

    Your sequence S[1111], etc. are growing very fast when you have alot of digits (as in RSA moduli). I wouldn't hold my breath hoping any of these are composed entirely of small factors. You might want to look up "Dickman's function" and how it relates to "smooth numbers".

    The number of 'factor pairs' is just the number of divisors, which is dead easy if you have the prime factorization. How is this related to the partitions of a number?

    What encryption scheme benefits from improved factoring algorithms? Certainly not RSA, whose security needs factoring to be difficult.

    Am I expecting too much that you will actually try to implement this algorithm? I'd be happy to give you some numbers that are products of two large primes (whatever size you like) for you to test it out on. The next best thing to proving an algorithm is correct combined with a detailed runtime analysis would be a pile of sample calculations.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook