Determining efficiency of multiplication algorithm

ktoz · Jan 20, 2013

Hi

I'm writing a BigNum library for Javascript and have come up with a multiplication algorithm which I think is pretty efficient (a modification of shift/add), but I don't understand O(log ...) notation, so don't really know how to compare my method with something like the Schönhage–Strassen algorithm which has a complexity of O(N log N log log N).

I've determined, that in the worst case, numbers of bit width w require w/2 iterations with my method. Each iteration performs a number of individual steps, but I don't know whether to count those sub steps, or only the iteration count.

Is w/2 enough to define efficiency? Or do I have to get more granular and define every bit manipulation?

Thanks for any help

Adyssa · Jan 20, 2013

This explanation is from my textbook for analyzing time efficiency of (non-recursive) algorithms.

1) Decide on a parameter indicating the input size
2) Identify the algorithm's basic operation
3) Setup a sum expressing the number of times the basic operation is executed
4) Using sum manipulations and standard formulas find a closed form formula for the count and/or establish it's order of growth.

So, first you want to be able to measure the size of your input. For example, if you have an array of integers to represent your bignum, then your input size is the size of the array.

Second, locate the basic operation, generally this is inside the inner loop of the algorithm.

Third, express the number of times this basic operation is executed as a sum, i.e.

[tex]\sum_{i=0}^{n} a_{i}[/tex]

where n is the number of iterations and a is the basic operation.

Four, manipulate the sum to obtain a nice formula.

Here's an example of a matrix multiplication algorithm:

Code:

MatrixMultiply: A*B=C

for (i = 0, i < n, i++)
{
  for (j = 0; j < n; j++)
  {
    for (k = 0; k < n; k++)
    {
      C[i][j] += A[i][k] * B[k][j]
    }
  }
}

It should be clear that we have 3 loops that iterate n times each (from 0 to n-1) and the basic operation is a multiplication and an addition. Let's say the cost is 1 to keep it simple. A sum to represent the number of basic operations for each matrix element i,j is:

[tex]\sum_{k=0}^{n} 1[/tex]

and the total number of basic operations can be represented by:

[tex]\sum_{i=0}^{n} \sum_{j=0}^{n} \sum_{k=0}^{n} 1[/tex]

It should be clear that the innermost sum:[tex]\sum_{k=0}^{n} 1[/tex]

is equal to n, because we sum 1 from k = 0 to k = n. We can reduce the triple sum to:

[tex]\sum_{i=0}^{n} \sum_{j=0}^{n} n[/tex]

and with the same reasoning, reduce this double sum to:

[tex]\sum_{i=0}^{n} n^{2}[/tex]

and further reduce it to:

[tex]n^{3}[/tex]

So the cost of this algorithm, as a function of input size is C(n) = c*n³ where c is the cost of the basic operation, and you would say that the order of growth is O(n³) - the constant c is irrelevant for Big Oh notation because the size of n³ will eclipse any constant c for large n.

This is a bit simplified from the text, but that should give you an example of how to go about calculating the complexity of your algorithm.

ktoz · Jan 20, 2013

Adyssa said:

So the cost of this algorithm, as a function of input size is C(n) = c*n³ where c is the cost of the basic operation, and you would say that the order of growth is O(n³) - the constant c is irrelevant for Big Oh notation because the size of n³ will eclipse any constant c for large n.

Thanks Adyssa. Now let's see if I understand it...

The primary advantage my algorithm imparts is due to the fact that in base 2
v << m + n = v << (m + 1) - n

This is put into practice to reduce the number of add operations. For example, using traditional shift/add

15 * 15
1111 * 1111
-------------------------------------------
11110 + 1111 = 101101 (30 + 15 = 45)
1011010 + 1111 = 1101001 (90 + 15 = 105)
11010010 + 1111 = 11100001 (210 + 15 = 225)
-------------------------------------------
3 shifts, 3 addsUsing subtraction method

15 * 15
1111 * 1111
-------------------------------------------
11110000 - 1111 = 11100001 (240 - 15 = 225)
-------------------------------------------
4 shifts, 1 subtract

So, in general, for bitwidth w, simple shift/add yields worst case of
w - 1 shifts, w - 1 adds

subtraction method yields
w shifts, 1 subtract

Ratio between two methods equals
(w + 1) / (2w - 2)

As w approches infinity, (w + 1) / (2w - 2) approaches 1/2, or twice as fast as shift/add

So, if bits are used as the unit of measure, would that mean shift/subtract would be O(w + 1)?

rcgldr · Jan 20, 2013

Normally shifting isn't used with extended precision multiplication, and the general goal is to reduce the number of multiplies at the expense of increased additions or subtractions (since addition or subtraction is usually faster than multiplication).

Wiki mentions The Karatsuba algorithm was the first multiplication algorithm asymptotically faster than the quadratic "grade school" algorithm.

http://en.wikipedia.org/wiki/Karatsuba_multiplication

Note that the full precision of each word in memory is not used in order to avoid having to deal with carry operations as often. In the wiki article, the numbers are broken up into 31 bit components for usage with 32 bit registers and memory.

From wiki article: As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320–640 bits.

Then there is Toom_Cook. The wiki article doesn't mention at what size numbers that this method is better than classic or Karatsuba.

Algotirhms like Schönhage–Strassen further enhance this by using FFT (Fast Fourier Transform) combined with finite field (rings) math modulo 2ⁿ - 1, which is not only a prime number, but also has a "primitive" b, a number than when multiplied by itself will produce every number in the ring except zero, until bⁿ, in which case bⁿ mod (2ⁿ - 1) = 1. From wiki article: In practice the Schönhage–Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond 2^(2^15) to 2^(2^17) (10,000 to 40,000 decimal digits).

A couple of links for number theoretic transform:

Number-theoretic_transform

http://apfloat.org/ntt.html

Normally division is avoided by using an optimized algorithm to invert a number (calculates 1/x), then doing an extended precision multiply. In the case of a series, if possible, the denominator is calculated separately via multiplications, so that inversion of the denominator product is only done once at the end of the algorithm.

These methods get increasling complex, and only save time if the numbers you're working with are fairly large. If the numbers you're working with are not large, then a simpler approach may be more appropriate.

ktoz · Jan 20, 2013

rcgldr said:

Algotirhms like Schönhage–Strassen further enhance this by using FFT (Fast Fourier Transform) combined with finite field (rings) math modulo 2ⁿ - 1, which is not only a prime number, but also has a "primitive" b, a number than when multiplied by itself will produce every number in the ring except zero, until bⁿ, in which case bⁿ mod (2ⁿ - 1) = 1. From wiki article: In practice the Schönhage–Strassen algorithm starts to outperform older methods such as Karatsuba and Toom–Cook multiplication for numbers beyond 2^(2^15) to 2^(2^17) (10,000 to 40,000 decimal digits).

I've been reading those for the last few days. Basically, I'm trying to figure out how to shoehorn my algorithm into the steps outlined in Adyssa's post, so I can understand how shift-subtract compares to those others.

a = multiplicand
b = multiplier
i = b.length - 1;
w = bitwidth of b

while (i > -1)
{
r = GetShift(b);
a = Subtract(Shift(a, r.shift), a);
i = r.index;
}

From Adyssa's post, I'm guessing "GetShift" qualifies as a constant, because it only makes a single pass through b
Worst case, the number of partial shifts in "Shift" equals w
Worst case, the number of partial subtracts in "Subtract" equals w

So, as I interpret Adyssa's post, that would give an "O" value of O(2 N). This is most likely wrong, though, since Electronic usage under Wikipedia multiply, states that shift add has a complexity of Θ(n^2). My second post above clearly shows that shift/subtract approaches 1/2 the number of steps as shift/add. So would that make it Θ(n^2 / 2)?

Regardless of the "O" value, the actual code seems to be pretty zippy. Orders of magnitude faster than some other Javascript BigNum libraries.

rcgldr · Jan 20, 2013

It's not clear to me how your shift subtract method will be used to implement an extended precision multiply. Assume you had 8 bit registers / memory and stored 4 bits or 1 "nibble" of a number per memory location or register. So if you have 32 bit numbers A and B, you split them up into 4 nibbles and you want to determine the product:

(A0 x 2^12 + A1 x 2^8 + A2 x 2^4 + A3) x (B0 x 2^12 + B1 x 2^8 + B2 x 2^4 + B3)

where does your subtract method come into play, and is it significantly different than the "divide and conquer" approach as used by the Karatsuba method?

ktoz · Jan 21, 2013

rcgldr said:

It's not clear to me how your shift subtract method will be used to implement an extended precision multiply. Assume you had 8 bit registers / memory and stored 4 bits or 1 "nibble" of a number per memory location or register. So if you have 32 bit numbers A and B, you split them up into 4 nibbles and you want to determine the product:

(A0 x 2^12 + A1 x 2^8 + A2 x 2^4 + A3) x (B0 x 2^12 + B1 x 2^8 + B2 x 2^4 + B3)

where does your subtract method come into play, and is it significantly different than the "divide and conquer" approach as used by the Karatsuba method?

I haven't really gotten down in the weeds studying advanced methods like Karatsuba, just a cursory read at Wikipedia. The only "divide and conquer" in shift-subtract (SS) is that it stores large numbers in 32 bit chunks. Nothing unexpected there. It derives it's main advantage in the handling of runs of "on" bits. Where shift-add multiplication (SA) performs (run length - 1) adds, SS performs exactly 1 subtract regardless of run length.

In a real world test using a 200,000 bit random number, SS performed 22.35 percent fewer calculations than SA. Don't know how that translates to "O" notation, but a 20+ percent efficiency boost, is respectable.

rcgldr · Jan 21, 2013

ktoz said:

I haven't really gotten down in the weeds studying advanced methods like Karatsuba, just a cursory read at Wikipedia.

It's not that complicated, reduce the situation to two 8 bit numbers store as 4 bit values. Karatsuba reduces the number of multiplies to 3 by using additional adds and subtracts:

(A0 x 2^4 + A1) x (B0 x 2^4 + B1):

X0 = (A0 x B0)
X1 = (A0 + B1) x (A1 + B0)
X2 = (A1 x B1)

(A0 x 2^4 + A1) x (B0 x 2^4 + B1) = X0 x 2^8 + (X1 - X0 - X2) x 2^4 + X2

For larger numbers, perform this algorithm recursively, where the initial A0, A1, ... B0, B1, ... terms are n bits, then on the recursive call, n/2 bits, then on the next layer of recursion n/4 bits, ... .

ktoz said:

In a real world test using a 200,000 bit random number, SS performed 22.35 percent fewer calculations than SA. Don't know how that translates to "O" notation, but a 20+ percent efficiency boost, is respectable.

So you're doing extended precision multiplies by shifting (1 bit by n bit multiplies)? It would be quicker to use the actual hardware multiply, using long hand method (note the product size (number of bits) is double the size of the multiplicand and multiplier on most computers in hardware, and for C, you can choose to only use 1/2 of a memory location or variable to store each group of bits in an extended precision number):

Code:

          A0   A1
x         B0   B1
----------------------
          A1 x B1
     A0 x B1
     A1 x B0
A0 x B0
----------------------
sum of the terms above

The summing process will need to deal with carry propagation

Determining efficiency of multiplication algorithm

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Use of AI (ML/DL) in Science

Other than just FizzBuzz to test programmer candidates

Sweetspot of data compression

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect