# Practical question about modular multiplication

1. Aug 9, 2006

### CRGreathouse

OK, so I'm writing a program in which I need to perform modular exponentiation. The only hard part is the multiplications. It would be best if every intermediate value stayed below 2^64. If I must create a bit array I suppose I will, but that would be a huge performance drain.

The problem: given x, y, and m, calculate N:
$$N\equiv xy\pmod{m}$$

I may have a modulus as large as 2^63-1, so I started by splitting each of my numbers to be multiplied into two parts:

$$x=x_1\cdot2^{32}+x_2,\;\;0\le x_1<2^{31},\;0\le x_2<2^{32}$$
$$y=y_1\cdot2^{32}+y_2,\;\;0\le y_1<2^{31},\;0\le y_2<2^{32}$$

So we have

$$N\equiv2^{64}x_1y_1+2^{32}(x_1y_2+x_2y_1)+x_2y_2\pmod{m}$$

So far, so good:

$$x_1y_1<2^{62}$$
$$x_1y_2+x_2y_1<2^{64}$$
$$x_2y_2<2^{64}$$

All three intermediate numbers are a workable size, and the constant multiplications can be computed with shifts (it's convenient for binary computers to multiply by powers of 2).
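The split above can be sketched in C as follows (a sketch only; the struct and the names `hi`/`mid`/`lo` are mine, not from the thread):

```c
#include <stdint.h>

/* Split x and y into 32-bit halves and form the three partial
   products described above.  Assumes x, y < 2^63, so the high
   halves x1, y1 are below 2^31 and no product overflows 64 bits. */
typedef struct { uint64_t hi, mid, lo; } partials;

partials split_mul(uint64_t x, uint64_t y) {
    uint64_t x1 = x >> 32, x2 = x & 0xFFFFFFFFu;
    uint64_t y1 = y >> 32, y2 = y & 0xFFFFFFFFu;
    partials p;
    p.hi  = x1 * y1;            /* coefficient of 2^64, < 2^62 */
    p.mid = x1 * y2 + x2 * y1;  /* coefficient of 2^32, < 2^64 */
    p.lo  = x2 * y2;            /* coefficient of 2^0,  < 2^64 */
    return p;
}
```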

Here's the trouble: How do I combine these without overflowing? Alternately, is there a different set of steps I could take to get to the same result?

2. Aug 10, 2006

### Staff: Mentor

Slightly off-topic: assuming C/C++, does your compiler implement the long long datatype as 128 bits? That would alleviate some of your overflow issues. This comment is motivated by your "64 bit int" statement in another thread.

The 64-bit systems here have 128-bit long long datatypes... sorry if I'm off-topic.

3. Aug 10, 2006

### shmoe

Why not reduce those 3 quantities mod m before you add them? Reduce the powers of 2 as well before multiplying.

4. Aug 10, 2006

### CRGreathouse

I reduce the three quantities, but it doesn't help. Define M(a,b) as the usual % (since LaTeX hates me):

$$M(a,b)\equiv a\pmod{b},\;\;0\le M(a,b)<b$$

Then the best bounds I have for the reduced quantities:

$$M(x_1y_1,m)\le2^{62}-1$$
$$M(x_1y_2+x_2y_1,m)\le2^{63}-2$$
$$M(x_2y_2,m)\le2^{63}-2$$

I can't multiply two numbers together if their product is at least $$2^{64}$$.

Now the powers of 2 I can reduce, but so far only inefficiently: I double the number and reduce modulo m, 32 or 64 times. That takes 96 shifts and 96 (expensive!) modulo operations, plus looping overhead.
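The inefficient doubling loop described here might look like this (a sketch, assuming a < m < 2^63 on entry):

```c
#include <stdint.h>

/* Multiply a by 2^k modulo m by doubling and reducing every step.
   Assumes a < m < 2^63, so the shift never overflows 64 bits. */
uint64_t shl_mod_slow(uint64_t a, unsigned k, uint64_t m) {
    while (k--) {
        a <<= 1;  /* a < m < 2^63, so a*2 < 2^64 */
        a %= m;   /* one expensive modulo per doubling */
    }
    return a;
}
```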

Last edited: Aug 10, 2006
5. Aug 10, 2006

### CRGreathouse

It's not off-topic at all; my question is very nearly equivalent to asking, "How do I implement a 128-bit integer using only signed and unsigned longs?" Sadly, my compiler's long longs are only 64 bits.

6. Aug 10, 2006

### Staff: Mentor

You implement it with a bignum library like GMP, an arbitrary-precision library for integer and real operations. There are versions for most systems, even Windows.

http://swox.com/gmp/

Last edited by a moderator: May 2, 2017
7. Aug 10, 2006

### shmoe

That's fine: you can add two numbers that are each less than 2^63 and then reduce modulo m, which is all you need after you've multiplied by 2 enough times.
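In C, that addition step might look like this (a sketch; the name `add_mod` is mine):

```c
#include <stdint.h>

/* Add two residues mod m.  Since a, b < m < 2^63, the sum a + b
   fits in 64 bits, and one conditional subtract reduces it. */
uint64_t add_mod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t s = a + b;        /* < 2m < 2^64: no overflow */
    return (s >= m) ? s - m : s;
}
```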

You won't need to reduce modulo m 64 times to multiply by 2^64. You only need to reduce when the value is one shift away from 'going over'. If m is 63 bits, you should get to shift twice on average before needing to reduce (assuming random input at each stage, which isn't really true of course). If m is smaller, even fewer mods will be needed.
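One simple way to realize this "reduce only when needed" idea is to replace the modulo with a compare-and-subtract after each shift (a sketch, assuming a < m < 2^63 on entry):

```c
#include <stdint.h>

/* Multiply a by 2^k mod m.  After each shift the value is below
   2m < 2^64, so subtracting m at most once restores a < m; no %
   is ever needed.  Assumes a < m < 2^63 on entry. */
uint64_t shl_mod(uint64_t a, unsigned k, uint64_t m) {
    while (k--) {
        a <<= 1;            /* a < m < 2^63, so no overflow */
        if (a >= m) a -= m; /* cheap conditional subtract */
    }
    return a;
}
```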

How expensive is the mod operator? I would hope mod(x,y) is fast when y is nearly the same size as x, so even if m is 63 bits and you have to mod frequently, the operation should be cheap, no?

There are multiple libraries for large-number arithmetic: GMP, as mentioned, or BigInt. I don't do much computer work myself; most of the time I'll just use Maple or PARI/GP, which handle big numbers already (and even with the added overhead will still be much faster than anything I'd take the time to code).

8. Aug 10, 2006

### CRGreathouse

The mod operator takes maybe six times longer than an addition or subtraction. You're right, though: there's no reason to use it when I can just check the bound each time and subtract (once) if needed. That brings the cost down a bit. I don't know how costly the frequent branching would be (deeply pipelined processors don't like conditional branches), but it would surely be faster.
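Putting the pieces of this thread together, a full multiplication could be done by classic shift-and-add ("Russian peasant" multiplication), using only conditional subtracts for the reductions. This is a sketch under the thread's assumption m < 2^63, not the poster's actual code:

```c
#include <stdint.h>

/* Compute (x*y) mod m without overflow, assuming m < 2^63: every
   intermediate value stays below 2^64, and the only reductions
   inside the loop are cheap compare-and-subtracts. */
uint64_t mulmod(uint64_t x, uint64_t y, uint64_t m) {
    uint64_t r = 0;
    x %= m;                      /* one real modulo up front */
    while (y) {
        if (y & 1) {
            r += x;              /* r, x < m < 2^63: no overflow */
            if (r >= m) r -= m;
        }
        x <<= 1;                 /* x < m < 2^63: no overflow */
        if (x >= m) x -= m;
        y >>= 1;
    }
    return r;
}
```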

OK, thanks, I'm going to try to write this program up and see how fast it is.