Practical question about modular multiplication

CRGreathouse
OK, so I'm writing a program and I want to calculate modular exponents. The only hard part is the multiplications. It would be best if I could work with numbers where, at all intermediate stages, each number is less than 2^64. If I must create a bit array I suppose I'll do that, but this would cause a huge performance drain.

The problem: given x, y, and m, calculate N:
N\equiv xy\pmod{m}

I may have a modulus as large as 2^63-1, so I started by splitting each of my numbers to be multiplied into two parts:

x=x_1\cdot2^{32}+x_2,\;\;0\le x_1<2^{31},\;0\le x_2<2^{32}
y=y_1\cdot2^{32}+y_2,\;\;0\le y_1<2^{31},\;0\le y_2<2^{32}

So we have

N\equiv2^{64}x_1y_1+2^{32}(x_1y_2+x_2y_1)+x_2y_2\pmod{m}

So far, so good:

x_1y_1<2^{62}
x_1y_2+x_2y_1<2^{64}
x_2y_2<2^{64}

All three intermediate numbers are a workable size, and the constant multiplications can be computed with shifts (it's convenient for binary computers to multiply by powers of 2).

Here's the trouble: How do I combine these without overflowing? Alternately, is there a different set of steps I could take to get to the same result?
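In C, the split described above might look like the following sketch (the function name `split_products` is mine; it assumes x, y < 2^63, as implied by the bound on m):

```c
#include <stdint.h>

/* Sketch of the decomposition above: split each operand into 32-bit
   halves so every partial product fits in 64 bits.
   Assumes x, y < 2^63, matching the bound on the modulus. */
void split_products(uint64_t x, uint64_t y,
                    uint64_t *hh, uint64_t *mid, uint64_t *ll) {
    uint64_t x1 = x >> 32, x2 = x & 0xFFFFFFFFULL;  /* x = x1*2^32 + x2 */
    uint64_t y1 = y >> 32, y2 = y & 0xFFFFFFFFULL;  /* y = y1*2^32 + y2 */
    *hh  = x1 * y1;            /* < 2^62, since x1, y1 < 2^31 */
    *mid = x1 * y2 + x2 * y1;  /* < 2^64, each term < 2^63    */
    *ll  = x2 * y2;            /* < 2^64                      */
}
```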
 
Mathematics news on Phys.org
Slightly off-topic: Assuming C/C++, does your compiler implement the long long datatype as 128 bits? That would alleviate some of your overflow issues. This comment is motivated by your "64-bit int" statement in another thread.

The 64-bit systems here have 128-bit long long datatypes... sorry if I'm off-topic.
 
Why not reduce those 3 quantities mod m before you add them? Reduce the powers of 2 as well before multiplying.
 
shmoe said:
Why not reduce those 3 quantities mod m before you add them? Reduce the powers of 2 as well before multiplying.

I reduce the three quantities, but it doesn't help. Define M(a,b) as the usual % (since LaTeX hates me):

M(a,b)\equiv a\pmod{b},\;\;0\le M(a,b)<b

Then the best bounds I have for the reduced quantities:

M(x_1y_1,m)\le2^{62}-1
M(x_1y_2+x_2y_1,m)\le2^{63}-2
M(x_2y_2,m)\le2^{63}-2

I can't multiply two numbers together if their product is at least 2^{64}.

Now the powers of 2 I can reduce, but so far only in an inefficient way: I double the number and reduce modulo m, 32 times for the 2^{32} factor and 64 times for the 2^{64} factor. This takes 96 shifts and 96 (expensive!) modulo operations, plus looping overhead.
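That inefficient scheme reads roughly as follows in C (a sketch; `shl_mod_naive` is a name I'm introducing, and it assumes m ≤ 2^63 − 1 so the shift can never overflow):

```c
#include <stdint.h>

/* Naive a * 2^k mod m: double and reduce at every step, as described
   above -- k shifts and k '%' operations.
   Safe because a < m <= 2^63 - 1, so a << 1 stays below 2^64. */
uint64_t shl_mod_naive(uint64_t a, unsigned k, uint64_t m) {
    a %= m;
    while (k--) {
        a <<= 1;  /* multiply by 2 */
        a %= m;   /* the expensive reduction, once per doubling */
    }
    return a;
}
```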
 
jim mcnamara said:
Slightly off-topic: Assuming C/C++, does your compiler implement the long long datatype as 128 bits? That would alleviate some of your overflow issues. This comment is motivated by your "64-bit int" statement in another thread.

The 64-bit systems here have 128-bit long long datatypes... sorry if I'm off-topic.

It's not off-topic at all; my question is very nearly equivalent to the question, "How do I implement a 128-bit integer using only signed and unsigned longs?" Sadly, my compiler has 64-bit long longs.
 
CRGreathouse said:
It's not off-topic at all; my question is very nearly equivalent to the question, "How do I implement a 128-bit integer using only signed and unsigned longs?" Sadly, my compiler has 64-bit long longs.

You implement it with a bignum library like GMP, which is an arbitrary-precision library for real and integer operations. There is a version for most systems, even Windows.

http://swox.com/gmp/
 
CRGreathouse said:
Then the best bounds I have for the reduced quantities:

M(x_1y_1,m)\le2^{62}-1
M(x_1y_2+x_2y_1,m)\le2^{63}-2
M(x_2y_2,m)\le2^{63}-2

I can't multiply two numbers together if their product is at least 2^{64}.

That's fine; you can add two numbers that are each less than 2^{63} without overflow (their sum is below 2^{64}), then reduce modulo m. That's all you need after you've multiplied by 2 enough times.

CRGreathouse said:
Now the powers of 2 I can reduce, but so far only in an inefficient way. I double the number and reduce modulo m 32 and 64 times. This takes 96 shifts and 96 (expensive!) modulo operations, plus loopng overhead.

You won't need to reduce modulo m 64 times to multiply by 2^64. You only need to reduce when you get something that's one shift away from 'going over'. If m is 63 bits, you should get to shift twice on average before needing to reduce (assuming random input at each stage, not really true of course). If m is smaller, even fewer mods will be needed.
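A sketch of that lazier reduction in C (`shl_mod_lazy` is my name for it; it reduces only when the top bit is set, i.e. when the next shift would otherwise overflow):

```c
#include <stdint.h>

/* a * 2^k mod m, reducing only when the next shift would overflow.
   A reduction leaves a < m <= 2^63 - 1, so the following shift is
   always safe; the smaller m is, the rarer the reductions. */
uint64_t shl_mod_lazy(uint64_t a, unsigned k, uint64_t m) {
    a %= m;
    while (k--) {
        if (a >> 63)  /* top bit set: shifting now would lose it */
            a %= m;
        a <<= 1;
    }
    return a % m;     /* one final reduction */
}
```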

How expensive is the mod operator? I would hope mod(x,y) is zippy if y is nearly the same size as x, so even if m is 63 bits and you have to mod frequently, the operation should be fast, no?


There are multiple libraries for large-number arithmetic: GMP, as mentioned, or bigint. I don't do much computer work myself; most of the time I'll just use Maple or PARI/GP, which can handle big numbers already (and even with the added overhead they will still be much faster than anything I'd take the time to code).
 
shmoe said:
How expensive is the mod operator? I would hope mod(x,y) is zippy if y is nearly the same size as x, so even if m is 63 bits and you have to mod frequently, the operation should be fast, no?

The mod operator takes maybe 6 times longer than an addition or subtraction. You're right, though; there's no reason to use it when I can just check the bounds each time and subtract (once) if needed. This does bring the cost down a bit. I don't know how costly the frequent branching would be (hyperpipelined processors don't like conditional branches), but it would surely be faster.

OK, thanks, I'm going to try to write this program up and see how fast it is.
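For reference, the pieces discussed in this thread fit together roughly like this (a sketch, not the actual program; all function names are mine, and everything assumes m ≤ 2^63 − 1):

```c
#include <stdint.h>

/* 2a mod m via compare-and-subtract instead of '%': since a < m < 2^63,
   2a cannot overflow and 2a < 2m, so one subtraction suffices. */
static uint64_t dbl_mod(uint64_t a, uint64_t m) {
    a <<= 1;
    return a >= m ? a - m : a;
}

/* a * 2^k mod m by repeated doubling */
static uint64_t shl_mod(uint64_t a, unsigned k, uint64_t m) {
    a %= m;
    while (k--)
        a = dbl_mod(a, m);
    return a;
}

/* (a + b) mod m for a, b < m < 2^63, so a + b cannot overflow */
static uint64_t add_mod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t s = a + b;
    return s >= m ? s - m : s;
}

/* x * y mod m via the 32-bit split from the thread; needs x, y < 2^63 */
uint64_t mul_mod(uint64_t x, uint64_t y, uint64_t m) {
    uint64_t x1 = x >> 32, x2 = x & 0xFFFFFFFFULL;
    uint64_t y1 = y >> 32, y2 = y & 0xFFFFFFFFULL;

    uint64_t hi  = shl_mod(x1 * y1 % m, 64, m);             /* 2^64 x1 y1  */
    uint64_t mid = shl_mod((x1 * y2 + x2 * y1) % m, 32, m); /* 2^32 cross  */
    uint64_t lo  = x2 * y2 % m;

    return add_mod(add_mod(hi, mid, m), lo, m);
}
```

Replacing `dbl_mod`'s branch with a `%` recovers the naive version; the branchy form is the compare-and-subtract idea from the last few posts.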
 
