# Modular multiplication problem in C

1. Aug 29, 2008

### grappolo

Hi to everyone,

i'm writing a program in C that is supposed to run on a AVR processor with registers and int of 16 bit (like the one used in some sensor for WSN). This program involves the use of the Lagrange Interpolation. I've worked on it managing to solve most part of the problems but i'm still facing a pure mathematical issue.

Basically all my values are now smaller than 2^16 (they're stored modulus 2^16) but when i multiply two of that values an overflow can occur since the avr-gcc compiler (with the setting i'm using) is not capable of handling any values bigger than 2^16.

So the question is: "Given a and b smaller than 2^16 is there an efficient method to obtain (a*b) mod n without using any variable longer than 16 bit?".

In a normal situation i would have simply used a double lenght variable (32 bit in this case) to hold the product a*b then applied the modulus. In this case I can't store any values bigger than 16 bit so i need an efficient alghoritm to compute (a*b) mod n under these costraints.

I cannot use any C++ class or anything different from standard C instructions.

Thank you in advance for all the answers,

Lu

2. Aug 29, 2008

### Staff: Mentor

How large is n?

3. Aug 29, 2008

### grappolo

Hi Borek,
n has to be is the largest prime smaller than 2^16; this value is 65521. Also all the values (a and b) are smaller than 65521 (so also smaller than 2^16).

Sorry for the missing information !

Lu

4. Aug 29, 2008

### Staff: Mentor

I thought about ((a*b) % n == ((a % n) * (b % n)) % n), but for so large n it doesn't change anything, as you must still be able to calculate 65520^2.

But you are most likely right that it is much more of a math question, than the programming one. Could be some properties of modulo arithmetic will simplify things.

5. Aug 29, 2008

### grappolo

That's the problem Borek. The only way to avoid overflow could be avoiding the use of numbers bigger than 2^8 for a and b but this is impossible in my case. There must be an efficient way to do this.

Thank you for your reply anyway.

Lu

6. Aug 29, 2008

### Hurkyl

Staff Emeritus
You already know a way to do this -- you learned it in elementary school. The only difference is here, you should work in base 256 rather than base 10.

(And you may or may not want to use something like Montgomery multiplication or Barrett reduction, rather than implementing elementary-school division)

If you need a buzzword, look for 'multiprecision arithmetic'. Oh, and I'm pretty sure this stuff is in Knuth's texts, if you have that.

Also, you should check out what special arithmetic intrinsic functions / assembly statements you have available to you. Your environment might already offer a way to multiply two 16-bit numbers to get a 32-bit product (the result split across two registers), or allow you to divide-with-remainder a 32-bit number by a 16-bit number (the input split across two registers).

Last edited: Aug 29, 2008
7. Aug 30, 2008

### grappolo

If you mean the basic tecnique of splitting the 16 bit value into two 8 bit value with proper coefficient I've already tried that. The problem is that it isn't efficient since a single multiplication involve a lot more of operation.

I don't know about Barrett reduction but Montgomery multiplication isn't the solution at all. In fact the Montgomery multiplication simply shifts the problem because the output is o = ab R^(-1) mod n, where R = 2^16. You still need to manipulate o * R mod n, that can potentially cause the same overflow issue.

I have the Knuth TAOCP book, but (shame on me) I've been lazy and I've not red it carefully (it's a pdf non-searchable version).

The AVR processor my device is using is not capable of splitting values on contiguous registers. The bigger costraint is that the device is battery operated and it has very very low power and low CPU performance. That's why I needed a more efficient way.

Thank you anyway for your effort. At worst I can still use the 256 base method that also you suggested if nothing better comes out.

Lu

8. Aug 30, 2008

### Hurkyl

Staff Emeritus
Agreed -- but that's unavoidable: 16-bit multiplies are inherently 4 times harder than 8-bit multiplies. The only reason you don't see this 'normally' is because most CPU's only provide a multiplier of a single size, so you can't take advantage of the fact smaller multipliers are cheaper.

Anyways, it strikes me that there is another solution -- I dunno if it's better or not. You can compute "multiplication by b" through a sequence of doubles and adds. (Similar to how you compute modular exponentation efficiently through square-and-multiplies)

Last edited: Aug 30, 2008
9. Aug 30, 2008

### CRGreathouse

Yes, double-and-add might work since the numbers are small; also precomputing 3a would allow fewer additions: 255a = ((3a * 2 * 2 + 3a) * 2 * 2 + 3a) * 2 * 2 + 3a instead of ((((((a * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a.

Since you're working mod a number close to 2^16, it may be easier to approximate the result, multiply mod 2^16, then add or subtract as needed.

10. Aug 31, 2008

### rcgldr

How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It seems that tables would help here, for example, a 256 by 16 bit table for (2^16 times the values 0 -> 255) modulo 65521.

11. Aug 31, 2008

### grappolo

You may be right, it can be more efficient. I'll try that tomorrow and I'll compare results using Avrora Simulator.

Thank you,

Lu

12. Aug 31, 2008

### grappolo

My device also offers a very limited storage space and I'm already using part of it for cryptographic material needed by the algorithm. Using a table like the one you're suggesting isn't a good idea in this case (when in a normal situation would have been also my choice probably). At the moment for addition I'm using an "homemade" algorithm that also can be a lot more efficient but whose problems are nothing compared to multiplying efficiency problems.

I'll work probably on it after having solved (or at least improved a little) the performances of multiplying.

Thank you and have a good sunday,

Lu

13. Aug 31, 2008

### KTC

Exactly how much space do you have? And by efficient, do you have a specific speed requirement, or are you just looking for as fast as possible?

14. Aug 31, 2008

### Hurkyl

Staff Emeritus
It's easy enough to detect if addition/subtraction overflows through a comparison.

15. Sep 2, 2008

### grappolo

Hi,
about sotrage space I'm talking of some Kb depending on the sensing algorithm loaded on the device. My program is only supposed to handle authentication between devices so the space left depends on other software space usage.

About efficiency I have no specific costraints, my goal is to minimize the time needed or at least the CPU calculations needed. I'm still in a preliminary phase about this kind of evaluation since the program is not yet complete. Once I have a working version I can start improving effficiency using Avrora Simulation results as benchmarks.

Thank you,

Lu

Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?