Modular multiplication problem in C

  • Thread starter Thread starter grappolo
  • Start date Start date
  • Tags Tags
    Multiplication
AI Thread Summary
The discussion revolves around implementing modular multiplication in C for an AVR processor with 16-bit registers, specifically for values less than 65521. The main challenge is to compute (a*b) mod n without causing overflow, as the compiler cannot handle values larger than 16 bits. Suggestions include using techniques like double-and-add for multiplication and exploring properties of modular arithmetic, but concerns about efficiency and register limitations persist. The user is also considering their limited storage space and the need for optimization in their program. Overall, the focus is on finding a mathematically efficient method to perform multiplication under strict constraints.
grappolo
Messages
7
Reaction score
0
Hi to everyone,

i'm writing a program in C that is supposed to run on a AVR processor with registers and int of 16 bit (like the one used in some sensor for WSN). This program involves the use of the Lagrange Interpolation. I've worked on it managing to solve most part of the problems but I'm still facing a pure mathematical issue.

Basically all my values are now smaller than 2^16 (they're stored modulus 2^16) but when i multiply two of that values an overflow can occur since the avr-gcc compiler (with the setting I'm using) is not capable of handling any values bigger than 2^16.

So the question is: "Given a and b smaller than 2^16 is there an efficient method to obtain (a*b) mod n without using any variable longer than 16 bit?".

In a normal situation i would have simply used a double length variable (32 bit in this case) to hold the product a*b then applied the modulus. In this case I can't store any values bigger than 16 bit so i need an efficient alghoritm to compute (a*b) mod n under these costraints.

I cannot use any C++ class or anything different from standard C instructions.

Thank you in advance for all the answers,

Lu
 
Technology news on Phys.org
How large is n?
 
Hi Borek,
n has to be is the largest prime smaller than 2^16; this value is 65521. Also all the values (a and b) are smaller than 65521 (so also smaller than 2^16).

Sorry for the missing information :redface:!

Lu
 
I thought about ((a*b) % n == ((a % n) * (b % n)) % n), but for so large n it doesn't change anything, as you must still be able to calculate 65520^2.

But you are most likely right that it is much more of a math question, than the programming one. Could be some properties of modulo arithmetic will simplify things.
 
That's the problem Borek. The only way to avoid overflow could be avoiding the use of numbers bigger than 2^8 for a and b but this is impossible in my case. There must be an efficient way to do this.

Thank you for your reply anyway.

Lu
 
grappolo said:
That's the problem Borek. The only way to avoid overflow could be avoiding the use of numbers bigger than 2^8 for a and b but this is impossible in my case. There must be an efficient way to do this.

Thank you for your reply anyway.

Lu
You already know a way to do this -- you learned it in elementary school. The only difference is here, you should work in base 256 rather than base 10.

(And you may or may not want to use something like Montgomery multiplication or Barrett reduction, rather than implementing elementary-school division)

If you need a buzzword, look for 'multiprecision arithmetic'. Oh, and I'm pretty sure this stuff is in Knuth's texts, if you have that.

Also, you should check out what special arithmetic intrinsic functions / assembly statements you have available to you. Your environment might already offer a way to multiply two 16-bit numbers to get a 32-bit product (the result split across two registers), or allow you to divide-with-remainder a 32-bit number by a 16-bit number (the input split across two registers).
 
Last edited:
Hurkyl said:
You already know a way to do this -- you learned it in elementary school. The only difference is here, you should work in base 256 rather than base 10.
If you mean the basic tecnique of splitting the 16 bit value into two 8 bit value with proper coefficient I've already tried that. The problem is that it isn't efficient since a single multiplication involve a lot more of operation.

Hurkyl said:
(And you may or may not want to use something like Montgomery multiplication or Barrett reduction, rather than implementing elementary-school division)
I don't know about Barrett reduction but Montgomery multiplication isn't the solution at all. In fact the Montgomery multiplication simply shifts the problem because the output is o = ab R^(-1) mod n, where R = 2^16. You still need to manipulate o * R mod n, that can potentially cause the same overflow issue.

Hurkyl said:
If you need a buzzword, look for 'multiprecision arithmetic'. Oh, and I'm pretty sure this stuff is in Knuth's texts, if you have that.
I have the Knuth TAOCP book, but (shame on me) I've been lazy and I've not red it carefully (it's a pdf non-searchable version).

Hurkyl said:
Also, you should check out what special arithmetic intrinsic functions / assembly statements you have available to you. Your environment might already offer a way to multiply two 16-bit numbers to get a 32-bit product (the result split across two registers), or allow you to divide-with-remainder a 32-bit number by a 16-bit number (the input split across two registers).

The AVR processor my device is using is not capable of splitting values on contiguous registers. The bigger costraint is that the device is battery operated and it has very very low power and low CPU performance. That's why I needed a more efficient way.

Thank you anyway for your effort. At worst I can still use the 256 base method that also you suggested if nothing better comes out.

Lu
 
grappolo said:
If you mean the basic tecnique of splitting the 16 bit value into two 8 bit value with proper coefficient I've already tried that. The problem is that it isn't efficient since a single multiplication involve a lot more of operation.
Agreed -- but that's unavoidable: 16-bit multiplies are inherently 4 times harder than 8-bit multiplies. The only reason you don't see this 'normally' is because most CPU's only provide a multiplier of a single size, so you can't take advantage of the fact smaller multipliers are cheaper.

Anyways, it strikes me that there is another solution -- I don't know if it's better or not. You can compute "multiplication by b" through a sequence of doubles and adds. (Similar to how you compute modular exponentation efficiently through square-and-multiplies)
 
Last edited:
Yes, double-and-add might work since the numbers are small; also precomputing 3a would allow fewer additions: 255a = ((3a * 2 * 2 + 3a) * 2 * 2 + 3a) * 2 * 2 + 3a instead of ((((((a * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a.

Since you're working mod a number close to 2^16, it may be easier to approximate the result, multiply mod 2^16, then add or subtract as needed.
 
  • #10
How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It seems that tables would help here, for example, a 256 by 16 bit table for (2^16 times the values 0 -> 255) modulo 65521.
 
  • #11
CRGreathouse said:
Yes, double-and-add might work since the numbers are small; also precomputing 3a would allow fewer additions: 255a = ((3a * 2 * 2 + 3a) * 2 * 2 + 3a) * 2 * 2 + 3a instead of ((((((a * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a.

Since you're working mod a number close to 2^16, it may be easier to approximate the result, multiply mod 2^16, then add or subtract as needed.

You may be right, it can be more efficient. I'll try that tomorrow and I'll compare results using Avrora Simulator.

Thank you,

Lu
 
  • #12
Jeff Reid said:
How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It seems that tables would help here, for example, a 256 by 16 bit table for (2^16 times the values 0 -> 255) modulo 65521.

My device also offers a very limited storage space and I'm already using part of it for cryptographic material needed by the algorithm. Using a table like the one you're suggesting isn't a good idea in this case (when in a normal situation would have been also my choice probably). At the moment for addition I'm using an "homemade" algorithm that also can be a lot more efficient but whose problems are nothing compared to multiplying efficiency problems.

I'll work probably on it after having solved (or at least improved a little) the performances of multiplying.

Thank you and have a good sunday,

Lu
 
  • #13
Exactly how much space do you have? And by efficient, do you have a specific speed requirement, or are you just looking for as fast as possible?
 
  • #14
Jeff Reid said:
How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?
It's easy enough to detect if addition/subtraction overflows through a comparison.
 
  • #15
KTC said:
Exactly how much space do you have? And by efficient, do you have a specific speed requirement, or are you just looking for as fast as possible?

Hi,
about sotrage space I'm talking of some Kb depending on the sensing algorithm loaded on the device. My program is only supposed to handle authentication between devices so the space left depends on other software space usage.

About efficiency I have no specific costraints, my goal is to minimize the time needed or at least the CPU calculations needed. I'm still in a preliminary phase about this kind of evaluation since the program is not yet complete. Once I have a working version I can start improving effficiency using Avrora Simulation results as benchmarks.

Thank you,

Lu
 

Similar threads

Replies
2
Views
2K
Replies
19
Views
5K
Replies
4
Views
2K
Replies
16
Views
2K
Replies
14
Views
7K
Replies
23
Views
2K
Back
Top