Modular multiplication problem in C

grappolo · Aug 29, 2008

Hi to everyone,

i'm writing a program in C that is supposed to run on a AVR processor with registers and int of 16 bit (like the one used in some sensor for WSN). This program involves the use of the Lagrange Interpolation. I've worked on it managing to solve most part of the problems but I'm still facing a pure mathematical issue.

Basically all my values are now smaller than 2^16 (they're stored modulus 2^16) but when i multiply two of that values an overflow can occur since the avr-gcc compiler (with the setting I'm using) is not capable of handling any values bigger than 2^16.

So the question is: "Given a and b smaller than 2^16 is there an efficient method to obtain (a*b) mod n without using any variable longer than 16 bit?".

In a normal situation i would have simply used a double length variable (32 bit in this case) to hold the product a*b then applied the modulus. In this case I can't store any values bigger than 16 bit so i need an efficient alghoritm to compute (a*b) mod n under these costraints.

I cannot use any C++ class or anything different from standard C instructions.

Thank you in advance for all the answers,

Lu

Borek · Aug 29, 2008

How large is n?

grappolo · Aug 29, 2008

Hi Borek,
n has to be is the largest prime smaller than 2^16; this value is 65521. Also all the values (a and b) are smaller than 65521 (so also smaller than 2^16).

Sorry for the missing information :redface:

!

Lu

Borek · Aug 29, 2008

I thought about ((a*b) % n == ((a % n) * (b % n)) % n), but for so large n it doesn't change anything, as you must still be able to calculate 65520^2.

But you are most likely right that it is much more of a math question, than the programming one. Could be some properties of modulo arithmetic will simplify things.

grappolo · Aug 29, 2008

That's the problem Borek. The only way to avoid overflow could be avoiding the use of numbers bigger than 2^8 for a and b but this is impossible in my case. There must be an efficient way to do this.

Thank you for your reply anyway.

Lu

Hurkyl · Aug 29, 2008

grappolo said:

That's the problem Borek. The only way to avoid overflow could be avoiding the use of numbers bigger than 2^8 for a and b but this is impossible in my case. There must be an efficient way to do this.

Thank you for your reply anyway.

Lu

You already know a way to do this -- you learned it in elementary school. The only difference is here, you should work in base 256 rather than base 10.

(And you may or may not want to use something like Montgomery multiplication or Barrett reduction, rather than implementing elementary-school division)

If you need a buzzword, look for 'multiprecision arithmetic'. Oh, and I'm pretty sure this stuff is in Knuth's texts, if you have that.

Also, you should check out what special arithmetic intrinsic functions / assembly statements you have available to you. Your environment might already offer a way to multiply two 16-bit numbers to get a 32-bit product (the result split across two registers), or allow you to divide-with-remainder a 32-bit number by a 16-bit number (the input split across two registers).

grappolo · Aug 30, 2008

Hurkyl said:

You already know a way to do this -- you learned it in elementary school. The only difference is here, you should work in base 256 rather than base 10.

If you mean the basic tecnique of splitting the 16 bit value into two 8 bit value with proper coefficient I've already tried that. The problem is that it isn't efficient since a single multiplication involve a lot more of operation.

Hurkyl said:

(And you may or may not want to use something like Montgomery multiplication or Barrett reduction, rather than implementing elementary-school division)

I don't know about Barrett reduction but Montgomery multiplication isn't the solution at all. In fact the Montgomery multiplication simply shifts the problem because the output is o = ab R^(-1) mod n, where R = 2^16. You still need to manipulate o * R mod n, that can potentially cause the same overflow issue.

Hurkyl said:

If you need a buzzword, look for 'multiprecision arithmetic'. Oh, and I'm pretty sure this stuff is in Knuth's texts, if you have that.

I have the Knuth TAOCP book, but (shame on me) I've been lazy and I've not red it carefully (it's a pdf non-searchable version).

Hurkyl said:

Also, you should check out what special arithmetic intrinsic functions / assembly statements you have available to you. Your environment might already offer a way to multiply two 16-bit numbers to get a 32-bit product (the result split across two registers), or allow you to divide-with-remainder a 32-bit number by a 16-bit number (the input split across two registers).

The AVR processor my device is using is not capable of splitting values on contiguous registers. The bigger costraint is that the device is battery operated and it has very very low power and low CPU performance. That's why I needed a more efficient way.

Thank you anyway for your effort. At worst I can still use the 256 base method that also you suggested if nothing better comes out.

Lu

Hurkyl · Aug 30, 2008

grappolo said:

If you mean the basic tecnique of splitting the 16 bit value into two 8 bit value with proper coefficient I've already tried that. The problem is that it isn't efficient since a single multiplication involve a lot more of operation.

Agreed -- but that's unavoidable: 16-bit multiplies are inherently 4 times harder than 8-bit multiplies. The only reason you don't see this 'normally' is because most CPU's only provide a multiplier of a single size, so you can't take advantage of the fact smaller multipliers are cheaper.

Anyways, it strikes me that there is another solution -- I don't know if it's better or not. You can compute "multiplication by b" through a sequence of doubles and adds. (Similar to how you compute modular exponentation efficiently through square-and-multiplies)

CRGreathouse · Aug 30, 2008

Yes, double-and-add might work since the numbers are small; also precomputing 3a would allow fewer additions: 255a = ((3a * 2 * 2 + 3a) * 2 * 2 + 3a) * 2 * 2 + 3a instead of ((((((a * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a.

Since you're working mod a number close to 2^16, it may be easier to approximate the result, multiply mod 2^16, then add or subtract as needed.

rcgldr · Aug 31, 2008

How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It seems that tables would help here, for example, a 256 by 16 bit table for (2^16 times the values 0 -> 255) modulo 65521.

grappolo · Aug 31, 2008

CRGreathouse said:

Yes, double-and-add might work since the numbers are small; also precomputing 3a would allow fewer additions: 255a = ((3a * 2 * 2 + 3a) * 2 * 2 + 3a) * 2 * 2 + 3a instead of ((((((a * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a) * 2 + a.

Since you're working mod a number close to 2^16, it may be easier to approximate the result, multiply mod 2^16, then add or subtract as needed.

You may be right, it can be more efficient. I'll try that tomorrow and I'll compare results using Avrora Simulator.

Thank you,

Lu

grappolo · Aug 31, 2008

Jeff Reid said:

How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It seems that tables would help here, for example, a 256 by 16 bit table for (2^16 times the values 0 -> 255) modulo 65521.

My device also offers a very limited storage space and I'm already using part of it for cryptographic material needed by the algorithm. Using a table like the one you're suggesting isn't a good idea in this case (when in a normal situation would have been also my choice probably). At the moment for addition I'm using an "homemade" algorithm that also can be a lot more efficient but whose problems are nothing compared to multiplying efficiency problems.

I'll work probably on it after having solved (or at least improved a little) the performances of multiplying.

Thank you and have a good sunday,

Lu

KTC · Aug 31, 2008

Exactly how much space do you have? And by efficient, do you have a specific speed requirement, or are you just looking for as fast as possible?

Hurkyl · Aug 31, 2008

Jeff Reid said:

How do you plan on implementing addition (or subraction) modulo 65521 in C (wihtout access to the carry bit that would be available in assembly)?

It's easy enough to detect if addition/subtraction overflows through a comparison.

grappolo · Sep 2, 2008

KTC said:

Exactly how much space do you have? And by efficient, do you have a specific speed requirement, or are you just looking for as fast as possible?

Hi,
about sotrage space I'm talking of some Kb depending on the sensing algorithm loaded on the device. My program is only supposed to handle authentication between devices so the space left depends on other software space usage.

About efficiency I have no specific costraints, my goal is to minimize the time needed or at least the CPU calculations needed. I'm still in a preliminary phase about this kind of evaluation since the program is not yet complete. Once I have a working version I can start improving effficiency using Avrora Simulation results as benchmarks.

Thank you,

Lu

Modular multiplication problem in C

Similar threads

Hot Threads

Touch-typing for programmers

How to calculate Tension for a series of connected points?

Python Complaining About Python

Fortran Reading files in pre-f77 - handling end of file

Sequential Analog Computers?

Recent Insights

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem