Efficiently computing the parity of the Hamming weight of a binary sequence

1. Aug 11, 2012

burritoloco

Hello,

I'm wondering if there are any tricks to quickly determine the parity of the weight of a binary number, i.e., whether it's even or odd. I'm not interested in the actual weight of the number, only in its parity. I know that if, say, x = a XOR b, then x has odd weight iff exactly one of a, b has odd weight. Have you come across a good algorithm for this problem before? The web seems full of material on the actual Hamming weight problem, but I haven't noticed anything special about this other case. Thanks a lot guys.

2. Aug 11, 2012

chiro

Hey burritoloco.

Have you come across error-correcting codes? Hamming actually wrote a book on these; it deals with the construction of error-correcting codes where you can not only detect errors but even correct some of them, depending on how the code is constructed.

You can construct these with matrices where sums are considered sums modulo 2 instead of just normal sums.

The different kinds of codes are constructed based on a lot of Venn diagrams. You can construct different codes to correct specific kinds of errors, but the general thing is that you need to add more bits to correct more errors. Notice that you will be representing a lot more than your minimum data.

3. Aug 11, 2012

uart

When I've had to do this efficiently in the past, I've just used a lookup table for a very fast implementation on, say, a byte (or actually a nibble, because I'm too lazy to type out a 256-entry lookup) and then used the property you mentioned to efficiently extend that to words or dwords etc as required.

For example, my (Pascal) code for "word" parity would be something like:
Code (Text):

var
  a      : word;
  ah, al : byte;
  parity : byte;
  p_look : array[0..15] of byte = (0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0);

begin
  a := $E234;                   {word to test}
  ah := hi(a);
  al := lo(a);
  parity := p_look[ah shr 4] xor p_look[ah and $0f] xor
            p_look[al shr 4] xor p_look[al and $0f];

  writeln(parity);              {parity is "1" for the given test word}
end.

Last edited: Aug 11, 2012
4. Aug 11, 2012

rcgldr

If your PC is relatively new, its processor may have the "popcnt" instruction, which returns the number of 1 bits in a value. For Visual Studio 2010 C/C++, this can be accessed using the intrinsic function __popcnt().

http://msdn.microsoft.com/en-us/library/bb385231.aspx

Odd parity would just be an odd count of bits. To do this in Visual Studio 2010 C:

Code (Text):

#include <intrin.h>
//  ...
parity = __popcnt(a ^ b) & 0x01;
//  ...

gcc implements this intrinsic function under the name __builtin_popcount().

Otherwise, just use a table as suggested by uart.

5. Aug 11, 2012

burritoloco

Thanks guys.

Chiro, I took a class on Finite Fields and Coding theory over a year ago. I'm currently doing some work over GF(2) and bits are all over ;).

Thanks uart, but I think your method might be exponential in time. Given an N-bit word and a lookup table of weight parities of n-bit words, you seem to be XORing 2^{N-n} words to get the result.

rcgldr, thanks for the comment; you already informed me of popcnt in another post of mine ;). It would be really unfortunate if one had to count all the '1's in a sequence just to know its parity.

Just another quick question: what are the time complexities of the bitwise operations OR, XOR, AND, and shifts?

Thanks again everyone.

6. Aug 11, 2012

burritoloco

uart, you are also using 2^(N-n) arithmetical operations to slice the N-bit word into 2^(N-n) n-bit words...

7. Aug 11, 2012

burritoloco

Don't know about you guys, but something in my gut tells me that there should be a quicker way to get the parity of the weight that doesn't involve computing its actual weight.

8. Aug 11, 2012

rcgldr

I've worked with this type of math for error-correction code, but haven't done much work with Hamming codes. Normally a received codeword is re-encoded or decoded to check for errors; I'm not sure why you just want to know the even/odd bit parity.

The time would be linear (multiplicative), not exponential; the number of operations is proportional to N/n.

I'm not sure how, since the Hamming weight is just the number of 1 bits in a codeword, and popcnt should be a fast way to do this.

http://en.wikipedia.org/wiki/Hamming_weight

Last edited: Aug 12, 2012
9. Aug 12, 2012

uart

I'm not quite sure how you worked that out? It's non-iterative, with one lookup + one XOR per 4 bits; that sounds linear to me.

Of course you can make it one lookup plus one xor per 8 bits if you don't mind making the lookup table bigger.

10. Aug 13, 2012

burritoloco

I take this back! You're XORing N/n words, each smaller than the original one. As rcgldr pointed out, it takes time N/n to slice the original sequence, and I stand corrected!

Your method is great when N is bounded above (usually by 64 these days). Unfortunately, I'm dealing with arbitrarily large integers; maybe I should have said that earlier ;).
I never said that I'm doing error codes :). But I'll have to agree with you that computing the weight is the way to go if you want to know the parity: there's just no alternative that I've noticed online! However, I do know that mathematicians are very tricky people :) (a compliment!). Thanks guys.

p.s. popcnt and __builtin_popcount are so fast it's unbelievable. On my computer, these two give the weights of all integers from 1 to 2^61 - 1 in about 0 seconds! BTW, GCC also has a __builtin_parity function; Visual Studio probably has one as well.

Last edited: Aug 13, 2012
11. Aug 14, 2012

rcgldr

I didn't find one. Based on a web search, it appears that gcc's __builtin_parity() uses the PF (parity flag) bit in the status register, which is set if the low-order byte of a result (other than a move) has even parity. The advantage is that all X86 processors have the PF status bit. The disadvantage is that the builtin function needs more operations than the two-instruction sequence {"popcnt", "and"}, since it has to shift and XOR the operand down to a single byte, then do a LAHF, then a shift and mask ("and") to get the PF bit into the lower bit.

I'm curious as to why you want to calculate the parity on an arbitrary large integer.

Last edited: Aug 14, 2012
12. Aug 14, 2012

uart

Well the xor method doesn't require calculating the hamming weight. It just requires some way of calculating the parity for some given chunk size and finding the overall parity by xor-ing the parity of the separate chunks. The weight is never calculated.

Yes, since popcnt is implemented in hardware, of course it's going to be the fastest way, if your processor supports it (SSE4), even if in general finding the complete Hamming weight would not necessarily be the fastest way to do it in software.

OK, that is actually a bit hard to believe, even if we round that zero seconds up to, say, one complete second. You're presumably generating the numbers 1 .. 2^61 in a loop; there has to be a minimum (at a machine level) of at least one increment, one popcnt, one xor and one compare+branch per integer tested. To do that in even 1 second is, by my calculations, about 10^13 MIPS.

I wonder, given that this is just a benchmark and the results aren't being stored or used anywhere, whether some optimizing compiler is optimizing your code right out of existence?

13. Aug 17, 2012

burritoloco

rcgldr, interesting. Are you sure they aren't using look-up tables?

I'm doing a bit of research on primitive polynomials over GF(2). I needed to compute the parity of certain blocks in a binary sequence and wanted to know the time complexity for doing that.

Yes, I was using the -O2 option :). I did have a hard time believing it as well and thought there was some kind of error, but nope! Perhaps they are using look-up tables as well?

Look-up tables aside, isn't the time complexity of this method linearly proportional to the actual weight of the sequence? If so, then computing the actual weight is equivalent in time.

14. Aug 17, 2012

uart

No, if you use popcnt then it doesn't need a lookup table; it's done in hardware, which is even faster.

No matter which way you look at it, you're going to need at least 4 machine-language instructions per iteration. Now let's say your computer is really good: say it runs at 10GHz and can unroll the loop and do 100 instructions in parallel per clock cycle! It would still take over 100 years to complete your code!

Do the maths,

(4*2^61) / (100*10^10) / (3600 * 24) = 106.75 [strike]years[/strike] days.

Last edited: Aug 17, 2012
15. Aug 17, 2012

rcgldr

Assuming your compiler can output assembly code, output the assembly code and look at the generated code.

That would be 106.75 days.

Assuming that each block is an encoded codeword, then "parity" might mean the remainder produced by division of a block by the primitive polynomial, as opposed to even or odd parity of a block. I'm not sure why a person would want to know the even versus odd parity of a block when using primitive polynomials.

Last edited: Aug 17, 2012
16. Aug 17, 2012

uart

LOL sorry. It's late at night here.

Anyway, still a very long time compared with "0 seconds".

17. Aug 17, 2012

burritoloco

Well, I decided to repeat the experiment. I first printed (cout) the weights for a smaller loop to make sure everything was in order. The following loop, using the gcc function int __builtin_popcountl(long n) with the -O2 compiler option on, took 0 seconds to compute.

long m = (1L << 63) - 1;
for (long i = 0; i < m; i++)
{
    __builtin_popcountl(i);
}

Even worse, I then measured in milliseconds, and... again, 0 ms. I then removed the -O2 option from the build; same time. My laptop is an Acer 5736Z, Pentium(R) Dual-Core CPU T4500 @ 2.30GHz × 2, x64, not a supercomputer. Have you tried a similar experiment yourself? I'd love to output the assembly code if anyone would show me how.

From your comment I take it that perhaps "block" is a term already in use in relation to codewords? I merely meant chunks of a binary sequence. Again, no codewords involved in what I'm doing.

Last edited: Aug 17, 2012
18. Aug 17, 2012

rcgldr

Try changing this to:

long j = 0;
long m = (1L << 31) - 1;
for (long i = 0; i < m; i++)
{
    j += __builtin_popcountl(i);
}

cout << j << endl;

Last edited: Aug 17, 2012
19. Aug 17, 2012

burritoloco

Cool, computing time was 29197 milliseconds, j = 33285996513. Still, addition time is proportional to the number of digits, no?

20. Aug 17, 2012

rcgldr

Use -S (upper case) to generate assembly source:

gcc -S -O2 example.cpp

or to compile to an object file without linking:

gcc -O2 -c example.cpp

This will produce an assembler file called example.s.

No, there's no standard for the term "block" when working with primitive polynomials. I wasn't sure if in your case a block would be a codeword or not.

You can try this and hope the compiler doesn't optimize this to just running the last loop:

register long j = 0;
long m = (1L << 31) - 1;
for (long i = 0; i < m; i++)
{
    j = __builtin_popcountl(i);
}

cout << j << endl;

Last edited: Aug 17, 2012