Combinatorial Bit Enumeration

  • #1
rasman
Let's say I have 10 items, and I want to examine every combination of 5 of them. Obviously, there are 10 choose 5 = 252 possible combinations (http://www.google.com/search?q=10+choose+5). But I want to actually enumerate them and iterate through them as efficiently as I can in a computer program. I'm wondering what the best way to do this might be. I don't know/remember much about combinatorics, and I'm hoping this problem has already been solved.

So far, the best thing I can think of is to iterate from the smallest 10-bit number that has 5 bits set...

Code:
int min = (1 << 5) - 1; // 0000011111 = 31

...to the largest 10-bit number that has 5 bits set...

Code:
int max = min << 5; // 1111100000 = 992

...and check every value between them to see if it has exactly 5 bits set.

Can anyone think of a better way? Has this problem been solved already? How can I get the next highest integer with the same number of bits set?

Thanks for any insight that you can provide.
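For the record, that last question has a well-known answer: the "next bit permutation" (sometimes called Gosper's hack). Here is a Java sketch of it, counting the "10 choose 5" case as a sanity check; the method names are mine, not anything standard:

```java
public class NextCombo {
    // Given v > 0 with k bits set, return the next larger integer with
    // exactly k bits set ("next bit permutation" / Gosper's hack).
    static int next(int v) {
        int t = v | (v - 1);  // set every bit below the lowest run of 1's
        // t + 1 carries into the next position; the second term restores
        // the right number of 1's at the bottom.
        return (t + 1) | (((~t & -~t) - 1) >>> (Integer.numberOfTrailingZeros(v) + 1));
    }

    // Count all 10-bit values with exactly 5 bits set by stepping from
    // the smallest (31) to the largest (992).
    static int countTenChooseFive() {
        int count = 0;
        for (int v = (1 << 5) - 1; v <= ((1 << 5) - 1) << 5; v = next(v))
            count++;
        return count;
    }
}
```

This visits each valid value directly, so no bit counting is needed in between.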
 
  • #2
I suppose you could make a 5-deep loop where each index is the bit to set, something like this:

Code:
for (a = 0; a < 6; a++)
  for (b = a + 1; b < 7; b++)
    for (c = b + 1; c < 8; c++)
      for (d = c + 1; d < 9; d++)
        for (e = d + 1; e < 10; e++)
          do_stuff((1 << a) + (1 << b) + (1 << c) + (1 << d) + (1 << e));

I don't know that it would be efficient, though.
 
  • #3
Thanks for the effort, CRGreathouse. I implemented both your way and mine. For small numbers, like my "10 choose 5" example, my algorithm consistently executes in half the time of yours. But somewhere around "16 choose 8" they even out, and by "26 choose 13" the tables are turned: yours runs in half the time of mine.

For "26 choose 13", my algorithm runs in 1.7 seconds and yours in 0.8 seconds. All do_stuff() does is increment a counter, to be sure that we're getting them all. Both get 10400600, which is indeed 26 choose 13 (http://www.google.com/search?q=26+choose+13).

The bummer about your algorithm is that generalizing it to an arbitrary combination is a pain, and data structures like stacks, or recursion, would probably slow it down too much. To do "26 choose 13" I actually implemented it all the way up to
Code:
int count = 0;
for (int a = 0; a < 14; a++)
  for (int b = a + 1; b < 15; b++)
    for (int c = b + 1; c < 16; c++)
      for (int d = c + 1; d < 17; d++)
        for (int e = d + 1; e < 18; e++)
          for (int f = e + 1; f < 19; f++)
            for (int g = f + 1; g < 20; g++)
              for (int h = g + 1; h < 21; h++)
                for (int i = h + 1; i < 22; i++)
                  for (int j = i + 1; j < 23; j++)
                    for (int k = j + 1; k < 24; k++)
                      for (int l = k + 1; l < 25; l++)
                        for (int m = l + 1; m < 26; m++)
                          if (countBits(
                            (1 << a) + (1 << b) + (1 << c) + (1 << d) + (1 << e) +
                            (1 << f) + (1 << g) + (1 << h) + (1 << i) + (1 << j) +
                            (1 << k) + (1 << l) + (1 << m)) == 13)
                              count++;
return count;
Yuck! As it is, speed probably matters more to me than code elegance. I tried changing your pluses to bitwise ORs, since they're equivalent here, but saw no change.

Thanks again for your help. Any other ideas?
 
  • #4
These are pretty small numbers. Why can't you just precompute these tables and store them in your program?

If you really must compute this on-the-fly, I can probably come up with some pretty clever optimizations on how to determine the number of set bits in a number. Let me think about it a bit.

- Warren
 
  • #5
chroot said:
These are pretty small numbers. Why can't you just precompute these tables and store them in your program?

If you really must compute this on-the-fly, I can probably come up with some pretty clever optimizations on how to determine the number of set bits in a number.

I am pre-computing the bit-counting tables. This page (http://infolab.stanford.edu/~manku/bitcount/bitcount.html) was incredibly valuable in helping with that. The bit counting (really a table lookup) is hidden in that countBits() call. It's the enumeration of each value in the combination that I'm trying to perfect.

chroot said:
Let me think about it a bit.

- Warren
Don't just think about it "a bit", think about it lots of bits! :smile:
 
  • #6
Well, population count (aka Hamming weight) is a very common task in many fields, like communications and error correction, so I can only assume it's a very well-understood operation. Even the Wikipedia page lists a number of very good implementations:

http://en.wikipedia.org/wiki/Hamming_weight

You could probably use modern processors' MMX/SSE2 extensions to implement a really, really fast parallel implementation of the pop count.

However, like I said, even if you precomputed the full list of combinations for every "n choose k" case you actually need (up to your 26-bit case), the resulting chunk of data would really not be very large by the standards of modern computing, and you could just look it up on the disk. Is there any reason why you'd not want to just use this approach?

Perhaps you should tell me more about your application -- your speed requirements, disk space requirements, etc.

- Warren
 
  • #7
Oh, wow. I had misunderstood CRGreathouse's algorithm. I don't need to do the countBits() part, because the for structure was guaranteeing it for me already!

Now his code is really trouncing mine. (0.133 s to 1.7 s for "26 choose 13")

There's a reason that I stopped at "26 choose 13". I'm trying to enumerate the possible card combinations for the defenders of a bridge hand, where half the cards (declarer and dummy) are already known. But it needs to scale all the way down to "2 choose 1" when the hand is almost over.

The problem with the scalability of my algorithm is that the valid values get really sparse for big numbers: iterating through all 134,193,153 integers between the min and max for "26 choose 13" just to pull out the 10 million values I actually want was too much.

I knew there had to be a clever algorithm to avoid the bit counting of all the values, and CRGreathouse nailed it. I might investigate faster bit counting routines a little further, but I doubt they'll make up the 12x difference I'm seeing now for the two algorithms at the maximum of "26 choose 13".

Thanks for your help.
 
  • #8
rasman said:
Oh, wow. I had misunderstood CRGreathouse's algorithm. I don't need to do the countBits() part, because the for structure was guaranteeing it for me already!

Heh, glad I could help out. :cool: I was wondering what all the talk of Hamming weights was about...
 
  • #9
Yes, it's definitely more efficient to just generate only those values which have the right number of ones in the first place, than to search the entire space looking for them. :redface:

I'm not sure how you might scale your technique to an arbitrary number of bits, though, CR. Every bit that you add requires introducing a new for loop. I suppose you could create dynamic code and eval() it in interpreted languages like Python, but you couldn't do it in C or Java.

- Warren
 
  • #10
chroot said:
Yes, it's definitely more efficient to just generate only those values which have the right number of ones in the first place, than to search the entire space looking for them. :redface:

Honestly, I don't think it's straightforward. Calculating them directly like I did takes a lot of effort in terms of bit operations, and if even for large problems it only gives a factor-of-12 speedup, then with fast Hamming-weight routines (with lookup tables as needed) it might be quicker to just loop through all possibilities (or most of them*) and avoid the extra work.

* Most of the benefit of 'my' method comes from the outermost loop, or at least the outermost few. I wonder if a hybrid method could be efficient? I have trouble thinking of a way to do that efficiently, but in principle I see no reason it would be impossible. At the very least, unrolling the innermost loop might be beneficial, especially with a branching routine with fall-through.

chroot said:
I'm not sure how you might scale your technique to an arbitrary number of bits, though, CR. Every bit that you add requires introducing a new for loop. I suppose you could create dynamic code and eval() it in interpreted languages like Python, but you couldn't do it in C or Java.

I agree. I'd simply have a helper application created in source form, then compiled and run -- it's awkward, but not too difficult I hope. Of course that's only worthwhile if you have to do this with many variables -- if it's just to be for a few, write up each possibility in the code and switch between types.
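For what it's worth, the nested loops also generalize with plain recursion; this is only a sketch (with the call overhead rasman worried about), where doStuff() is a hypothetical stand-in for the real work:

```java
public class Choose {
    static long count;

    // Enumerate every n-bit value with exactly k bits set: choose the
    // position p of the next 1 from [lo, n - k], then recurse for the
    // remaining k - 1 bits above it.
    static void enumerate(int n, int k, int lo, int bits) {
        if (k == 0) { doStuff(bits); return; }
        for (int p = lo; p <= n - k; p++)
            enumerate(n, k - 1, p + 1, bits | (1 << p));
    }

    static void doStuff(int bits) { count++; }  // stand-in for the real work

    static long run(int n, int k) {
        count = 0;
        enumerate(n, k, 0, 0);
        return count;
    }
}
```

With k fixed at 5 and n at 10, this performs exactly the same iterations as the 5-deep loop above.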
 
  • #11
CRG's method can be scaled indefinitely by using an array of indices. Here we look at all 32-bit integers with 16 1's.

Code:
int index[17];
int i;
uint32_t value;

/* index[0..15] hold the bit positions, kept in increasing order */
for(i = 0; i < 16; ++i) index[i] = i;
index[16] = 32;  /* sentinel: positions must stay below 32 */

again:
value = 0;
for(i = 0; i < 16; ++i) value |= (uint32_t)1 << index[i];
/* use value here */

/* odometer-style increment: bump the lowest index that can still
   move up, resetting all the indices below it */
for(i = 0; i < 16; ++i) {
  ++index[i];
  if(index[i] < index[i+1]) goto again;
  index[i] = i;
}

(my apologies for the goto. But honestly, I think it's cleaner than the other ways to code this particular implementation in C)


I'm fairly sure there's a clever way to do this using bit operations... you don't need to use any indices whatsoever!
 
  • #12
Well, that's all true. I'd expect your algorithm to be "slow" when the space is dense (i.e., when a large fraction of the possible codes are "hits", like "26 choose 13") and quite fast when the space is very sparse (i.e., "26 choose 2").

Of course, in general, these numbers are pretty sparse. The largest case he's giving us is "26 choose 13": about 15% of all 26-bit numbers have 13 bits set, and that's as dense as it gets.

- Warren

- Warren
 
  • #13
Hurkyl, thank you; of course that's it. Why didn't I think of that?

For implementation purposes: with double indices it would be best to make the 'left' dimension a power-of-two size, so that shifts can be used.

chroot said:
Well, that's all true. I'd expect your algorithm to be "slow" when the space is dense (i.e. a large fraction of the possible codes are "hits", like 23 choose 12), and quite fast when the space is very sparse (i.e. 23 choose 2).

That all depends on how quickly you can calculate and compare the number of bits set vs. how long it takes to compute the set bits in 'my' method.

Sparseness * time to compute bits vs. time to check # of set bits
 
  • #14
Hmm.. I'm not sure I can bring myself to use a goto. And my target language is java, anyway.

I have modified CRG's algorithm to do the shifting in the for loops. The factor of 12 I mentioned before was incorrect: I had forgotten to do the sum that produces the actual value in my implementation of CRG's algorithm, so I was only timing the looping. As it stands, my speeds are:

Code:
A gave 10400600 and took 1504.797 ms
B gave 10400600 and took 534.127 ms
C gave 10400600 and took 337.718 ms

Where A is my original algorithm, B is CRG's version, and C is my modified "shifting for" version shown here:

Code:
for (int a = 1; a < 0x4000; a <<= 1)
  for (int b = a << 1; b < 0x8000; b <<= 1)
    for (int c = b << 1; c < 0x10000; c <<= 1)
      for (int d = c << 1; d < 0x20000; d <<= 1)
        for (int e = d << 1; e < 0x40000; e <<= 1)
          for (int f = e << 1; f < 0x80000; f <<= 1)
            for (int g = f << 1; g < 0x100000; g <<= 1)
              for (int h = g << 1; h < 0x200000; h <<= 1)
                for (int i = h << 1; i < 0x400000; i <<= 1)
                  for (int j = i << 1; j < 0x800000; j <<= 1)
                    for (int k = j << 1; k < 0x1000000; k <<= 1)
                      for (int l = k << 1; l < 0x2000000; l <<= 1)
                        for (int m = l << 1; m < 0x4000000; m <<= 1)
                        {
                          final int value = a + b + c + d + e + f + g + h + i + j + k + l + m;
                          // do stuff
                          count++;
                        }
That's a lot of less-thans!

With this kind of performance, I'm fine with writing a class full of methods that look like this, one for each "X choose Y" combo I need. There aren't that many of them.
 
  • #15
rasman said:
Hmm.. I'm not sure I can bring myself to use a goto. And my target language is java, anyway.

Come over to the dark side! Bwa ha ha! Actually, it's just because the other ways I can imagine to do this loop involve wrapping the main part in an infinite loop, replacing the goto with a break, and then either:
(1) Duplicating the loop condition after the inner loop to tell if it exited normally. (And thus break out of the outer loop)

(2) Use a flag variable

(1) is bad because of code duplication; you have to remember that if you change it in one place, you have to change it in the other place too.

(2) is the "typical" solution, but I have a hefty dislike for it because it obfuscates the code and complicates the flow of control... which are precisely (some of) the things that people are trying to prevent when they ban the use of goto.


Fortunately, java is designed to handle this particular construct naturally:

Code:
again: while(true) {
  // stuff
  for(...) {
    // stuff
    if(test) continue again;
  }
  break;
}
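Putting the two posts together, my index-array enumerator translated to Java with that labeled continue would look something like this (a sketch; `visitAll` just counts, where do_stuff(value) would go in real use):

```java
public class IndexCombos {
    // Visit every n-bit value with exactly k bits set, using an array of
    // bit indices as an "odometer" (the algorithm from post #11, with the
    // goto replaced by a labeled continue).
    static long visitAll(int n, int k) {
        int[] index = new int[k + 1];
        for (int i = 0; i < k; i++) index[i] = i;
        index[k] = n;                       // sentinel

        long count = 0;
        again:
        while (true) {
            int value = 0;
            for (int i = 0; i < k; i++) value |= 1 << index[i];
            count++;                        // do_stuff(value) would go here

            // odometer increment: bump the lowest index that can move up
            for (int i = 0; i < k; i++) {
                ++index[i];
                if (index[i] < index[i + 1]) continue again;
                index[i] = i;
            }
            break;                          // every index rolled over: done
        }
        return count;
    }
}
```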
 
  • #16
Hurkyl said:
Fortunately, java is designed to handle this particular construct naturally:

Code:
again: while(true) {
  // stuff
  for(...) {
    // stuff
    if(test) continue again;
  }
  break;
}

Very interesting. I might give yours a try then.

Thanks for avoiding the old (and no longer true) "java is slow" jokes.
 
  • #17
Aha, here's a way to do it with bit ops: this code will iterate through all 8-bit integers that have three 1's in them.

(Supply your own display command that prints an integer in binary, or does whatever you want to do with it)

Code:
int size = 8;
int pop = 3;

/* start with the pop highest bits set */
int n = ((1 << pop) - 1) << (size - pop);

while(true) {
	display(n, size);

	int lowest = n & -n;    /* isolate the lowest set bit */

	/* if the lowest 1 can still move down, move it down one place */
	if(lowest > 1) {
		n = n ^ lowest ^ (lowest >> 1);
		continue;
	}

	/* otherwise n ends in a run of 1's; high is n with that run cleared */
	int high = n & (n + lowest);
	if(high == 0) break;    /* all the 1's are packed at the bottom: done */

	int low = n ^ high;     /* the trailing run of 1's */

	/* widen the run; its top bit will cancel a bit of high below */
	low = (low << 2) + 3;

	/* slide it up until it overlaps high's lowest 1, then swap them */
	while((low & high) == 0) low <<= 1;
	n = high ^ low;
}

That last while loop could be eliminated, but it would require that you have some (fast!) code to find the index of the highest and lowest 1's in an integer. (And this is probably faster for smaller integers anyways)


I don't claim this is the fastest way to do it with bit operations; it's just one way. It looks messy, but if you display the intermediate variables, it's pretty clear what's happening.
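Since rasman's target language is Java, here is the same algorithm translated (my sketch; `count` stands in for the display() call, and the logic is otherwise unchanged):

```java
public class BitOpsEnum {
    // Visit every `size`-bit integer with exactly `pop` bits set, in
    // descending order, using only bit operations; returns the number
    // of values visited.
    static int countAll(int size, int pop) {
        int n = ((1 << pop) - 1) << (size - pop);
        int count = 0;
        while (true) {
            count++;                       // display(n, size) would go here

            int lowest = n & -n;           // isolate the lowest set bit
            if (lowest > 1) {              // lowest 1 can still move down
                n = n ^ lowest ^ (lowest >> 1);
                continue;
            }

            int high = n & (n + lowest);   // n with its trailing run of 1's cleared
            if (high == 0) break;          // everything packed at the bottom: done

            int low = n ^ high;            // the trailing run of 1's
            low = (low << 2) + 3;          // widen it; its top bit will cancel a bit of high

            while ((low & high) == 0) low <<= 1;
            n = high ^ low;                // move that 1 of high down, restore the run below
        }
        return count;
    }
}
```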
 
  • #18
And we have a winner! Excellent work, Hurkyl!

I never, in 10,400,600 years, would have come up with that myself!

The final results:
Code:
A gave 10400600 and took 1582.985 ms
B gave 10400600 and took 607.255 ms
C gave 10400600 and took 354.218 ms
D gave 10400600 and took 296.483 ms
 
  • #19
After subsequent runs, it looks like Hurkyl's algorithm is really closer to 136.705 ms. Amazing!

I'm pretty sure that's unbeatable.
 
  • #20
This might speed it up a little bit. Replace

n = n ^ lowest ^ (lowest >> 1);

with

n = n - (lowest >> 1);



It's actually not too hard to come up with basic bitops algorithms; you just have to learn what tools are available (and think to try it out). Don't think of an integer as an integer: think of it as an array of bits. Then, you have routines such as:

n & -n : "Find the index of the first 1 in n, and return an array with a 1 in that position"

and

n + 1 : "n begins with k ones followed by a zero. Replace that with k zeroes followed by a one."
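The replacement suggested at the top of this post can be checked mechanically: whenever the lowest set bit of n is above position 0, the two expressions agree. A small Java check of my own:

```java
public class LowestBitMove {
    // Both expressions move the isolated lowest set bit of n down one
    // position; they agree whenever that bit is above position 0.
    static int viaXor(int n) {
        int lowest = n & -n;
        return n ^ lowest ^ (lowest >> 1);
    }

    static int viaSub(int n) {
        int lowest = n & -n;
        return n - (lowest >> 1);
    }
}
```

The subtraction form saves one operation per step in the hot path of the enumerator.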
 

