Associative Arrays in C/C++ - Favorite Strategies?

In summary, using a hash-map with a binary tree will slow down the lookup time by reducing the number of bytes needed to store the data, but it will not be as fast as a normal array.
  • #1
lpetrich
988
178
Associative Arrays in C/C++ -- Favorite Strategies?

I mean by that something like what Perl or Python or Ruby have, a.k.a. dictionary types. I can do it in Mathematica with the help of label[key] = value and Head and DownValues, but it's C/C++ where it is the most difficult of all.


I once had the task of searching an array with elements {params, data} -- find data for some params value. I used a speedup that consisted of making a hash table of
(hash value for params) -> (index in data array)

I got around the hash-collision problem by checking the element at the data-array index. If it has the right params value, then I've succeeded, but if not, then I'd search for params in the data array and update the hash table with the index that I found.


Most recently, in my Lie-algebra code, I've used the C++ Standard Template Library container "map". I wanted to see if some calculated root vector was already in some list of them, so I created a map object that contains the listed root vectors. Searching in the map object requires doing a compare operation, but I found a way to speed it up relative to an element-by-element compare.

When I add a root vector to the map object, I calculate and include a hash code for it. When it comes time to compare, I first compare the hash codes, then if they are equal, I do an element-by-element comparison. This strategy produces a rather noticeable speedup.


In summary, do a fast lookup or compare, and if that fails or returns equal, then do a slow but accurate one.


Are these reasonable strategies?
 
Technology news on Phys.org
  • #3


phiby said:
There is a proposal to add hash_map to the STL - however, I am not sure if it's been accepted or not?

Yep, it's in C++11 as the class template std::unordered_map, which was pretty much taken verbatim from the Boost class template of the same name (but of course it's in namespace Boost in Boost rather than in namespace std).
 
  • #4


If the data size is small enough and ram size is large enough, you can use data as an index to a huge array of addresses to create an associatve array. Initializing the entire array with "invalid address" entries (use a flag or some invalid address value) is time consuming, but access time is quick.

Otherwise you use some type of hash and/or search algorithm as already mentioned.
 
Last edited:
  • #5


Thanx. Seems like I'm on the right track.

I've been using OSX Snow Leopard's out-of-the-box gcc, which is 4.2. However, I can get gcc up to 4.6 on the OSX package manager fink.

I now recall that in some recent hash-array code, I'd created an index-chain array, where the value for each index is another index whose data has the same hash value. The end of a chain is marked out with NONE (-1). That speeds up searching compared to my earlier effort.

In both of them, I'd used a fixed-size hash array, and a power of 2 at that. I think that if I made the hash table variable-sized, then I'd have to recalculate it every time I changed its size.I also checked on gcc's implementation of STL's associative containers. Those objects use a red-black binary tree.

For N elements, the search time:
Binary tree: log(N)
Linear-search hash table: N/H (H = hash-table size)

If one creates a hash table with a binary tree for each hash value, the search time would become
log(N/H) = log(N) - log(H)
 
  • #6


lpetrich said:
If one creates a hash table with a binary tree for each hash value, the search time would become
log(N/H) = log(N) - log(H)

You have to be careful with hash-maps because if the hashing algorithm is not uniform then you will get more collisions in certain areas depending on your input data (keys).

The other thing is that along with the above, your map needs to take into account the number of entries because whenever you do get a collision, what will happen then is that it will traverse usually a linked-list structure to get the actual structure pointer which means that now you need to consider that not only are you now searching in linear Big O time, but since it is a linked-list you will have indirection which means that it will not be as fast as a normal array and that the compiler can not optimize the code due to the nature of your bin structure being a linked list.

Even if you used something like a binary-tree for the internal bin structure, you still have indirection and the point of the hash-map is to reduce the search and lookup time of an object to something resembling O(C) and not O(logn): if you want to use something more resembling the latter, then just use a binary tree and forget about the hash-map.
 
  • #7


lpetrich said:
Thanx. Seems like I'm on the right track.

I've been using OSX Snow Leopard's out-of-the-box gcc, which is 4.2. However, I can get gcc up to 4.6 on the OSX package manager fink.
If you are using 4.2.1, you can use the TR1 (a C++ evaluation committee) version of unordered_map via #include <tr1/unordered_map>.

Word of warning: There's a good reason Snow Leopard comes with a five year old version of gcc. gcc 4.2.1 was the last version of gcc released under GPL version 2. All subsequent releases, including 4.6, were released under GPL version 3. This shouldn't be a problem if you are a student. If you are working for a living, you might want to think hard about using gcc 4.6. IMHO, stay far, far away from gcc 4.6. Release your application as a binary compiled with gcc 4.6 and boom! your work just because public domain.

You can use fink to get the Boost libraries. Boost was released with a reasonable usage license. The Boost are incredibly powerful. Getting Boost is well worth looking into.
 
  • #8


You can also just use hash map directly with the GCC you already have. This is what I use, it works fine with the gcc in xcode 3.x:

Code:
#include <ext/hash_map>
using namespace ::__gnu_cxx;
namespace __gnu_cxx {                                                                                             
	template<> struct hash< std::string > // Allow STL strings with hash_map
	{ size_t operator()( const std::string& x ) const { return hash< const char* >()( x.c_str() ); } };          
	template<> struct hash< void * > // Allow pointers with hash_map               
	{ size_t operator()( const void * x ) const { return hash< uintptr_t >()( (uintptr_t)x ); } };          
}

(I'm not totally sure, now that I think about it, what this code will do when you run it in clang.)

I think if you try to install your own GCC on Snow Leopard, you will regret it.
 

1. What is an associative array in C/C++?

An associative array in C/C++ is a data structure that stores values in key-value pairs. This means that each element in the array has a corresponding key and value, making it easy to access and manipulate data.

2. How do you declare and initialize an associative array in C/C++?

To declare and initialize an associative array in C/C++, you can use the "map" container from the std library. This can be done by specifying the data types for the key and value, and adding elements using the insert() or emplace() function.

3. What are some common strategies for working with associative arrays in C/C++?

Some common strategies for working with associative arrays in C/C++ include using loops to iterate through the elements, using the find() function to search for specific keys, and using the erase() function to remove elements from the array.

4. Can associative arrays in C/C++ have duplicate keys?

No, associative arrays in C/C++ cannot have duplicate keys. Each key must be unique in order to access the corresponding value in the array.

5. How do associative arrays in C/C++ differ from traditional arrays?

Associative arrays in C/C++ differ from traditional arrays in that they use keys to access and store values, rather than numerical indices. This allows for more flexibility in terms of data manipulation and organization.

Similar threads

  • Programming and Computer Science
Replies
2
Views
1K
  • Programming and Computer Science
Replies
25
Views
1K
  • Programming and Computer Science
Replies
29
Views
1K
  • Programming and Computer Science
4
Replies
107
Views
5K
  • Programming and Computer Science
Replies
17
Views
2K
  • Programming and Computer Science
Replies
2
Views
975
  • Programming and Computer Science
Replies
20
Views
1K
  • Programming and Computer Science
Replies
5
Views
1K
  • Programming and Computer Science
Replies
2
Views
1K
  • Programming and Computer Science
Replies
9
Views
1K
Back
Top