Why are hash maps constant time?

  • Thread starter: ergospherical
  • Tags: Constant Time
SUMMARY

The discussion centers on the time complexity of hash maps, specifically addressing the common assumption that hash map operations are O(1). Participants clarify that while hash maps can achieve O(1) average time complexity, this is contingent on factors such as the load factor and the quality of the hash function. The conversation highlights the importance of amortized time complexity and the limitations of big-O notation in practical applications, particularly when considering modern hardware and data structures. Additionally, the need for benchmarking specific applications is emphasized to validate performance claims.

PREREQUISITES
  • Understanding of big-O notation and algorithmic complexity
  • Familiarity with hash maps and their operational principles
  • Knowledge of amortized time complexity
  • Basic concepts of data structures, including vectors and linked lists
NEXT STEPS
  • Research the principles of hash function design for optimal performance
  • Learn about the impact of load factors on hash map efficiency
  • Explore benchmarking techniques for evaluating data structure performance
  • Investigate the differences between vectors and linked lists in various scenarios
USEFUL FOR

Software developers, computer scientists, and data structure enthusiasts seeking to deepen their understanding of hash map performance and algorithmic complexity in real-world applications.

ergospherical
When analysing the time-complexity of an algorithm I was told to assume that a hash map is ##0(1)##. A cursory google search afterward revealed that this is true because the time complexity of the hash function is independent of the size of the hash table. That seems alright, but if you have a finite number of bins ##B## and the number of elements ##n > B##, then by the pigeonhole principle at least one bin will be mapped to by more than one key - in which case the time complexity would then be ##0(n \log n)##, right? And that's not even taking into account things like the hashing function depending in some manner (say linearly) on the length of the key, et cetera. Why does a hash map turn out to be just ##0(1)##?
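To make the worry concrete, here is a minimal sketch (my own illustration, nothing to do with any particular library) that hashes n = 20 keys into B = 8 bins; the pigeonhole principle forces at least one bin to receive two or more keys:

Code:
// Hypothetical illustration: with B bins and n > B keys, at least one bin
// must end up holding two or more keys, so collisions are unavoidable.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::size_t B = 8;                    // number of bins
    std::vector<std::size_t> bin_counts(B, 0);  // keys landing in each bin
    std::hash<std::string> h;                   // standard library hash

    for (int i = 0; i < 20; ++i) {              // n = 20 > B = 8
        ++bin_counts[h("key" + std::to_string(i)) % B];
    }

    for (std::size_t b = 0; b < B; ++b) {
        std::cout << "bin " << b << ": " << bin_counts[b] << " keys\n";
    }
    // At least one bin necessarily reports a count >= 2.
}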
 
The basic algorithm of a hashing function does not have ##n \gt B##. When ##n \gt B## does occur, there are a variety of methods to handle it, and they may have a variety of execution-time characteristics. That is why you were told to assume that a hash map is ##0(1)##. But that is just an assumption, not necessarily true in practice.
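One of those methods is separate chaining: each bin holds a short list of entries, and a lookup only scans the single chain its key hashes to. A minimal sketch of the idea (illustrative only, not how any particular library does it):

Code:
// Minimal separate-chaining sketch: n > B just means some chains have
// length greater than 1. A lookup walks one chain, whose expected length
// stays O(1) as long as the load factor n/B is kept bounded by resizing.
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class ChainedMap {
public:
    explicit ChainedMap(std::size_t bins) : bins_(bins) {}

    void insert(const std::string& key, int value) {
        auto& chain = bins_[index(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }  // overwrite
        }
        chain.emplace_back(key, value);
    }

    std::optional<int> find(const std::string& key) const {
        for (const auto& kv : bins_[index(key)]) {
            if (kv.first == key) return kv.second;
        }
        return std::nullopt;
    }

private:
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % bins_.size();
    }
    std::vector<std::list<std::pair<std::string, int>>> bins_;
};

int main() {
    ChainedMap m(8);
    m.insert("alpha", 1);
    m.insert("beta", 2);
    return m.find("alpha").value_or(-1) == 1 ? 0 : 1;  // trivial self-check
}

Other schemes (open addressing, cuckoo hashing, per-bucket trees) handle the same problem in different ways, which is why the execution-time characteristics vary.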
 
In addition to the load factor (size relative to the number of bins), a complexity of ##O(1)## also assumes a perfect hash function, which is usually not a trivial characteristic to achieve in practice for all use cases.

Also, let me add that the notation is called big-O notation, with O being the letter O and not the digit 0.
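To get a concrete feel for the load factor, here is a rough sketch (assuming std::unordered_map; the exact growth pattern and default bucket counts are implementation-specific) that prints each rehash as elements are inserted:

Code:
// Rough illustration of how the load factor drives rehashing in
// std::unordered_map: when size()/bucket_count() would exceed
// max_load_factor(), the bucket array grows and every element is rehashed.
#include <cstddef>
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    m.max_load_factor(1.0f);  // 1.0 is the standard default

    std::size_t last_buckets = m.bucket_count();
    for (int i = 0; i < 10000; ++i) {
        m[i] = i;
        if (m.bucket_count() != last_buckets) {
            std::cout << "rehash at size " << m.size() << ": "
                      << last_buckets << " -> " << m.bucket_count()
                      << " buckets (load factor now " << m.load_factor() << ")\n";
            last_buckets = m.bucket_count();
        }
    }
}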
 
Your intuition is right: they don't have a worst-case time complexity of O(1) for lookups; they have an amortized time complexity of O(1).
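One quick way to see the worst case is to hand the map a deliberately terrible hash function so every key lands in the same bucket; lookups then have to scan one long chain. A rough sketch (my own, timings obviously machine-dependent):

Code:
// Adversarial illustration: force all keys into one bucket and lookups
// degrade from expected O(1) to O(n). Insertion also slows down, because
// each insert scans the one growing chain for duplicates.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <unordered_map>

struct DegenerateHash {
    std::size_t operator()(int) const { return 0; }  // every key collides
};

int main() {
    const int n = 10000;
    std::unordered_map<int, int, DegenerateHash> bad;  // one long chain
    std::unordered_map<int, int> good;                 // default std::hash<int>
    for (int i = 0; i < n; ++i) { bad[i] = i; good[i] = i; }

    auto time_lookups = [n](const auto& map, const char* label) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int i = 0; i < n; ++i) sum += map.at(i);
        auto end = std::chrono::steady_clock::now();
        std::cout << label << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()
                  << " us (checksum " << sum << ")\n";
    };

    time_lookups(bad, "degenerate hash");  // roughly linear per lookup
    time_lookups(good, "default hash");    // roughly constant per lookup
}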
 
Welcome to the minefield of algorithmic complexity. There are two important lessons which are generally only presented at the end of an introductory course, if at all.
  1. What we are often interested in for practical applications is how long, on average, it takes to compute a solution for a typical input - this is the amortized time complexity to which @Jarvis323 refers above (although abusing the notation somewhat; see the sketch at the end of this post). Unfortunately all three common measures of algorithmic complexity are based on worst-case run time.
  2. Even when we calculate theoretical algorithmic complexity for a non-trivial algorithm we make assumptions which are only valid within certain limits. For instance we sometimes assume that addition, or even multiplication, of integers is ## \Theta(1) ##. If we are talking about 32- or 64-bit integers then this may be true, however if we allow the domain to be unbounded then clearly this is not true. It gets even worse when we apply this to a practical application: an algorithm which is ## \Theta(n \log n) ## will take time proportional to ## n \log n ## only while the computation fits in L1 cache, and then there will be a discontinuity with a new constant of proportionality for L2 cache, etc.
There is a useful discussion of this in the context of the ## O(1) ## assumption for a hash table at http://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec20-amortized/amortized.htm.
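To make item 1 concrete, here is a back-of-the-envelope sketch (my own, in the spirit of the linked notes rather than taken from them) that counts the total element moves caused by doubling the table whenever the load factor would exceed 1:

Code:
// Amortized accounting sketch: double the bucket array whenever the load
// factor would exceed 1 and count how many element moves the rehashes cost.
// Individual inserts that trigger a rehash cost Theta(n), but the total
// stays within a small constant multiple of n - i.e. O(1) amortized per insert.
#include <iostream>

int main() {
    const long long n = 1000000;  // total insertions
    long long buckets = 8;        // initial table size
    long long size = 0;           // elements stored so far
    long long moves = 0;          // elements rehashed across all doublings

    for (long long i = 0; i < n; ++i) {
        if (size + 1 > buckets) { // doubling: every existing element moves once
            moves += size;
            buckets *= 2;
        }
        ++size;
    }

    std::cout << "inserts: " << n << ", rehash moves: " << moves
              << ", moves per insert: " << static_cast<double>(moves) / n << "\n";
    // Prints roughly 1 move per insert, independent of n.
}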
 
Time complexity and big-O notation are just approximations. The way we teach them is broken, as it assumes 1960s hardware that doesn't resemble the computers of today. On most machines, O(n) insertion into a vector is often faster than O(1) insertion into a linked list, due to the cache and pipeline.
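For what it's worth, here is a rough micro-benchmark sketch of the kind of comparison I mean (my own, int elements only; as the replies below point out, the outcome depends heavily on element size and on whether the data fits in cache):

Code:
// Rough sketch (results vary with hardware and element size): keep a sorted
// sequence, find the insertion point by linear scan, then insert.
// std::vector pays an O(n) shift per insert but scans contiguous memory;
// std::list inserts in O(1) but chases pointers during the scan.
#include <chrono>
#include <iostream>
#include <list>
#include <random>
#include <vector>

template <typename Container>
long long ordered_inserts(const std::vector<int>& values) {
    Container c;
    auto start = std::chrono::steady_clock::now();
    for (int x : values) {
        auto it = c.begin();
        while (it != c.end() && *it < x) ++it;  // linear search for position
        c.insert(it, x);                        // keep the sequence sorted
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> values(20000);
    for (int& v : values) v = static_cast<int>(rng());

    std::cout << "vector: " << ordered_inserts<std::vector<int>>(values) << " ms\n";
    std::cout << "list:   " << ordered_inserts<std::list<int>>(values) << " ms\n";
}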
 
kalabren said:
On most machines, O(n) insertion into a vector is often faster than O(1) insertion into a linked list, due to the cache and pipeline.
I don't think this is correct. Evidence or reference?
 
No, BS is saying that TRAVERSING a vector OF INTEGERS (in order to find the insertion point) is often faster than traversing a linked list, and that this efficiency dominates the theoretically less efficient process of insertion, where FOR INTEGERS the inefficiency is mitigated by cache hits.

In the real world we are hardly ever dealing with lists of integers, and we often have much larger data structures, or entities that cannot be serialized to a fixed-length record at all. The post you linked considers this, showing that for the combined search and insert operation in the context tested, a vector was about twice as fast for 100,000 8-byte records, but for 100,000 128-byte records it was about 50% slower and for 100,000 4K records the vector was more than 20x slower than the list. Where the records are non-trivial data types (so they cannot simply be serialized into a fixed-length array element), again the vector is 20x slower for 100,000 records (the 'random_insert - 16 bytes' chart immediately before the 'Random Remove' section).

So rather than "On most machines, O(n) insertion into a vector is often faster than O(1) insertion into a linked list, due to the cache and pipeline" I think we can say that "On a machine with a cache that is larger than the dataset in question, the combined process of inserting a record preceded by searching for an insertion point can be significantly faster for a vector than for a list; however, for large datasets, or where moving data is a non-trivial operation, the vector's performance can be many times worse than a list's."
 
  • #10
The real lesson from this is that profiling is essential if you really care about efficiency, and that we shouldn't trust the textbook as if it were dogma.

As a sidebar, I wouldn't put too much trust in the exact values in that report, since it was run on 2012 hardware. However, it does demonstrate the key point.
 
  • #11
pbuk said:
I don't think this is correct. Evidence
I think it's correct but may be misleading, because computational complexity is mentioned but not relevant. Vectors are smaller, access is more predictable, and the hardware takes advantage of that.
 
  • #12
Vanadium 50 said:
I think it's correct...
Then I am afraid you are wrong. Insertion is almost always slower for vectors but searching followed by insertion may in certain circumstances be quicker for vectors.
 
  • #13
pbuk said:
wrong

pbuk said:
almost always
I don't think both of these can be true.

All I am saying is: you tell me what behavior you want to see, and I'll show you an edge case that does it.
 
  • #14
kalabren said:
All I am saying is you tell me shat behavior you want to see, and I'll show you an edge case that does it.
Yes of course, but any number of edge cases do not prove an "often", which is the statement I am refuting.
 
