Avg. Num. Probes to Insert in Hash Table w/ Random Collision Res. Strat.

Click For Summary

Discussion Overview

The discussion revolves around the expected number of probes required to insert an element into a hash table using a random probing collision resolution strategy. Participants explore the mathematical underpinnings of this concept, particularly focusing on the probability aspects involved in determining the average number of probes needed for successful insertion or unsuccessful search.

Discussion Character

  • Exploratory
  • Mathematical reasoning

Main Points Raised

  • One participant explains the mechanics of hash tables and the random probing strategy, emphasizing the role of the load factor, λ, in determining the expected number of probes.
  • Another participant presents a breakdown of the total number of probes required for N inserts, illustrating how the expected number of probes can be derived from a geometric series.
  • A participant expresses confusion regarding the implications of λ being greater than 1 and its relation to the expected number of probes, questioning the interpretation of the formula as a ratio of total cells to empty cells.
  • One participant offers an intuitive perspective, suggesting that if a small fraction of cells are empty, it would take, on average, many attempts to find an empty cell.

Areas of Agreement / Disagreement

Participants demonstrate varying levels of understanding and interpretation of the mathematical concepts involved. While some agree on the basic principles of the expected number of probes, there remains uncertainty and confusion regarding specific values of λ and their implications.

Contextual Notes

Participants note the limitations of their understanding, particularly regarding the conditions under which the formula applies and the implications of different values of λ.

cocoon
Messages
10
Reaction score
0
Hi, I'm reading Data Structures and Algorithm Analysis in Java (Second Edition) and I'm on page 178, if anyone owns the book. I know It's a computer science book and this is a math subforum, but the question is largely a mathematical one involving probability. You just need to understand what a Hash Table is. If you don't feel like reading dense prose, a hash table is basically a large array with a function that maps a key, which is usually a string or an integer, to an address (0 ... arraylength - 1). So it's a modulus function: hash(x) = g(x) \bmod T. T is the length of the array. You simply add T to hash(x) is hash(x) is negative.

It's particulars aren't important; just know that I'm talking about a hash table that implements a random probing collision resolution strategy, with a hash function like: hash(x, i) = ( h(x) + f(i) ) \bmod T, f(0) = 0

i represents the i'th probe. If you hash a key and the cell is already filled, then you probe a second time (i = 2), and a third time (i = 3), etc... until you find an empty cell to place x in.

It removes the clustering problem of typical linear (where f(i) = A*i + B) and, to a lesser extent, quadratic probing (where f(i) = A*i^2 + B*i + C). In random probing, f(i) is random, but is still a function. There is only one value for every i. This removes the clustering problem for small \lambda, the ratio of filled cells to total cells.

The books takes me through the steps to determine the "expected number of probes in an unsuccessful search," which is basically the average number of probes it takes to reach an empty cell (for an insertion or to indicate that the search was unsuccessful).

The author says "the fraction of empty cells is 1 - \lambda." I follow that. Then he concludes that "the number of cells we expect to probe is \frac{1}{1 - \lambda}". This smells like a probability problem to me. My question, at last, is why?
 
Last edited:
Physics news on Phys.org
Suppose you do N inserts, for some large N.

All N inserts require at least one try.

N*\lambda inserts require a second try (because N*(1-\lambda) succeed right away.)

N*\lambda^2 inserts require a third try (because N*\lambda*(1-\lambda) hit an empty cell on the second try).

...

The total number of probes is N*(1+\lambda+\lambda^2+...) = \frac{N}{1-\lambda}.
 
Oh, ok that wasn't too hard. What if \lambda = 2? Oh wait \lambda <= 1 and if \lambda = 1, that'd be infinity, which is correct. Man... still don't get it. I still think of \frac{1}{1-\lambda} as (number of total cells) / (number of empty cells), but I guess I see it as a geometric series too.
 
You can see it intuitively like this. If only one in ten cells are empty, then any random process would take on average ten steps to hit upon an empty one.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 6 ·
Replies
6
Views
4K
  • · Replies 6 ·
Replies
6
Views
4K
  • · Replies 7 ·
Replies
7
Views
4K
  • · Replies 7 ·
Replies
7
Views
4K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K