Avg. Num. Probes to Insert in Hash Table w/ Random Collision Res. Strat.

In summary, the conversation discusses the concept of a hash table and its implementation using random probing collision resolution strategy. The expected number of probes in an unsuccessful search is determined to be \frac{1}{1 - \lambda}, which can also be seen as a geometric series. The conversation also addresses the issue of \lambda being greater than 1, which would result in an infinite number of probes.
  • #1
cocoon
10
0
Hi, I'm reading Data Structures and Algorithm Analysis in Java (Second Edition) and I'm on page 178, if anyone owns the book. I know It's a computer science book and this is a math subforum, but the question is largely a mathematical one involving probability. You just need to understand what a Hash Table is. If you don't feel like reading dense prose, a hash table is basically a large array with a function that maps a key, which is usually a string or an integer, to an address (0 ... arraylength - 1). So it's a modulus function: [tex]hash(x) = g(x) \bmod T[/tex]. T is the length of the array. You simply add T to hash(x) is hash(x) is negative.

It's particulars aren't important; just know that I'm talking about a hash table that implements a random probing collision resolution strategy, with a hash function like: [tex]hash(x, i) = ( h(x) + f(i) ) \bmod T, f(0) = 0[/tex]

i represents the i'th probe. If you hash a key and the cell is already filled, then you probe a second time (i = 2), and a third time (i = 3), etc... until you find an empty cell to place x in.

It removes the clustering problem of typical linear (where [tex]f(i) = A*i + B[/tex]) and, to a lesser extent, quadratic probing (where [tex]f(i) = A*i^2 + B*i + C[/tex]). In random probing, [tex]f(i)[/tex] is random, but is still a function. There is only one value for every i. This removes the clustering problem for small [tex]\lambda[/tex], the ratio of filled cells to total cells.

The books takes me through the steps to determine the "expected number of probes in an unsuccessful search," which is basically the average number of probes it takes to reach an empty cell (for an insertion or to indicate that the search was unsuccessful).

The author says "the fraction of empty cells is [tex]1 - \lambda[/tex]." I follow that. Then he concludes that "the number of cells we expect to probe is [tex]\frac{1}{1 - \lambda}[/tex]". This smells like a probability problem to me. My question, at last, is why?
 
Last edited:
Mathematics news on Phys.org
  • #2
Suppose you do N inserts, for some large N.

All N inserts require at least one try.

[itex]N*\lambda[/itex] inserts require a second try (because [itex]N*(1-\lambda)[/itex] succeed right away.)

[itex]N*\lambda^2[/itex] inserts require a third try (because [itex]N*\lambda*(1-\lambda)[/itex] hit an empty cell on the second try).

...

The total number of probes is [tex]N*(1+\lambda+\lambda^2+...) = \frac{N}{1-\lambda}[/tex].
 
  • #3
Oh, ok that wasn't too hard. What if [tex]\lambda = 2[/tex]? Oh wait [tex]\lambda <= 1[/tex] and if [tex]\lambda = 1[/tex], that'd be infinity, which is correct. Man... still don't get it. I still think of [tex]\frac{1}{1-\lambda}[/tex] as (number of total cells) / (number of empty cells), but I guess I see it as a geometric series too.
 
  • #4
You can see it intuitively like this. If only one in ten cells are empty, then any random process would take on average ten steps to hit upon an empty one.
 

FAQ: Avg. Num. Probes to Insert in Hash Table w/ Random Collision Res. Strat.

1. How is the average number of probes determined in a hash table with random collision resolution strategy?

The average number of probes is determined by taking the total number of probes required for all insertions and dividing it by the total number of insertions. This provides an average value that represents the number of probes needed per insertion in the hash table.

2. What is a random collision resolution strategy?

A random collision resolution strategy is a method used to handle collisions in a hash table. It involves randomly selecting an empty slot in the hash table when a collision occurs, instead of following a specific pattern or algorithm. This helps to distribute the data evenly throughout the table and reduces the likelihood of further collisions.

3. How does a random collision resolution strategy differ from other collision resolution strategies?

Unlike other collision resolution strategies, such as linear probing or chaining, a random collision resolution strategy does not follow a specific pattern when handling collisions. This means that the location of the data in the hash table is not dependent on its key, but rather on where an empty slot happens to be at the time of insertion.

4. What impact does the load factor have on the average number of probes in a hash table with random collision resolution strategy?

The load factor, which is the ratio of the number of elements in the hash table to the total number of slots, can have a significant impact on the average number of probes. As the load factor increases, the likelihood of collisions also increases, which can result in a higher average number of probes needed for insertion.

5. How can the average number of probes be minimized in a hash table with random collision resolution strategy?

One way to minimize the average number of probes is to ensure that the load factor is kept low. This can be achieved by increasing the size of the hash table or implementing a resizing strategy. Additionally, using a more efficient hash function that distributes the data evenly can also help to reduce the number of collisions and probes needed for insertion.

Similar threads

Replies
1
Views
1K
Replies
6
Views
3K
Replies
2
Views
4K
Replies
2
Views
3K
Replies
6
Views
4K
Replies
1
Views
2K
Replies
125
Views
17K
Back
Top