- #1

cocoon

- 10

- 0

Hi, I'm reading Data Structures and Algorithm Analysis in Java (Second Edition) and I'm on page 178, if anyone owns the book. I know It's a computer science book and this is a math subforum, but the question is largely a mathematical one involving probability. You just need to understand what a Hash Table is. If you don't feel like reading dense prose, a hash table is basically a large array with a function that maps a key, which is usually a string or an integer, to an address (0 ... arraylength - 1). So it's a modulus function: [tex]hash(x) = g(x) \bmod T[/tex]. T is the length of the array. You simply add T to hash(x) is hash(x) is negative.

It's particulars aren't important; just know that I'm talking about a hash table that implements a random probing collision resolution strategy, with a hash function like: [tex]hash(x, i) = ( h(x) + f(i) ) \bmod T, f(0) = 0[/tex]

It removes the clustering problem of typical linear (where [tex]f(i) = A*i + B[/tex]) and, to a lesser extent, quadratic probing (where [tex]f(i) = A*i^2 + B*i + C[/tex]). In random probing, [tex]f(i)[/tex] is random, but is still a function. There is only one value for every i. This removes the clustering problem for small [tex]\lambda[/tex], the ratio of filled cells to total cells.

The books takes me through the steps to determine the "expected number of probes in an unsuccessful search," which is basically the average number of probes it takes to reach an empty cell (for an insertion or to indicate that the search was unsuccessful).

The author says "the fraction of empty cells is [tex]1 - \lambda[/tex]." I follow that. Then he concludes that "the number of cells we expect to probe is [tex]\frac{1}{1 - \lambda}[/tex]". This smells like a probability problem to me. My question, at last, is

It's particulars aren't important; just know that I'm talking about a hash table that implements a random probing collision resolution strategy, with a hash function like: [tex]hash(x, i) = ( h(x) + f(i) ) \bmod T, f(0) = 0[/tex]

*i*represents the i'th probe. If you hash a key and the cell is already filled, then you probe a second time (i = 2), and a third time (i = 3), etc... until you find an empty cell to place**in.***x*It removes the clustering problem of typical linear (where [tex]f(i) = A*i + B[/tex]) and, to a lesser extent, quadratic probing (where [tex]f(i) = A*i^2 + B*i + C[/tex]). In random probing, [tex]f(i)[/tex] is random, but is still a function. There is only one value for every i. This removes the clustering problem for small [tex]\lambda[/tex], the ratio of filled cells to total cells.

The books takes me through the steps to determine the "expected number of probes in an unsuccessful search," which is basically the average number of probes it takes to reach an empty cell (for an insertion or to indicate that the search was unsuccessful).

The author says "the fraction of empty cells is [tex]1 - \lambda[/tex]." I follow that. Then he concludes that "the number of cells we expect to probe is [tex]\frac{1}{1 - \lambda}[/tex]". This smells like a probability problem to me. My question, at last, is

**why?**
Last edited: