Expected number of collisions for hashing with chaining

In summary, the conversation discusses collisions in a hash table and how to calculate the expected number of collisions using random variables and indicator variables. The attempt arrives at an incorrect result because it computes the expected number of collisions involving a single randomly chosen key rather than the total number of colliding pairs, counting each pair from both ends before averaging over keys. The correct solution counts each unordered pair once and yields a different result.
  • #1
CGandC
Homework Statement
Suppose we use a hash function ## h ## to hash ## n ## distinct keys into an array ## T ## of length ## m ##. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of ## \{ \{ k, l \} : k \neq l \text{ and } h(k) = h(l) \} ##?
Relevant Equations
- Knowledge of random variables, indicator random variables.
- Simple uniform hashing means that any two distinct keys hash to the same cell with the same probability, i.e. ## \mathbb{P}( h(k_i) = h(k_j) ) = \frac{1}{m} ## for ## i \neq j ##.
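For concreteness, simple uniform hashing can be modeled by giving each key an independent, uniformly random slot. A minimal Python sketch of that model (the function name and the example values are only illustrative, not part of the problem):

```python
import random

def simple_uniform_hash_assignment(n, m, rng=random):
    """Model of simple uniform hashing: each of the n keys is placed into one
    of the m slots independently and uniformly at random."""
    return [rng.randrange(m) for _ in range(n)]

# Example: 10 keys, 7 slots; any fixed pair of keys collides with probability 1/7.
print(simple_uniform_hash_assignment(10, 7))
```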
Attempt:
Denote ## T## as the hash table, ## h ## as the hash function.
Denote ## n ## as the number of keys in the universe, and ## m ## as the size of hash table.

Order the set of keys from the universe as ## \{ k_1 ,..., k_n \} ## such that ## k_i \leq k_j ## where ## i \leq j ##.
Note that a collision occurs when we try to insert some key ## k ## into the table and it occurs that ## T[h(k)] \neq NULL ##.

Let ## X ## be a random variable denoting a key ( ## X(\omega) = \omega ## , ## \omega \in \Omega ## is a key ).
Let ## C(X) ## be a random variable denoting the number of collisions with ## X ##.

Let ## X_{ij} ## be an indicator random variable for the event that a collision between ## k_i ## and ## k_j ## occurred, i.e. whether ## h(k_i) = h(k_j) ## occurred or not.

Thus, ## \mathbb{E}[ C(X)] = \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] \mathbb{P}(X=i) ## ## = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] ##
( where ## \mathbb{P}(X=i) = \frac{1}{n} ## under the assumption that the key involved in a collision is equally likely to be any of the ## n ## keys stored in the table ).

Note that ## C(X) | X=i ## is a random variable and ## C(X) | X=i = \sum_{j=1, j \neq i}^{n} X_{ij} ##.

Thus, ## \mathbb{E}[ C(X) | X=i] = \sum_{j=1, j \neq i}^{n} \mathbb{E}[ X_{ij}] = \sum_{j=1, j \neq i}^{n} \mathbb{P}( h(k_i) = h(k_j) ) = \sum_{j=1, j \neq i}^{n} \frac{1}{m} = \frac{n-1}{m} ##

* The sum runs over all possible values of ## j ## except the case ## j = i ##, hence the ## n-1 ## terms in the sum.
* ## \mathbb{P}( h(k_i) = h(k_j) ) = \frac{1}{m} ## because of the assumption of simple uniform hashing.

Thus, ## \frac{1}{n} \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] = \frac{1}{n} \cdot \frac{n(n-1)}{m} = \frac{n-1}{m} ##.
Hence, ## \mathbb{E}[ C(X)] = \frac{n-1}{m} ##, which is what we wanted.

Correct answer:
Denote as ## Y ## a random variable representing total number of collisions.
Note that ## Y = \sum_{1 \le i < j \le n} X_{ij} ##, where the sum runs over unordered pairs so that each collision is counted exactly once.
Thus, ## \mathbb{E}[ Y] = \sum_{1 \le i < j \le n} \mathbb{E}[ X_{ij}] = {n \choose 2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m} ##
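As a sanity check on that closed form, here is a small Monte Carlo sketch (not from the thread; the function name and the example values ## n = 10 ##, ## m = 7 ## are arbitrary) that draws slots uniformly at random and counts colliding unordered pairs:

```python
import random
from itertools import combinations

def expected_pair_collisions(n, m, trials=20000, rng=random):
    """Monte Carlo estimate of E[Y]: the number of unordered pairs {k, l}
    with h(k) == h(l), under simple uniform hashing."""
    total = 0
    for _ in range(trials):
        slots = [rng.randrange(m) for _ in range(n)]
        total += sum(1 for i, j in combinations(range(n), 2) if slots[i] == slots[j])
    return total / trials

n, m = 10, 7  # arbitrary example values
print(expected_pair_collisions(n, m))  # simulated, roughly 6.43
print(n * (n - 1) / (2 * m))           # closed form n(n-1)/(2m) = 45/7
```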

Question:
Obviously my answer is not correct, since ## \frac{n-1}{m} \neq \frac{n(n-1)}{2m} ##. I understand the correct answer and agree with it, but I don't see why my line of reasoning in the attempt didn't yield it. What was wrong in my attempt that led to the incorrect result?

Thanks in advance for any help!
 
  • #2
CGandC said:
what was wrong in my attempt that led to incorrect result?

You disappeared off into the woods after

CGandC said:
Let ## X_{ij} ## be an indicator random variable for the event that a collision between ## k_i ## and ## k_j ## occurred, i.e. whether ## h(k_i) = h(k_j) ## occurred or not.

Clearly ## P(X_{ij} = 1) = \dfrac 1 m ##, so all we have to do is count the number of ## X_{ij} ##s and remember not to count the same collision twice.
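One way to see the difference numerically: summing ## C(X) | X = i ## over all keys counts every colliding pair from both ends, and dividing by ## n ## then gives the expected number of collisions involving a single uniformly chosen key, ## \frac{n-1}{m} ##, rather than the total ## \frac{n(n-1)}{2m} ##. A small simulation sketch comparing the two quantities (illustrative only; the names and example values are made up):

```python
import random
from itertools import combinations

def per_key_vs_total(n, m, trials=20000, rng=random):
    """Compare the average number of collisions involving one uniformly chosen
    key (what C(X) measures) with the total number of colliding pairs."""
    per_key_avg = 0.0
    total_pairs = 0.0
    for _ in range(trials):
        slots = [rng.randrange(m) for _ in range(n)]
        # collisions involving key i; summing over all i counts every pair twice
        c = [sum(1 for j in range(n) if j != i and slots[j] == slots[i])
             for i in range(n)]
        per_key_avg += sum(c) / n
        total_pairs += sum(1 for i, j in combinations(range(n), 2)
                           if slots[i] == slots[j])
    return per_key_avg / trials, total_pairs / trials

n, m = 10, 7
avg_per_key, avg_total = per_key_vs_total(n, m)
print(avg_per_key, (n - 1) / m)          # about (n-1)/m = 9/7
print(avg_total, n * (n - 1) / (2 * m))  # about n(n-1)/(2m) = 45/7
```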
 

1. What is the expected number of collisions for hashing with chaining?

One common way to count collisions for hashing with chaining is the number of keys that hash into a slot that is already occupied. Under simple uniform hashing its expected value is n - m + m(1 - 1/m)^n, i.e. n minus the expected number of occupied slots m(1 - (1 - 1/m)^n), where n is the number of keys and m is the number of buckets or slots in the hash table. Counting unordered pairs of colliding keys instead, as in the thread above, gives n(n-1)/(2m).
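A quick check of that formula by simulation (a sketch, not from the original page; n = 10 and m = 7 are arbitrary example values). Note that it counts keys landing in already-occupied slots, which is a different quantity from the pair count n(n-1)/(2m) derived in the thread:

```python
import random

def expected_occupied_slot_collisions(n, m, trials=20000, rng=random):
    """Monte Carlo estimate of the number of keys that land in an
    already-occupied slot under simple uniform hashing."""
    total = 0
    for _ in range(trials):
        occupied = set()
        for _ in range(n):
            slot = rng.randrange(m)
            if slot in occupied:
                total += 1
            else:
                occupied.add(slot)
    return total / trials

n, m = 10, 7  # arbitrary example values
print(expected_occupied_slot_collisions(n, m))
print(n - m + m * (1 - 1 / m) ** n)  # closed form: n minus expected occupied slots
```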

2. How does the number of keys and buckets affect the expected number of collisions?

The expected number of collisions increases as the number of keys increases and decreases as the number of buckets increases. With more keys there are more pairs of keys that can collide, and with more buckets each pair collides with the smaller probability 1/m.

3. Can the expected number of collisions be reduced?

Yes, the expected number of collisions can be reduced by increasing the number of buckets or by using a different hashing algorithm that distributes keys more evenly across the buckets.

4. How does the load factor affect the expected number of collisions?

The load factor, the ratio ## \alpha = n/m ## of the number of keys to the number of buckets, summarizes how full the table is. As the load factor increases, the expected number of collisions increases, because more keys are competing for the same buckets.
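For example, the pair-count formula from the thread can be rewritten in terms of the load factor as ## \mathbb{E}[Y] = \frac{n(n-1)}{2m} = \frac{(n-1)\,\alpha}{2} ##, so for a fixed number of keys the expected number of colliding pairs grows linearly with ## \alpha ##.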

5. Is it possible to have zero collisions with hashing and chaining?

Zero collisions can occur when the number of keys is at most the number of buckets: under simple uniform hashing there is a positive probability that all keys land in distinct buckets, and a perfect hash function chosen for a fixed key set guarantees it. With more keys than buckets, at least one collision is unavoidable by the pigeonhole principle. In general the expected number of collisions is positive whenever there are two or more keys, but it can be kept small by choosing a suitable number of buckets and a good hashing algorithm.
