Comp Sci Expected number of collisions for hashing with chaining

AI Thread Summary
The discussion focuses on calculating the expected number of collisions in a hash table using chaining. The initial attempt incorrectly concluded that the expected number of collisions is given by the formula E[C(X)] = (n-1)/m, while the correct formula is E[Y] = n(n-1)/(2m). The error arose from not properly accounting for the number of unique pairs of keys that can collide, which is represented by the combination formula. The correct approach involves summing the probabilities of collisions across all unique pairs, leading to the accurate expectation of collisions. Understanding the distinction between counting individual collisions and unique pairs is crucial for deriving the correct result.
CGandC
Homework Statement
Suppose we use a hash function ## h ## to hash ## n ## distinct keys into an array ## T ## of length ## m ##. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of ## \{ \{ k, l \} : k \neq l \text{ and } h(k) = h(l) \} ## ?
Relevant Equations
- Knowledge of random variables, indicator random variables.
- Simple uniform hashing means that each key is equally likely to hash to any of the ## m ## slots, independently of the other keys; in particular, for any two distinct keys, ## \mathbb{P}( h(k_i) = h(k_j) ) = \frac{1}{m} ## (a quick check of this value is sketched below).
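
As a sanity check on that value (a sketch under the usual reading of simple uniform hashing, in which each key lands in each of the ## m ## slots with probability ## \frac{1}{m} ##, independently of the other keys), conditioning on the slot of ## k_i ## gives
$$ \mathbb{P}\big( h(k_i) = h(k_j) \big) = \sum_{s=1}^{m} \mathbb{P}\big( h(k_i) = s \big)\, \mathbb{P}\big( h(k_j) = s \big) = \sum_{s=1}^{m} \frac{1}{m} \cdot \frac{1}{m} = \frac{1}{m}. $$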
Attempt:
Denote ## T## as the hash table, ## h ## as the hash function.
Denote ## n ## as the number of distinct keys being hashed, and ## m ## as the size of the hash table.

Order the keys as ## \{ k_1 ,..., k_n \} ## such that ## k_i \leq k_j ## whenever ## i \leq j ##.
Note that a collision occurs when we try to insert some key ## k ## into the table and find that ## T[h(k)] \neq \text{NULL} ##.

Let ## X ## be a random variable denoting a key ( ## X(\omega) = \omega ## , ## \omega \in \Omega ## is a key ).
Let ## C(X) ## be a random variable denoting the number of collisions with ## X ##.

Let ## X_{ij} ## be an indicator random variable for the event that a collision between ## k_i ## and ## k_j ## occurred, i.e. indicating whether ## h(k_i) = h(k_j) ## or not.

Thus, ## \mathbb{E}[ C(X)] = \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] \mathbb{P}(X=i) ## ## = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] ##
( where ## \mathbb{P}(X=i) = \frac{1}{n} ## under the assumption that the element being collided is equally likely to be any of the ## n ## elements stored in the table ).

Note that ## C(X) \mid X=i ## is a random variable and ## C(X) \mid X=i = \sum_{j=1, j \neq i}^n X_{ij} ##.

Thus, ## \mathbb{E}[ C(X) \mid X=i] = \sum_{j=1, j \neq i}^n \mathbb{E}[ X_{ij}] = \sum_{j=1, j \neq i}^n \mathbb{P}( h(k_i) = h(k_j) ) ## ## = \sum_{j=1, j \neq i}^n \frac{1}{m} = \frac{n-1}{m} ##

* The sum ranges over all values of ## j ## except ## j = i ##, hence the ## n-1 ## terms in the sum.
* ## \mathbb{P}( h(k_i) = h(k_j) ) = \frac{1}{m} ## by the assumption of simple uniform hashing.

Thus, ## \frac{1}{n} \sum_{i=1}^n \mathbb{E}[ C(X) | X = i] = \frac{1}{n} \cdot \frac{n(n-1)}{m} = \frac{n-1}{m} ##.
Hence, ## \mathbb{E}[ C(X)] = \frac{n-1}{m} ##, which is what we wanted.

Correct answer:
Denote by ## Y ## a random variable representing the total number of collisions, i.e. the number of colliding (unordered) pairs.
Note that ## Y = \sum_{1 \le i < j \le n} X_{ij} ##.
Thus, ## \mathbb{E}[ Y] = \sum_{1 \le i < j \le n} \mathbb{E}[ X_{ij}] = {n \choose 2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m} ##
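
As a quick numerical check of this formula, here is a minimal Monte Carlo sketch (Python; independent, uniformly random slot assignment stands in for simple uniform hashing, and the function name `simulate_expected_collisions` together with the values ## n = 20 ##, ## m = 50 ## are just illustrative):

```python
import random

def simulate_expected_collisions(n, m, trials=10_000):
    """Estimate the expected number of colliding (unordered) pairs when
    n distinct keys are hashed into m slots, modelling simple uniform
    hashing by independent, uniformly random slot choices (an assumption
    of this sketch)."""
    total_pairs = 0
    for _ in range(trials):
        counts = [0] * m
        for _ in range(n):
            counts[random.randrange(m)] += 1   # slot chosen uniformly at random
        # Every pair of keys sharing a slot is one collision:
        total_pairs += sum(c * (c - 1) // 2 for c in counts)
    return total_pairs / trials

n, m = 20, 50
print(simulate_expected_collisions(n, m))  # empirical estimate, fluctuates around 3.8
print(n * (n - 1) / (2 * m))               # theoretical value n(n-1)/(2m) = 3.8
```

The empirical average should settle near ## \frac{n(n-1)}{2m} ##, whereas counting collisions per key and averaging over the keys (as in the attempt) would instead hover around ## \frac{n-1}{m} ##.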

Question:
Obviously my answer is not correct since ## \frac{n-1}{m} \neq \frac{n(n-1)}{2m} ##. I understand the correct answer and agree with it, but I don't understand why my line of reasoning in the attempt didn't yield it. What was wrong in my attempt that led to the incorrect result?

Thanks in advance for any help!
 
CGandC said:
What was wrong in my attempt that led to the incorrect result?

You disappeared off into the woods after

CGandC said:
Let ## X_{ij} ## be an indicator random variable for the event that a collision between ## k_i ## and ## k_j ## occurred, i.e. indicating whether ## h(k_i) = h(k_j) ## or not.

Clearly ## P(X_{ij} = 1) = \dfrac 1 m ##, so all we have to do is count the number of ## X_{ij} ##s and remember not to count the same collision twice.
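
Spelling that hint out (a short worked step in the thread's notation, writing ## C_i ## here for the number of keys that collide with ## k_i ##, i.e. the quantity the attempt computes as ## C(X) \mid X = i ##): each colliding pair ## \{ k_i, k_j \} ## contributes once to ## C_i ## and once to ## C_j ##, so summing over all keys counts every collision exactly twice,
$$ \sum_{i=1}^{n} C_i = 2Y \quad\Longrightarrow\quad \mathbb{E}[Y] = \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}[ C_i ] = \frac{1}{2} \cdot n \cdot \frac{n-1}{m} = \frac{n(n-1)}{2m}. $$
Averaging the ## C_i ## over ## i ## instead (the extra ## \frac{1}{n} ## factor in the attempt) gives ## \frac{n-1}{m} ##, the expected number of collisions involving a single (randomly chosen) key, not the total number of colliding pairs.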
 