How Does the Middle Square Method Hash Function Work?

MrWarlock616 · Oct 2, 2013

I came across "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by Bruno R. Preiss on the internet. I'm very confused with the theory on a Hashing method called the middle square method.

Link to the topic in the book

The author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2^w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2^w-k, which gives us the middle bits of the square of the key. How is that?

What I don't get is how is this hash function not supposed to return 0 for every key x? I tried it myself on a program and it didn't work! The reason stated in the book also doesn't make sense, that h(x) = 0 for the sets

## \{ x \in\mathbb Z : x< \sqrt{\frac{W}{M}} \} ##

and ## \{ x\in\mathbb Z^+ : x=n\sqrt{W}, n\in\mathbb Z^+ \} ##

WHY?!? Somebody please help. :)

pbuk · Oct 7, 2013

MrWarlock616 said:

this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2^w-k, which gives us the middle bits of the square of the key. How is that?

Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you, but it's only a name.

Work through the algorithm with w=32 and k=10, so you are shifting 22 bits to the right. Try x=2048 (√2²²). Can you see what is happening now?

D H · Oct 7, 2013

You are using unsigned ints here. The product of two 32 bit unsigned integer is in general a 64 bit unsigned integer. However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 2³². The hash algorithm uses x², which is truncated to 32 bits per those unsigned int multiplication rules. The high order bits are discarded by the multiplication rules. The hash algorithm itself shifts out the low order bits. What's left is the middle bits of x².

MrWarlock616 · Oct 7, 2013

D H said:

However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 2³².

What does that mean?! Sorry, I'm only in second year.

phinds · Oct 7, 2013

MrWarlock616 said:

What does that mean?! Sorry, I'm only in second year.

Do you understand what "modulo" means? If you don't, you should look it up and make sure you get an understanding of it.

MrWarlock616 · Oct 7, 2013

No I know what it means, I just don't see how it applies here. I was never taught what you said

D H · Oct 7, 2013

If you weren't taught your education is deficient.

Suppose you're on a machine in which signed and unsigned ints are 32 bits long. Note: Neither the C nor C++ standard don't say that this must be the case. Both standards specify a minimum size, and both say that signed and unsigned ints must be of the same size. (The same also applies to signed long versus unsigned long, signed short versus unsigned short, etc.)

With a 32 bit signed int, the number 2³¹-1 is a valid value. What happens if you square this? The result is 18014398241046529, or 3blackf00000001 in hexadecimal. That won't fit in a 32 bit signed integer.

So what is the result on your computer, with your compiler? Surprisingly, the result is whatever the implementation wants it to be. All that the standards say is that signed integer overflow is undefined behavior. If an implementation ignores the overflow, that's OK. If it triggers a fatal error, that's OK, too. If it calls erase_main_drive() in response, well that's also OK as far as the standard is concerned. Anything in the standard tagged as undefined behavior means the vendor can do *anything* and still be compliant with the standard. Undefined behavior is the ultimate "get out of jail, free!" card to a compiler vendor.

That is not the case with signed integers. The standards are very clear on how signed integer arithmetic must work for an implementation to be compliant with the standards. Signed integer arithmetic must be computed using modulo 2^N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.

MrWarlock616 · Oct 7, 2013

D H said:

If you weren't taught your education is deficient.

That's an understatement for the education system in India.

D H said:

Signed integer arithmetic must be computed using modulo 2^N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.

So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

phinds · Oct 7, 2013

MrWarlock616 said:

So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

You have to use a "long" variable, or if it gets too big for that you'll have to use one of the math packs that allows for very large numbers.

MrWarlock616 · Oct 7, 2013

Oh yeah. Thanks for the explanation though. I really never knew about this modulo 2^32 rule. I'm not going to refer to this book unless it's really necessary. The author hasn't explained himself properly. Definitely not a good book for a beginner like me..

Thank you D H, and the others!

How Does the Middle Square Method Hash Function Work?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Who May Find This Useful

Similar threads

Use of AI (ML/DL) in Science

Other than just FizzBuzz to test programmer candidates

Python Why does my loop run slower with larger lists in Python?

File Structure vs Data Structure

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight