# Question about a Hash Function

I came across "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by Bruno R. Preiss on the internet. I'm very confused with the theory on a Hashing method called the middle square method.

Link to the topic in the book

The author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?

What I don't get is how is this hash function not supposed to return 0 for every key x? I tried it myself on a program and it didn't work! The reason stated in the book also doesn't make sense, that h(x) = 0 for the sets

## \{ x \in\mathbb Z : x< \sqrt{\frac{W}{M}} \} ##

and ## \{ x\in\mathbb Z^+ : x=n\sqrt{W}, n\in\mathbb Z^+ \} ##

WHY?!? Somebody please help. :)

## Answers and Replies

pbuk
Science Advisor
Gold Member
this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?
Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you, but it's only a name.

Work through the algorithm with w=32 and k=10, so you are shifting 22 bits to the right. Try x=2048 (√222). Can you see what is happening now?

D H
Staff Emeritus
Science Advisor
You are using unsigned ints here. The product of two 32 bit unsigned integer is in general a 64 bit unsigned integer. However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232. The hash algorithm uses x2, which is truncated to 32 bits per those unsigned int multiplication rules. The high order bits are discarded by the multiplication rules. The hash algorithm itself shifts out the low order bits. What's left is the middle bits of x2.

However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232.
What does that mean?! Sorry, I'm only in second year.

phinds
Science Advisor
Gold Member
What does that mean?! Sorry, I'm only in second year.

Do you understand what "modulo" means? If you don't, you should look it up and make sure you get an understanding of it.

No I know what it means, I just don't see how it applies here. I was never taught what you said

D H
Staff Emeritus
Science Advisor
If you weren't taught your education is deficient.

Suppose you're on a machine in which signed and unsigned ints are 32 bits long. Note: Neither the C nor C++ standard don't say that this must be the case. Both standards specify a minimum size, and both say that signed and unsigned ints must be of the same size. (The same also applies to signed long versus unsigned long, signed short versus unsigned short, etc.)

With a 32 bit signed int, the number 231-1 is a valid value. What happens if you square this? The result is 18014398241046529, or 3blackf00000001 in hexadecimal. That won't fit in a 32 bit signed integer.

So what is the result on your computer, with your compiler? Surprisingly, the result is whatever the implementation wants it to be. All that the standards say is that signed integer overflow is undefined behavior. If an implementation ignores the overflow, that's OK. If it triggers a fatal error, that's OK, too. If it calls erase_main_drive() in response, well that's also OK as far as the standard is concerned. Anything in the standard tagged as undefined behavior means the vendor can do *anything* and still be compliant with the standard. Undefined behavior is the ultimate "get out of jail, free!" card to a compiler vendor.

That is not the case with signed integers. The standards are very clear on how signed integer arithmetic must work for an implementation to be compliant with the standards. Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.

If you weren't taught your education is deficient.
That's an understatement for the education system in India.

Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

phinds
Science Advisor
Gold Member
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

You have to use a "long" variable, or if it gets too big for that you'll have to use one of the math packs that allows for very large numbers.

Oh yeah. Thanks for the explanation though. I really never knew about this modulo 2^32 rule. I'm not gonna refer to this book unless it's really necessary. The author hasn't explained himself properly. Definitely not a good book for a beginner like me..

Thank you D H, and the others!