How Does the Middle Square Method Hash Function Work?

  • Thread starter Thread starter MrWarlock616
  • Start date Start date
  • Tags Tags
    Function
Click For Summary

Discussion Overview

The discussion revolves around the middle square method as a hashing technique, specifically focusing on its implementation and behavior as described in a book on data structures and algorithms. Participants express confusion about the theoretical underpinnings, practical applications, and specific behaviors of the hash function.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Homework-related

Main Points Raised

  • One participant questions the validity of the hash function's output, specifically why it does not return 0 for every key, referencing the conditions given in the book.
  • Another participant clarifies that the term "middle bits" may be misleading and suggests that the relevant bits are actually from bit w to bit w-k.
  • A participant explains the implications of unsigned integer multiplication in C/C++, noting that the product of two 32-bit unsigned integers results in a 64-bit value, but is truncated to 32 bits due to language rules.
  • Concerns are raised about the undefined behavior of signed integer overflow in C/C++, with a detailed explanation of how this affects the results of calculations.
  • One participant expresses frustration with the educational system and the clarity of the book, suggesting it is not suitable for beginners.
  • Another participant suggests using a "long" variable or a math library for calculations that exceed standard integer limits.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and agreement regarding the middle square method and its implementation. There is no consensus on the clarity of the book's explanations or the handling of integer overflow in programming.

Contextual Notes

Participants highlight limitations in their understanding of the hashing method and the implications of integer arithmetic in programming languages. The discussion reveals a reliance on specific definitions and assumptions that may not be universally understood.

Who May Find This Useful

Readers interested in hashing methods, programming language behavior regarding integer arithmetic, and those seeking clarification on data structure concepts may find this discussion beneficial.

MrWarlock616
Messages
160
Reaction score
3
I came across "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by Bruno R. Preiss on the internet. I'm very confused with the theory on a Hashing method called the middle square method.

Link to the topic in the book

The author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?

What I don't get is how is this hash function not supposed to return 0 for every key x? I tried it myself on a program and it didn't work! The reason stated in the book also doesn't make sense, that h(x) = 0 for the sets

## \{ x \in\mathbb Z : x< \sqrt{\frac{W}{M}} \} ##

and ## \{ x\in\mathbb Z^+ : x=n\sqrt{W}, n\in\mathbb Z^+ \} ##

WHY?!? Somebody please help. :)
 
Technology news on Phys.org
MrWarlock616 said:
this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?
Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you, but it's only a name.

Work through the algorithm with w=32 and k=10, so you are shifting 22 bits to the right. Try x=2048 (√222). Can you see what is happening now?
 
You are using unsigned ints here. The product of two 32 bit unsigned integer is in general a 64 bit unsigned integer. However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232. The hash algorithm uses x2, which is truncated to 32 bits per those unsigned int multiplication rules. The high order bits are discarded by the multiplication rules. The hash algorithm itself shifts out the low order bits. What's left is the middle bits of x2.
 
D H said:
However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232.
What does that mean?! Sorry, I'm only in second year.
 
MrWarlock616 said:
What does that mean?! Sorry, I'm only in second year.

Do you understand what "modulo" means? If you don't, you should look it up and make sure you get an understanding of it.
 
No I know what it means, I just don't see how it applies here. I was never taught what you said
 
If you weren't taught your education is deficient.

Suppose you're on a machine in which signed and unsigned ints are 32 bits long. Note: Neither the C nor C++ standard don't say that this must be the case. Both standards specify a minimum size, and both say that signed and unsigned ints must be of the same size. (The same also applies to signed long versus unsigned long, signed short versus unsigned short, etc.)

With a 32 bit signed int, the number 231-1 is a valid value. What happens if you square this? The result is 18014398241046529, or 3blackf00000001 in hexadecimal. That won't fit in a 32 bit signed integer.

So what is the result on your computer, with your compiler? Surprisingly, the result is whatever the implementation wants it to be. All that the standards say is that signed integer overflow is undefined behavior. If an implementation ignores the overflow, that's OK. If it triggers a fatal error, that's OK, too. If it calls erase_main_drive() in response, well that's also OK as far as the standard is concerned. Anything in the standard tagged as undefined behavior means the vendor can do *anything* and still be compliant with the standard. Undefined behavior is the ultimate "get out of jail, free!" card to a compiler vendor.

That is not the case with signed integers. The standards are very clear on how signed integer arithmetic must work for an implementation to be compliant with the standards. Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
 
D H said:
If you weren't taught your education is deficient.
That's an understatement for the education system in India.

D H said:
Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?
 
MrWarlock616 said:
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

You have to use a "long" variable, or if it gets too big for that you'll have to use one of the math packs that allows for very large numbers.
 
  • #10
Oh yeah. Thanks for the explanation though. I really never knew about this modulo 2^32 rule. I'm not going to refer to this book unless it's really necessary. The author hasn't explained himself properly. Definitely not a good book for a beginner like me..

Thank you D H, and the others!
 

Similar threads

Replies
3
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
Replies
31
Views
4K
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 69 ·
3
Replies
69
Views
9K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 80 ·
3
Replies
80
Views
10K
  • · Replies 1 ·
Replies
1
Views
2K