Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Question about a Hash Function

  1. Oct 2, 2013 #1
    I came across "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by Bruno R. Preiss on the internet. I'm very confused with the theory on a Hashing method called the middle square method.

    Link to the topic in the book

    The author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?

    What I don't get is how is this hash function not supposed to return 0 for every key x? I tried it myself on a program and it didn't work! The reason stated in the book also doesn't make sense, that h(x) = 0 for the sets

    ## \{ x \in\mathbb Z : x< \sqrt{\frac{W}{M}} \} ##

    and ## \{ x\in\mathbb Z^+ : x=n\sqrt{W}, n\in\mathbb Z^+ \} ##

    WHY?!? Somebody please help. :)
     
  2. jcsd
  3. Oct 7, 2013 #2
    Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you, but it's only a name.

    Work through the algorithm with w=32 and k=10, so you are shifting 22 bits to the right. Try x=2048 (√222). Can you see what is happening now?
     
  4. Oct 7, 2013 #3

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    You are using unsigned ints here. The product of two 32 bit unsigned integer is in general a 64 bit unsigned integer. However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232. The hash algorithm uses x2, which is truncated to 32 bits per those unsigned int multiplication rules. The high order bits are discarded by the multiplication rules. The hash algorithm itself shifts out the low order bits. What's left is the middle bits of x2.
     
  5. Oct 7, 2013 #4
    What does that mean?! Sorry, I'm only in second year.
     
  6. Oct 7, 2013 #5

    phinds

    User Avatar
    Gold Member
    2016 Award

    Do you understand what "modulo" means? If you don't, you should look it up and make sure you get an understanding of it.
     
  7. Oct 7, 2013 #6
    No I know what it means, I just don't see how it applies here. I was never taught what you said
     
  8. Oct 7, 2013 #7

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    If you weren't taught your education is deficient.

    Suppose you're on a machine in which signed and unsigned ints are 32 bits long. Note: Neither the C nor C++ standard don't say that this must be the case. Both standards specify a minimum size, and both say that signed and unsigned ints must be of the same size. (The same also applies to signed long versus unsigned long, signed short versus unsigned short, etc.)

    With a 32 bit signed int, the number 231-1 is a valid value. What happens if you square this? The result is 18014398241046529, or 3blackf00000001 in hexadecimal. That won't fit in a 32 bit signed integer.

    So what is the result on your computer, with your compiler? Surprisingly, the result is whatever the implementation wants it to be. All that the standards say is that signed integer overflow is undefined behavior. If an implementation ignores the overflow, that's OK. If it triggers a fatal error, that's OK, too. If it calls erase_main_drive() in response, well that's also OK as far as the standard is concerned. Anything in the standard tagged as undefined behavior means the vendor can do *anything* and still be compliant with the standard. Undefined behavior is the ultimate "get out of jail, free!" card to a compiler vendor.

    That is not the case with signed integers. The standards are very clear on how signed integer arithmetic must work for an implementation to be compliant with the standards. Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

    One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
     
  9. Oct 7, 2013 #8
    That's an understatement for the education system in India.

    So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?
     
  10. Oct 7, 2013 #9

    phinds

    User Avatar
    Gold Member
    2016 Award

    You have to use a "long" variable, or if it gets too big for that you'll have to use one of the math packs that allows for very large numbers.
     
  11. Oct 7, 2013 #10
    Oh yeah. Thanks for the explanation though. I really never knew about this modulo 2^32 rule. I'm not gonna refer to this book unless it's really necessary. The author hasn't explained himself properly. Definitely not a good book for a beginner like me..

    Thank you D H, and the others!
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Question about a Hash Function
  1. Hash Functions (Replies: 4)

  2. Question about Java (Replies: 4)

Loading...