Question about a Hash Function

  • Thread starter MrWarlock616
  • Start date
  • Tags
    Function
In summary: This isn't exactly a summary, is it?In summary, the author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you
  • #1
MrWarlock616
160
3
I came across "Data Structures and Algorithms with Object-Oriented Design Patterns in C++" by Bruno R. Preiss on the internet. I'm very confused with the theory on a Hashing method called the middle square method.

Link to the topic in the book

The author states that for a given key x, the Hash function is ##h(x) = \frac{M}{W}(x^2 \mod W)## where W=2w where w=word size of the machine, and M is the hash table size. He goes on to say that this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?

What I don't get is how is this hash function not supposed to return 0 for every key x? I tried it myself on a program and it didn't work! The reason stated in the book also doesn't make sense, that h(x) = 0 for the sets

## \{ x \in\mathbb Z : x< \sqrt{\frac{W}{M}} \} ##

and ## \{ x\in\mathbb Z^+ : x=n\sqrt{W}, n\in\mathbb Z^+ \} ##

WHY?!? Somebody please help. :)
 
Technology news on Phys.org
  • #2
MrWarlock616 said:
this can be implemented by bit shifting the square of the key to the right w-k times, since W/M = 2w-k, which gives us the middle bits of the square of the key. How is that?
Well it's not actually the middle bits, it is bit w to bit w-k so perhaps the name is confusing you, but it's only a name.

Work through the algorithm with w=32 and k=10, so you are shifting 22 bits to the right. Try x=2048 (√222). Can you see what is happening now?
 
  • #3
You are using unsigned ints here. The product of two 32 bit unsigned integer is in general a 64 bit unsigned integer. However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232. The hash algorithm uses x2, which is truncated to 32 bits per those unsigned int multiplication rules. The high order bits are discarded by the multiplication rules. The hash algorithm itself shifts out the low order bits. What's left is the middle bits of x2.
 
  • #4
D H said:
However, the C/C++ rules for multiplying unsigned ints dictate that the product be computed modulo 232.
What does that mean?! Sorry, I'm only in second year.
 
  • #5
MrWarlock616 said:
What does that mean?! Sorry, I'm only in second year.

Do you understand what "modulo" means? If you don't, you should look it up and make sure you get an understanding of it.
 
  • #6
No I know what it means, I just don't see how it applies here. I was never taught what you said
 
  • #7
If you weren't taught your education is deficient.

Suppose you're on a machine in which signed and unsigned ints are 32 bits long. Note: Neither the C nor C++ standard don't say that this must be the case. Both standards specify a minimum size, and both say that signed and unsigned ints must be of the same size. (The same also applies to signed long versus unsigned long, signed short versus unsigned short, etc.)

With a 32 bit signed int, the number 231-1 is a valid value. What happens if you square this? The result is 18014398241046529, or 3blackf00000001 in hexadecimal. That won't fit in a 32 bit signed integer.

So what is the result on your computer, with your compiler? Surprisingly, the result is whatever the implementation wants it to be. All that the standards say is that signed integer overflow is undefined behavior. If an implementation ignores the overflow, that's OK. If it triggers a fatal error, that's OK, too. If it calls erase_main_drive() in response, well that's also OK as far as the standard is concerned. Anything in the standard tagged as undefined behavior means the vendor can do *anything* and still be compliant with the standard. Undefined behavior is the ultimate "get out of jail, free!" card to a compiler vendor.

That is not the case with signed integers. The standards are very clear on how signed integer arithmetic must work for an implementation to be compliant with the standards. Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
 
  • #8
D H said:
If you weren't taught your education is deficient.
That's an understatement for the education system in India.

D H said:
Signed integer arithmetic must be computed using modulo 2N arithmetic. With 32 bit integers, N=32, and with the given example, the result *must* be 0xf0000001 = 4026531841 or the implementation is broken.

One way to look at unsigned int arithmetic with 32 bit integers is that all unsigned int computations are performed as if the numbers were 64 bit unsigned integers, but the high order 32 bits are discarded with each computation.
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?
 
  • #9
MrWarlock616 said:
So, that means if there is ever an overflow in integer numbers, then the result will be the actual result modulo 2^32? Then what if I want the actual result?

You have to use a "long" variable, or if it gets too big for that you'll have to use one of the math packs that allows for very large numbers.
 
  • #10
Oh yeah. Thanks for the explanation though. I really never knew about this modulo 2^32 rule. I'm not going to refer to this book unless it's really necessary. The author hasn't explained himself properly. Definitely not a good book for a beginner like me..

Thank you D H, and the others!
 

1. What is a hash function?

A hash function is a mathematical algorithm that takes an input (such as a string or number) and produces a fixed-length output called a hash value or hash code. This function is commonly used in computer science to map data of any size to a fixed-size output.

2. How does a hash function work?

A hash function takes an input and runs it through a series of mathematical operations to produce a unique hash value. The input can be of any length, but the output will always be the same fixed length. This allows for efficient storage and retrieval of data.

3. What is the purpose of a hash function?

The main purpose of a hash function is to provide a quick and efficient way to retrieve data without having to search through all the data in a database. It is also used for data integrity and security purposes, as any slight change in the input will result in a completely different hash value.

4. What are some common applications of hash functions?

Hash functions are widely used in computer science, including in data structures such as hash tables, cryptography, data compression, and checksums. They are also used in password storage, digital signatures, and file comparison.

5. What makes a good hash function?

A good hash function should be deterministic, meaning that for the same input, the output will always be the same. It should also have a low collision rate, meaning that different inputs should not produce the same output. A good hash function should also have a high avalanche effect, meaning that a small change in the input should result in a significantly different output.

Similar threads

  • Engineering and Comp Sci Homework Help
Replies
1
Views
809
  • Programming and Computer Science
Replies
2
Views
1K
Replies
3
Views
1K
  • Programming and Computer Science
Replies
7
Views
1K
  • Programming and Computer Science
Replies
31
Views
2K
  • Advanced Physics Homework Help
Replies
1
Views
671
  • Differential Equations
Replies
5
Views
591
  • Advanced Physics Homework Help
Replies
4
Views
859
  • Advanced Physics Homework Help
Replies
7
Views
1K
  • Programming and Computer Science
Replies
1
Views
598
Back
Top