Implementing a hash table

  • Comp Sci
  • Thread starter JonnyG
  • Start date
  • #1
227
22
Homework Statement:
Implement a database stored on disk using bucket hashing. Define records to
be 128 bytes long with a 4-byte key and 120 bytes of data. The remaining
4 bytes are available for you to store necessary information to support the
hash table. A bucket in the hash table will be 1024 bytes long, so each bucket
has space for 8 records. The hash table should consist of 27 buckets (total
space for 216 records with slots indexed by positions 0 to 215) followed by
the overflow bucket at record position 216 in the file. The hash function for
key value K should be K mod 213. (Note that this means the last three
slots in the table will not be home positions for any record.) The collision
resolution function should be linear probing with wrap-around within the
bucket. For example, if a record is hashed to slot 5, the collision resolution
process will attempt to insert the record into the table in the order 5, 6, 7, 0,
1, 2, 3, and finally 4. If a bucket is full, the record should be placed in the
overflow section at the end of the file.

Your hash table should implement the dictionary ADT of Section 4.4. When
you do your testing, assume that the system is meant to store about 100 or so
records at a time.
Relevant Equations:
No relevant equations. I want to implement this hash table in c++
I understand hashing (at least in theory), but I am struggling with actual implementation. I was wondering what the best way to go about this problem was, and this just led to more questions. Now I am confused.

1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size. I can fit this in main memory but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.

Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (maybe store a string that contains the path and filename). A string would be 32 bytes (I used the sizeof operator in c++ to check this) and so an array of 100 strings would easily fit into main memory. This would mean that there are no disk accesses when searching in the hash table and so the program should be faster and would take up less space than if I store the entire actual records in the array (where each record is 128 byes).

Am I thinking about this the wrong way?
 
Last edited:

Answers and Replies

  • #2
FactChecker
Science Advisor
Gold Member
6,213
2,408
The hash table can be stored on disk when it is modified and saved, but it can be loaded into memory for interactive use.
 
  • #3
34,884
6,624
Let's say it's stored as a folder of individual files, with each file containing a record.
That's not reasonable -- it should be stored as a single file that contains the 27 buckets with their 216 records.
But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size.
It's stored on disk -- that's one of the requirements. The idea here is to have an idea of how to work with data that could conceivably be too large to fit into main memory.
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead?
No. For one thing your program would need to keep track of the multitude of files. For another thing, your program really needs to follow the requirements that are given.
 
  • Like
Likes FactChecker
  • #4
FactChecker
Science Advisor
Gold Member
6,213
2,408
The hash table and data should be stored as a single file that is read in, modified in memory, and saved when appropriate.
 
  • #5
Tom.G
Science Advisor
3,819
2,513
Hint: fseek() command.
 

Related Threads on Implementing a hash table

Replies
6
Views
2K
Replies
6
Views
3K
  • Last Post
Replies
0
Views
2K
Replies
1
Views
659
  • Last Post
Replies
0
Views
2K
Replies
1
Views
894
Replies
9
Views
2K
Replies
3
Views
2K
  • Last Post
Replies
1
Views
4K
Replies
1
Views
2K
Top