Optimizing Hash Table Storage for Efficient Record Retrieval

  • Context: Comp Sci 
  • Thread starter Thread starter JonnyG
  • Start date Start date
  • Tags Tags
    Table
Click For Summary

Discussion Overview

The discussion revolves around the implementation of a hash table for efficient record retrieval, specifically focusing on storage strategies for the hash table and associated records. Participants explore the implications of storing the hash table on disk versus in main memory, considering performance and compliance with given requirements.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses confusion about the implementation of a hash table stored on disk, questioning whether it would be more efficient to store an array of references to files in main memory instead of the records themselves.
  • Another participant suggests that the hash table can be stored on disk when modified but should be loaded into memory for interactive use.
  • A different participant argues against storing records as individual files, proposing that the hash table and data should be stored as a single file containing multiple buckets and records.
  • Some participants emphasize the importance of adhering to the requirements of the problem, suggesting that the hash table should be stored on disk as specified.
  • A hint regarding the use of the fseek() command is provided, possibly to address file handling in the context of the discussion.

Areas of Agreement / Disagreement

Participants exhibit disagreement regarding the optimal storage strategy for the hash table and records, with multiple competing views on whether to store the hash table on disk or in memory. The discussion remains unresolved as participants present differing opinions on the implementation approach.

Contextual Notes

Participants highlight the need to consider performance implications of disk access versus memory access, as well as the requirement to manage data that may exceed main memory capacity. There are unresolved assumptions regarding the size and structure of the data and the specific requirements of the problem.

JonnyG
Messages
233
Reaction score
45
Homework Statement
Implement a database stored on disk using bucket hashing. Define records to
be 128 bytes long with a 4-byte key and 120 bytes of data. The remaining
4 bytes are available for you to store necessary information to support the
hash table. A bucket in the hash table will be 1024 bytes long, so each bucket
has space for 8 records. The hash table should consist of 27 buckets (total
space for 216 records with slots indexed by positions 0 to 215) followed by
the overflow bucket at record position 216 in the file. The hash function for
key value K should be K mod 213. (Note that this means the last three
slots in the table will not be home positions for any record.) The collision
resolution function should be linear probing with wrap-around within the
bucket. For example, if a record is hashed to slot 5, the collision resolution
process will attempt to insert the record into the table in the order 5, 6, 7, 0,
1, 2, 3, and finally 4. If a bucket is full, the record should be placed in the
overflow section at the end of the file.

Your hash table should implement the dictionary ADT of Section 4.4. When
you do your testing, assume that the system is meant to store about 100 or so
records at a time.
Relevant Equations
No relevant equations. I want to implement this hash table in c++
I understand hashing (at least in theory), but I am struggling with actual implementation. I was wondering what the best way to go about this problem was, and this just led to more questions. Now I am confused.

1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size. I can fit this in main memory but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.

Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (maybe store a string that contains the path and filename). A string would be 32 bytes (I used the sizeof operator in c++ to check this) and so an array of 100 strings would easily fit into main memory. This would mean that there are no disk accesses when searching in the hash table and so the program should be faster and would take up less space than if I store the entire actual records in the array (where each record is 128 byes).

Am I thinking about this the wrong way?
 
Last edited:
Physics news on Phys.org
The hash table can be stored on disk when it is modified and saved, but it can be loaded into memory for interactive use.
 
JonnyG said:
Let's say it's stored as a folder of individual files, with each file containing a record.
That's not reasonable -- it should be stored as a single file that contains the 27 buckets with their 216 records.
JonnyG said:
But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 MB in size.
It's stored on disk -- that's one of the requirements. The idea here is to have an idea of how to work with data that could conceivably be too large to fit into main memory.
JonnyG said:
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead?
No. For one thing your program would need to keep track of the multitude of files. For another thing, your program really needs to follow the requirements that are given.
 
  • Like
Likes   Reactions: FactChecker
The hash table and data should be stored as a single file that is read in, modified in memory, and saved when appropriate.
 
  • Like
Likes   Reactions: JonnyG
Hint: fseek() command.
 

Similar threads

  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 21 ·
Replies
21
Views
3K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
4K
  • · Replies 17 ·
Replies
17
Views
6K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K