Optimizing Hash Table Storage for Efficient Record Retrieval

  • Context: Comp Sci 
  • Thread starter: JonnyG
  • Tags: Table
SUMMARY

This discussion focuses on optimizing hash table storage for efficient record retrieval, specifically addressing the challenge of storing a hash table on disk while maintaining performance. The hash table is required to hold approximately 100 records of 128 bytes each, about 12.5 KB of data in total. The consensus is that while it may be tempting to keep references in main memory to avoid disk accesses, the assignment requires the hash table to be stored as a single file on disk. The fseek() function is recommended for managing file access efficiently.

PREREQUISITES
  • Understanding of hashing algorithms and their implementation
  • Familiarity with file handling in C++, particularly the fseek() function
  • Knowledge of memory management and storage optimization techniques
  • Basic concepts of data structures, specifically hash tables
NEXT STEPS
  • Research efficient file I/O operations in C++ using fseek() and fread()
  • Explore advanced hashing techniques and collision resolution methods
  • Learn about memory-mapped files for improved performance in data retrieval
  • Investigate the impact of data structure design on performance in large datasets
USEFUL FOR

Software developers, particularly those working with data storage solutions, systems programmers, and anyone involved in optimizing data retrieval processes in applications.

JonnyG
Homework Statement
Implement a database stored on disk using bucket hashing. Define records to
be 128 bytes long with a 4-byte key and 120 bytes of data. The remaining
4 bytes are available for you to store necessary information to support the
hash table. A bucket in the hash table will be 1024 bytes long, so each bucket
has space for 8 records. The hash table should consist of 27 buckets (total
space for 216 records with slots indexed by positions 0 to 215) followed by
the overflow bucket at record position 216 in the file. The hash function for
key value K should be K mod 213. (Note that this means the last three
slots in the table will not be home positions for any record.) The collision
resolution function should be linear probing with wrap-around within the
bucket. For example, if a record is hashed to slot 5, the collision resolution
process will attempt to insert the record into the table in the order 5, 6, 7, 0,
1, 2, 3, and finally 4. If a bucket is full, the record should be placed in the
overflow section at the end of the file.

Your hash table should implement the dictionary ADT of Section 4.4. When
you do your testing, assume that the system is meant to store about 100 or so
records at a time.
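The layout described in the statement above maps directly onto a handful of constants plus two small functions: the home-slot hash (K mod 213) and the in-bucket probe sequence. A minimal sketch in C++ (the names are illustrative, not part of the assignment):

```cpp
#include <cassert>
#include <cstdint>

// Layout constants from the assignment: 128-byte records, 1024-byte
// buckets (8 records each), 27 regular buckets plus one overflow bucket.
constexpr int kRecordSize     = 128;
constexpr int kSlotsPerBucket = 8;
constexpr int kNumBuckets     = 27;
constexpr int kTableSlots     = kNumBuckets * kSlotsPerBucket; // 216
constexpr int kOverflowStart  = kTableSlots;                   // slot 216

// Home slot for key K: K mod 213, so slots 213-215 are never home slots.
int homeSlot(uint32_t key) { return key % 213; }

// i-th probe position for a record whose home slot is `home`:
// linear probing with wrap-around *within the bucket*.
int probeSlot(int home, int i) {
    int bucket = home / kSlotsPerBucket;
    int offset = (home % kSlotsPerBucket + i) % kSlotsPerBucket;
    return bucket * kSlotsPerBucket + offset;
}

// Byte offset of a slot in the single database file.
long fileOffset(int slot) { return static_cast<long>(slot) * kRecordSize; }
```

With home slot 5, the probe sequence probeSlot(5, 0), probeSlot(5, 1), ... yields 5, 6, 7, 0, 1, 2, 3, 4, matching the example in the statement; if all eight slots are occupied, the record goes to the overflow section starting at slot 216.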
Relevant Equations
No relevant equations. I want to implement this hash table in C++.
I understand hashing (at least in theory), but I am struggling with the actual implementation. I was wondering about the best way to approach this problem, and that just led to more questions. Now I am confused.

1) The database has to be stored on disk. Let's say it's stored as a folder of individual files, with each file containing a record. But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 KB in size. I can easily fit this in main memory, but the question asks me to store the hash table on disk. But then every time I want to search for a record, I would have to make a disk access, which would be relatively slow.

Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead? (Maybe store a string that contains the path and filename.) A string would be 32 bytes (I used the sizeof operator in C++ to check this), so an array of 100 strings would easily fit into main memory. This would mean there are no disk accesses when searching the hash table, so the program should be faster and would take up less space than if I stored the entire records in the array (where each record is 128 bytes).

Am I thinking about this the wrong way?
 
The hash table is stored on disk, and saved there when it is modified, but it can be loaded into memory for interactive use.
 
JonnyG said:
Let's say it's stored as a folder of individual files, with each file containing a record.
That's not reasonable -- it should be stored as a single file that contains the 27 buckets with their 216 records.
JonnyG said:
But where is the actual hash table stored? The question states that each record is to be 128 bytes long and that the hash table should hold about 100 records. So the hash table will be about 12.5 KB in size.
It's stored on disk -- that's one of the requirements. The point of the exercise is to get a sense of how to work with data that could conceivably be too large to fit into main memory.
JonnyG said:
Wouldn't it be better if the hash table was stored in main memory as an array of references to the files instead?
No. For one thing, your program would need to keep track of a multitude of files. For another, your program really needs to follow the requirements that are given.
 
  • Likes: FactChecker
The hash table and data should be stored as a single file that is read in, modified in memory, and saved when appropriate.
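The read-in/modify/save cycle described above can be sketched in a few lines. Assumptions here: the whole database is one file of 28 fixed-size buckets (27 table buckets plus the overflow bucket, 28 × 1024 = 28,672 bytes), and the file name and function names are my own, not from the assignment:

```cpp
#include <cstdio>
#include <vector>

// 27 table buckets + 1 overflow bucket, 1024 bytes each.
constexpr size_t kFileSize = 28 * 1024;

// Load the whole table file into a memory buffer.
// A missing file yields an all-zero (empty) table.
std::vector<unsigned char> loadTable(const char* path) {
    std::vector<unsigned char> buf(kFileSize, 0);
    if (std::FILE* f = std::fopen(path, "rb")) {
        std::fread(buf.data(), 1, buf.size(), f);
        std::fclose(f);
    }
    return buf;
}

// Write the buffer back out in one shot.
bool saveTable(const char* path, const std::vector<unsigned char>& buf) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    size_t n = std::fwrite(buf.data(), 1, buf.size(), f);
    std::fclose(f);
    return n == buf.size();
}
```

All inserts, searches, and deletes then operate on the in-memory buffer, and the file is rewritten when appropriate (e.g. on shutdown or after each modification).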
 
  • Likes: JonnyG
Hint: fseek() command.
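To make the hint concrete: fseek() lets you jump straight to byte offset slot × 128 in the single database file, so a lookup touches only the record (or the 1024-byte bucket) you need rather than the whole file. A minimal sketch using C stdio (the function names are my own, not from the assignment):

```cpp
#include <cstdio>

constexpr long kRecSize = 128;  // record size from the assignment

// Read the 128-byte record stored at `slot` (slot 216 is the start of
// the overflow section): seek to slot * 128, then read 128 bytes.
bool readRecord(std::FILE* f, int slot, unsigned char out[128]) {
    if (std::fseek(f, slot * kRecSize, SEEK_SET) != 0) return false;
    return std::fread(out, 1, kRecSize, f) == static_cast<size_t>(kRecSize);
}

// Overwrite the record at `slot` in place, leaving the rest of the
// file untouched.
bool writeRecord(std::FILE* f, int slot, const unsigned char rec[128]) {
    if (std::fseek(f, slot * kRecSize, SEEK_SET) != 0) return false;
    return std::fwrite(rec, 1, kRecSize, f) == static_cast<size_t>(kRecSize);
}
```

Note that C stdio requires a repositioning call (such as the fseek() here) between a write and a subsequent read on the same "r+b"/"w+b" stream, so seeking before every record access is not just convenient but required.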
 
