Why Are Hotspots Rare in GFS with Sequential Reads of Large Multi-Chunk Files?

  • Context: Comp Sci
  • Thread starter: shivajikobardan
  • Tags: File, Google, System
SUMMARY

In the Google File System (GFS), hotspots are infrequent because applications mostly read large multi-chunk files sequentially. A hotspot can arise when many clients access the same small file, which may consist of a single chunk, so every request lands on the chunkservers storing that chunk. GFS uses lazy space allocation, which delays physical allocation of chunk space and limits the space wasted by the large chunk size, but it does not by itself prevent hotspots. With large multi-chunk files read sequentially, clients are rarely reading the same chunk at the same time, so the load is spread across chunkservers.

PREREQUISITES
  • Understanding of Google File System (GFS) architecture
  • Familiarity with lazy space allocation techniques
  • Knowledge of chunk size implications in distributed file systems
  • Basic concepts of contention and resource locking in computing
NEXT STEPS
  • Research "Google File System architecture and design" for deeper insights
  • Explore "Lazy space allocation in distributed systems" for advanced understanding
  • Study "Contention management techniques in file systems" to learn about resource locking
  • Investigate "Chunk size optimization strategies in GFS" for performance improvements
USEFUL FOR

Software engineers, system architects, and data engineers interested in optimizing data access patterns and understanding the performance characteristics of distributed file systems like Google File System.

shivajikobardan
Homework Statement
In the Google File System, "hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially." What does this mean?
Relevant Equations
none

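Hotspot: in general, a region of a computer program where a high proportion of executed instructions occur. In the GFS paper, though, a hot spot is a chunkserver that receives a disproportionate share of client requests.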

Lazy space allocation: https://stackoverflow.com/questions/18109582/what-is-lazy-space-allocation-in-google-file-system

With lazy space allocation, the physical allocation of space is delayed as long as possible, until data at the size of the chunk size (in GFS's case, 64 MB according to the 2003 paper) is accumulated.
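The general idea of lazy allocation can be pictured with a small sketch. This is not GFS code: `LazyChunk` is a hypothetical class, and it assumes the chunk's backing storage simply grows as data is appended instead of reserving the full 64 MB up front.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as given in the 2003 paper


class LazyChunk:
    """Hypothetical chunk whose backing storage grows only as data arrives."""

    def __init__(self) -> None:
        self.data = bytearray()  # nothing is reserved up front

    def append(self, payload: bytes) -> None:
        # Refuse data that would push the chunk past its fixed maximum size.
        if len(self.data) + len(payload) > CHUNK_SIZE:
            raise ValueError("payload does not fit in this chunk")
        self.data.extend(payload)

    @property
    def used_bytes(self) -> int:
        return len(self.data)


chunk = LazyChunk()
chunk.append(b"x" * 1024)
print(chunk.used_bytes)  # 1024 bytes in use, rather than a full 64 MB reserved
```

An eager design would reserve `CHUNK_SIZE` bytes per chunk regardless of how much data it holds, which is the waste that lazy allocation is meant to avoid for small files.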
Large chunk size in GFS:
=> A large chunk size, even with lazy space allocation, has its disadvantages.
=> A small file consists of a small number of chunks, perhaps just one.
=> The chunkservers storing those chunks may become hot spots if many clients are accessing the same file.
=> In practice, hot spots have not been a major issue because our applications mostly read large multi-chunk files sequentially.
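To make "a small number of chunks, perhaps just one" concrete, here is a minimal sketch of how a file size maps to a chunk count with a 64 MB chunk size. The function names and the two example file sizes are made up for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the 2003 paper


def chunk_count(file_size_bytes: int) -> int:
    """Number of fixed-size chunks needed to hold a file (at least one)."""
    return max(1, (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE)


def chunk_index(offset_bytes: int) -> int:
    """Index of the chunk that contains a given byte offset."""
    return offset_bytes // CHUNK_SIZE


small_file = 5 * 1024 * 1024          # hypothetical 5 MB file
large_file = 10 * 1024 * 1024 * 1024  # hypothetical 10 GB file

print(chunk_count(small_file))  # 1
print(chunk_count(large_file))  # 160
```

Every read of the 5 MB file goes to the servers holding its single chunk, while a reader of the 10 GB file moves through 160 different chunks.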
I don't understand why hotspots are not an issue when large multi-chunk files are read sequentially. The paper says hotspots are an issue when many clients access the same small file (for example, a file of just one chunk).

Here is the scenario I have in mind, where a small file (a small number of chunks) is being accessed by multiple clients.



It makes sense that the chunkservers become hotspots in this case, since they are busy whenever multiple clients access them.
But it doesn't make sense to me when the paper says "In practice, hot spots have not been a major issue because our applications mostly read large multi-chunk files sequentially." What's the difference? If I imagine the same scenario as above, except that the file is made up of multiple chunks, what changes?
 
Collision issues can occur when multiple clients try to read, write, or append to a common file. When writing, only one client is given permission to write, and all others must wait until the operation is complete before they can access the file.
 
jedishrfu said:
Collision issues can occur when multiple clients try to read / write or append to a common file.
Alright I get this.
jedishrfu said:
When writing, only one client is given permission to write, and all others must wait until the operation is complete before they can access the file.
So what? I don't see how this answers my question, which is:
In the Google File System, hotspots haven't been a major issue because our applications mostly read large multi-chunk files sequentially. What does this mean?
 
shivajikobardan said:
In Google File System,hotspots haven't been a major issue because our applications mostly read large multi chunk files sequentially. what it mean?
Because
shivajikobardan said:
our applications mostly read large multi chunk files sequentially
then the situation where multiple clients try to read or write the same chunk at the same time does not occur often, so it has not been a major issue.
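A toy simulation may help show why this is the case. It is only a sketch under several assumptions that are not from the paper: time advances in steps, each client reads one chunk per step, chunk `j` lives on chunkserver `j % num_servers` (replication is ignored), and clients start `stagger` steps apart. The function name `peak_server_load` is made up.

```python
from collections import Counter


def peak_server_load(num_chunks: int, num_clients: int,
                     num_servers: int, stagger: int = 1) -> int:
    """Largest number of clients hitting one chunkserver in a single time step."""
    peak = 0
    total_steps = num_chunks + stagger * (num_clients - 1)
    for t in range(total_steps):
        load = Counter()
        for c in range(num_clients):
            pos = t - c * stagger  # chunk index client c reads at step t
            if 0 <= pos < num_chunks:
                # Assumed placement: chunk j is stored on server j % num_servers.
                load[pos % num_servers] += 1
        if load:
            peak = max(peak, max(load.values()))
    return peak


# Ten clients open the same single-chunk file at the same moment.
print(peak_server_load(num_chunks=1, num_clients=10, num_servers=20, stagger=0))    # 10

# Ten clients read a 100-chunk file sequentially, starting three steps apart.
print(peak_server_load(num_chunks=100, num_clients=10, num_servers=20, stagger=3))  # 1
```

In the second case each client is at a different position in the file, so at any moment the readers are on different chunks and, under the assumed placement, on different servers. If all ten clients started at once they would move through the file in lockstep, but the busy server would change at every step instead of one server staying hot.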
 
pbuk said:
Because

then the situation where multiple clients try to read or write the same chunk at the same time does not occur often so it has not been a major issue.
Can you tell me why that is? I have one example, but I would prefer to hear your explanation first.
 
Hmm, so we are giving you helpful suggestions here, and you have an example but don’t want to share it until you hear someone else’s example first.

That’s not being very open. I would have provided my example, which would get me even more comments, but now I guess I’ll just wait and see what happens.

If your example is proprietary to your work then I understand, but I must also say you should not be discussing work-related stuff on the internet.
 
LOL, what are you saying? Why wouldn't I share it? Here it is:
Imagine you have a large barrel (a file). In it, there is one tennis ball (a chunk). Reach in blindfolded and grab the tennis ball (read the file). OK. Now put the ball back and get nine friends to join you. Then have everyone grab the ball at once. There WILL be contention (a hotspot). Now put 100 tennis balls into the barrel, and have you and your friends each try to grab a ball. Most of the time, everyone will get a ball. Occasionally there will be contention (a hotspot), but it will be far less frequent.
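The barrel analogy can be put into rough numbers. The sketch below assumes each person picks a ball uniformly at random and independently of the others, and counts it as contention whenever two people pick the same ball. Those modelling choices, and the function name `p_no_collision`, are assumptions for illustration.

```python
from math import prod


def p_no_collision(num_balls: int, num_friends: int) -> float:
    """Probability that everyone picks a different ball (uniform, independent picks)."""
    if num_friends > num_balls:
        return 0.0  # more hands than balls: someone must share
    return prod((num_balls - i) / num_balls for i in range(num_friends))


print(p_no_collision(1, 10))    # 0.0: ten people, one ball, always contention
print(p_no_collision(100, 10))  # roughly 0.63
```

With one ball, contention is certain. With 100 balls, all ten people get their own ball about 63% of the time, and even when two do collide, only those two are affected instead of all ten.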
 
It’s an interesting analogy, though it’s unlikely that Google chunks data in tennis balls. In filesystems or databases, contention occurs when trying to update a specific resource. Locks are used to ensure only one client may write to that resource.

It may be that Google logs some information as each client tries to read a given chunk which causes other clients to wait on that chunk. It may be that the web service that handles the reads has serialized the client requests which appears to the client as a wait. I’ve seen that in some web services but wouldn’t expect it in a Google service.

I’ve found this writeup on how it works so maybe you can find your answer there:

https://computer.howstuffworks.com/internet/basics/google-file-system.htm

and here’s a stackoverflow discussion on GFS hotspots

https://stackoverflow.com/questions...es-create-hot-spots-in-the-google-file-system
 
Hmm, I didn't come up with it; someone on another forum did. It clicked for me immediately.
 