Multithreaded File IO Homework: 3 Readers, 3 Writers, Circular Buffer

  • Thread starter Adyssa
  • Start date
  • Tags
    File
In summary, the conversation discusses the use of multiple threads to perform file IO using a circular buffer. The main challenge is figuring out how to synchronize the threads and utilize the circular buffer efficiently. The idea of having each thread handle a specific portion of the file is deemed inefficient, and the use of a counting semaphore is suggested to help with synchronization. The conversation also touches on the concept of parallelizing I/O to improve performance.
  • #1
Adyssa
203
3

Homework Statement



Perform file IO with 3 reader and 3 writer threads using a circular buffer. Read the (text) file in and write the same file out.

The Attempt at a Solution



I have an assignment that requires me to use multiple (3) threads to read data from a file, and multiple threads (again, 3) to write this data out to a new file, but the spec is very vague after that and I'm having trouble working out the logic. To compound the issue, I need to use a circular buffer which I have not implemented before.

I'd just like to understand this problem better before I get to coding.

Would it make sense to have each reader thread only concern itself with a portion of the file? So reader_thread_1 read()'s from the start of the file to 1/3 in, and reader_thread_2 reads from 1/3 to 2/3, with reader_thread_3 from 2/3 to the end, and writer_thread_1-3 working in a similar fashion? I will need to pass file offset information to each thread on creation.

With regards to the circular buffer, it seems I will need to wait for reader_thread_1 to complete a full read all the way to 1/3 in, at which time reader_thread_2 can start adding its lines to the buffer. Writer_thread_1 can start consumining lines as soon as they are available but how do I know when to stop, and start writer_thread_2? I'm a bit confused here.

In any case, does this seem like a logical path to take? I realize that this probably isn't the most efficient way to perform File IO but I guess it's an exercise in thread synchronisation first and foremost.
 
Physics news on Phys.org
  • #2
Are all three reader threads accessing the same text file?

One way to approach reading a file with one thread and writing with another is to use a mutex for the first thread, so the second thread would wait until the first thread is done filling a buffer in memory, then it would hand over control of the mutex to the second thread. The second thread would write to file and release the mutex back to the first thread, allowing it to continue reading. With this method, the reader would block the writer until it's filled its buffer or finished the whole file, and the writer would block the reader from re-filling the read buffer until the writer writes it all to file. Eg:

20mb file, 8mb buffer in memory
Reader enters critical section and reads 8mb, writer waits.
Reader exits critical section, writer enters critical section.
Writer writes 8mb from the reader, reader waits.

This continues until the reader reaches the end of the file, then the reader thread terminates, and the writer thread, allowed to resume with an empty read buffer, also terminates.

I don't know how or why you'd use three threads for a single file though.
 
  • #3
Yes, all 3 reader threads are reading the same file, and 3 writer threads are writing to the same output file. As I mentioned I think this is quite an inefficient way to do File IO but it's an assignment, and we are studying operating systems, processes, threads, etc

I've been reading up on semaphores, I'm going to need to use more than just a mutex I think.

If 3 threads are all reading from the same file, do I need to lock the file with a mutex if they all open it as read only? I think I can get away without one in that case. However, I will need to use one when writing to the shared buffer, and the writing threads will definitely need to lock it the output file.

I think a counting semaphore will help with the circular buffer too, where reader_thread_1 can increment the semaphore with each line it adds to the buffer, and writer_thread_1 can decrement it each time it grabs one, and then I start writer_thread_2 when it's done.

I have some test programs to write I think!
 
  • #4
Yes, a counting semaphore would probably be great with multiple threads. Increment after reading a buffer in a reader thread, decrement when writing a buffer in a writer thread.

Rather than have each thread read one third sequentially of the file, you might wish to have each reader thread read a buffer amount (say, 1mb) whenever it reads, so you would have 3 writers writing out the read buffers. This is where the circular buffer comes in: each reader reads the next "block" in order, then appends in order. If it can't append in order (i.e. block read pending somewhere between it and the head of the circular buffer), it'll spin until it can place its buffer as the head of the circular buffer. Each writer would lock, write and remove the tail of the circular buffer.
 
  • #5
Adyssa said:
Would it make sense to have each reader thread only concern itself with a portion of the file? So reader_thread_1 read()'s from the start of the file to 1/3 in, and reader_thread_2 reads from 1/3 to 2/3, with reader_thread_3 from 2/3 to the end, and writer_thread_1-3 working in a similar fashion? I will need to pass file offset information to each thread on creation.
You really don't want to do it this way unless the file is small. At the start you will have five threads waiting for the first reader to finish reading its share of the file, and at the end you once again only have one active thread. You have pretty much maximized this inherent lack of parallelism at start and end by reading and writing thirds of the file. Reading and writing one byte at a time will minimize this particular problem, but this single byte approach obviously isn't optimal either. There's a buffer size that represents a happy ground somewhere between these extremes.

Adyssa said:
As I mentioned I think this is quite an inefficient way to do File IO but it's an assignment, and we are studying operating systems, processes, threads, etc
Parallelizing your I/O can make the output appear to be cost free. For example, suppose the input file is on one drive, the output file on another. Having parallel readers and writers can halve the time needed to accomplish the task compared to that taken by a single threaded process.

That said, this mandated architecture is a bit silly. The point is to make you learn how to do threading, how to control what those threads are doing. You are moving from a subroutine model of programming to a coroutine model, and making that paradigm switch is perhaps easiest to accomplish when you are working on a problem that is a bit artificial.

I've been reading up on semaphores, I'm going to need to use more than just a mutex I think.
A state machine might be of help as well. Your readers are, at a minimum, waiting to read or reading. Depending on your architecture, a reader might also go into output modes to transfer its newly acquired data to some consumer of the data. A similar concept applies to your writers.
 
  • #6
OK I have it working with 1 reader and 1 writer thread, a mutex for the buffer and a counting semaphore to regulate production and consumption and it appears to be working well o_O my output file is correct. Now to add some more threads and see what happens. eek! haha
 
  • #7
I have been working on this for a week or so, and I have a problem. For the record, "fill count" and "empty count" are counting semaphores and I use two more semaphores as mutexes for each file (input and output). My logic is like this:

Spawn 3 readers, spawn 3 writers (all explicitly JOINABLE by setting an attribute variable). Each thread function is a while(1) loop with an if-clause for the termination condition that is executed each loop.

A reader thread will:

Code:
while(1)

lock the input file (sem down)
if(fgets a line == NULL)
{
    unlock the input file (sem up)
    set a global flag "reading complete"
    break;
}
decrement emptycount (sem down)
lock the shared buffer (sem down)
unlock the input file (sem up) /* the 2 locks overlap here to enforce line order */
insert the line into the buffer
unlock the shared buffer (sem up)
increment the fillcount (sem up)

when I break (above) I free the line resource and call pthread_exit(). The reader threads seem to work well, and they all exit appropriately when fgets == NULL.

My writer threads are not so well behaved. Their logic is similar:

Code:
while(1)

if(reading is complete && the buffer is empty)
{
    break; /* cleanup and pthread_exit() */
}
decrement fill count (sem down)
lock the shared buffer (sem down)
remove the next line from the buffer
lock the output file (sem down)
unlock the shared buffer (sem up) /* again the 2 locks overlap to enforce line order */
increment empty count (sem up)
insert the line into the output file
unlock the output file (sem up)

Now, here and there my writer threads will all finish as expected, and the output file is correct, but that is very much the exception rather than the rule! More often, my program hangs at pthread_join() waiting for either the 1st or 2nd writer thread to finish.

So it seems I have some kind of deadlock but I'm at a bit of a loss as to how to debug it. The writer threads are not terminating (gdb reports thread termination) but they are also not looping (I can use a printf in my loop and it will stop printing at this point), so I think they're stuck on one of the semaphores but which one, and how to find out?

Also, I'm not entirely sure that my method of overlapping semaphore locks (the buffer and file mutexes) to enforce line order is appropriate ... perhaps I should be using some other kind of method.

Sometimes when I do get the program to finish, I have a single line output twice (in a row) to the output file, which doesn't seem possible, but there it is ...

Oh, I can post some code, but since it's an assignment I decided not to at this stage. I think my pseudo-code should make it clear what's going on.
 
Last edited:
  • #8
Interestingly, I forgot to add this, if I comment out my pthread_join() calls for the writer threads, the program terminates without a problem, but I still sometimes get a single double line output to the file (only ever once, and always well into the middle of the file, never at the start or end). It's a 10000-odd line file (a movie script).
 

Related to Multithreaded File IO Homework: 3 Readers, 3 Writers, Circular Buffer

1. What is multithreaded file IO?

Multithreaded file IO refers to the process of using multiple threads simultaneously to read from or write to a file. This allows for more efficient and faster file operations compared to using a single thread.

2. What is the purpose of the 3 readers and 3 writers in the circular buffer?

The 3 readers and 3 writers in the circular buffer serve to read and write data to and from the buffer. This allows for multiple operations to occur simultaneously, improving efficiency and reducing wait times.

3. How does the circular buffer work in the multithreaded file IO homework?

The circular buffer in the multithreaded file IO homework works by using a fixed-size buffer to store data. As data is read or written, the buffer moves in a circular manner, allowing for continuous and efficient data transfer.

4. What are the benefits of using multithreaded file IO?

Using multithreaded file IO can provide several benefits, including faster file operations, improved efficiency, and reduced wait times. It also allows for multiple files to be accessed simultaneously, increasing productivity.

5. Are there any potential drawbacks to using multithreaded file IO?

While multithreaded file IO can offer many advantages, there are also some potential drawbacks. These can include increased complexity in coding and debugging, as well as the potential for data corruption if not implemented correctly.

Similar threads

  • Engineering and Comp Sci Homework Help
Replies
3
Views
4K
  • Programming and Computer Science
Replies
12
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
10
Views
1K
  • Programming and Computer Science
Replies
3
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
2K
  • Programming and Computer Science
Replies
10
Views
25K
  • Engineering and Comp Sci Homework Help
Replies
3
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
2K
  • Programming and Computer Science
Replies
15
Views
1K
  • Programming and Computer Science
Replies
2
Views
3K
Back
Top