Threading - Avoiding Thread Conflicts

  • Thread starter Thread starter twoski
  • Start date Start date
  • Tags Tags
    Thread
AI Thread Summary
The discussion revolves around implementing a thread pool in Java without using built-in utilities, focusing on safely managing access to a task queue. The user experiences exceptions when increasing the number of threads, primarily due to concurrent access to an empty queue. Solutions discussed include using synchronized methods and considering wait/notify mechanisms for thread safety, although the necessity of these methods is debated. Performance optimization is also a key concern, with users noting that increasing threads does not always lead to faster execution, especially if the number of threads exceeds the hardware's parallelism capabilities. Ultimately, the assignment emphasizes writing correct, thread-based code while maintaining performance, without introducing unnecessary complexity.
twoski
Messages
177
Reaction score
2

Homework Statement



I have to write a simple program that uses threads to execute a queue of arbitrary tasks.

The more threads i make, the better it should perform (theoretically).

I have to simulate a thread pool without using any of Java's built in utilities for threads.

The Attempt at a Solution



My current attempt seems to work if i use only 3 threads. If i increase the number of threads, i end up getting exceptions caused by threads trying to access an empty queue.

How can i make my threads safely access the queue? I assume i need to do something with locks or somesuch.

I am also not sure what implementing a thread pool means. Is it sufficient to just store all my threads in some sort of list then call join() on all of them?

Here is relevant code: http://pastebin.com/Cbj64t4k
 
Last edited by a moderator:
Physics news on Phys.org
For this assignment we are not allowed to use anything from java.util.concurrent.

With that in mind, how could i implement a thread safe list type data structure? I assume i would call wait() before doing any operations on the list so that a thread can only do something if the list has an item in it. Would i just call notifyAll on the list if it holds 1 or more items after i remove one?

alternatively: http://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html

I could just make the actual method that checks the list and then removes an item synchronized so that only one thread at a time may access the list. Is that preferable?
 
Last edited:
I think i may have figured something out here.

Code:
        private synchronized Task GetTask()
	{
		if (!tasks.isEmpty())
		{
			Task t = (Task) tasks.remove();
			return t;
		}
		
		return null;
	}
	
	public void run()
	{
		Task t;
		
		while ((t = GetTask()) != null)
		{
			t.Perform();
		}
		
		System.out.println("Thread " + this.id + " terminated safely!");
	}

In theory this should work but i still get exceptions caused by accessing an empty list. Strangely, the exceptions do not always occur. If i have 100 tasks and 5 threads, it works fine, but if i go with 500 tasks and 10 threads i get exceptions.
 
Last edited:
What makes you think that different threads executing their methods GetTask will synchronize with each other when accessing the shared tasks object?
 
I assumed that any method that is synchronized would synchronize regardless of being in a thread or not.

If i made a method external to the thread class which was synchronized and called this method from the thread class, would that work?
 
What does the keyword "synchronized", when used in the definition of a method, mean? Can this keyword be used differently, i.e., not in the definition of a method?
 
Ah, i created my own TaskList class and put the synchronized function within that class and now my program works.

Part of my assignment requires me to optimize the code such that a greater number of threads will result in faster code execution. Right now if i run the program with large data sets and only one thread, it takes 8ms. If i use 5 threads for the same data sets, i get a time of 9ms.

According to the assignment:

The actual difference will vary considerably due to (1) machine architecture (2) the specific
algorithm(s) used within the code.

With that in mind, how am i supposed to optimize this?
 
That requirement, as stated, is impossible to comply with. If the number of threads is much greater than the parallelism the underlying hardware can support, adding more threads will make the system slower, not faster.
 
  • #10
Wouldn't it also depend on the computer running the code and therefore be very subjective when it comes to optimization and speed gains?

I have some more snippets from the assignment handout. The teacher seems to think we can achieve close to linear speed increases, though i have no idea how. My code is already pretty barebones. I am using a Stack for my task list, but i don't think Stack.Pop is inefficient by any stretch.

The primary objective of the assignment is to write thread based code that will run
correctly. By “correct”, we mean that (1) multiple threads are created and they concurrently process
the task list, (2) you produce the proper results (e.g., with the FIND and ADD operations), and (3) the
application does not crash. The grader will evaluate this using a series of tests cases (i.e., various test
parameters. You do not get a copy of the test parms beforehand ).
One thing that you will probably realize, however, is that concurrent doesn’t always mean parallel. In
other words, if your computer has more than one core (as almost all do nowadays), you will see that
running the code with 2 or 4 threads may not make the code any faster than when it is run with one
thread. That can be a little discouraging after all of this work. In fact, this is quite common with parallel
or multi-threaded applications. This is why they are quite difficult to write (well) in practice.
So the last 10% of your grade for this assignment will involve improving the performance of the
application. Ideally, we want to get something closer to linear speedup (e.g., 2 X faster if you have 2
cores). Note that you may not see truly linear speedup since there is typically overhead associated with
thread creation and scheduling. In any case, to improve performance, you can “tweak” the code any way
that you like (algorithms, data structures, etc.), as long as the command line parameters do not change
and the final output is the same

edit:

With very large data sets, my program takes 75ms using 1 thread and 38ms with 2 threads. So i guess it depends on the size of the data set being used.
 
Last edited:
  • #11
That sounds more reasonable as a requirement.

Without knowing what that "Perform" thing does, however, I do not think I can say anything specific. Non-specifically, I can say that it should minimize or ideally avoid interaction with other threads. Interaction means any sort of thing when it has to wait for some other threads to do something, which includes access to "synchronized" resources.
 
  • #12
Well in order for my program to be thread safe wouldn't i have to implement blocking in some way to protect the shared resource (which is the task list in this case)?

Perform is a method which just checks if a static TreeSet contains an arbitrary value... I assume it was just meant to be some arbitrary operation which would take time to complete so we could see meaningful speed differences with larger data sets.

The only blocking taking place is when a thread accesses the stack of tasks in order to pop() and retrieve a task to execute. This would require blocking in order to avoid having threads call pop() on an empty stack.

So in my TaskList class i have:

Code:
public synchronized Task GetTask()
{
	if (!tasks.empty())
	{
		return tasks.pop();
	}
		
	return null;
}
 
  • #13
If the only thread interaction is in the GetTask() method, then your code is close to optimal in that case. I would not attempt any further optimization unless a very specific optimization target was set.
 
  • #14
I have not read every detail you have provided in this thread, but it seems that your code is not waiting (i.e. invoking one of the wait methods) in order to retrieve a new task. If so, that means that all worker threads will continuously poll (also known as busy-wait resulting in 100% CPU usage) until the queue releases a task for them and you should probably consider if this is acceptable or not in the solution for your assignment. In real life, this would of course not be acceptable and one would normally use a wait/notify exchange to make sure that threads only wake up when a task enters the queue. Note however, that it is easy to code an incorrect wait/notify exchange so you should make sure you understand the mechanism first or at least read some guides on how to do it before you start coding it.
 
  • #15
I've looked at a handful of tutorials on wait/notify and they're all pretty confusing. This code seems to get stuck after the first thread is made.

Code:
       public void run()
	{
		boolean processing = true;
		
		while (processing)
		{
			synchronized (tasks)
			{
				try 
				{
					tasks.wait();
				} 
				catch (InterruptedException e) 
				{
					e.printStackTrace();
				}
			}
			
			Task t = tasks.GetTask();
			
			if (t != null)
			{
				t.Perform(id, debug);
			}
			else
			{
				processing = false;
			}
			
			/*while ((t = tasks.GetTask()) != null)
			{
				t.Perform(id, debug);
			}*/
			
			tasks.notifyAll();
		}
		
		System.out.println("Thread " + this.id + " terminated.");
	}

I don't quite understand the program flow. Every example seems to call wait() before anything else, but that means that every thread defaults to the waiting state when the program starts, right?

My understanding is that the code flow should go something like

Wait ---> Do Action Requiring Lock ---> Call NotifyAll

Should i be doing this inside a while loop?
 
  • #16
Yes, that code has to get stuck, because the thread first waits, then notifies. It is going to wait forever before it can notify (and unblock other threads).

More fundamentally, what are trying to achieve here? Yesterday I had an impression that the task list was populated before the threads were launched, and no new tasks were ever added afterwards. In which case, there is no reason to wait. What are going to wait for now?
 
  • #17
The task list is populated prior to creating the threads. Does this mean that waiting would be unnecessary?

I don't want to lose marks on this assignment and my code seems to run correctly... There is no mention of wait/notify in the assignment handout and my code is thread safe, but the real question i suppose is should i really be using wait/notify for this?
 
  • #18
Unless you can clearly state a reason why you should add waiting to your code, stay away from that. I do not think there is anything in the art of computer programming that is worse than adding code that does things for no apparent reason. It is called "cargo cult programming" (look this up).
 
  • #19
voko said:
Yesterday I had an impression that the task list was populated before the threads were launched, and no new tasks were ever added afterwards. In which case, there is no reason to wait.

Reading this thread more carefully (ok, pun intended for once) I agree that there is nothing in the material that twoski has presented that should indicate that this assignment is about constructing a general thread-safe task queue as I was rambling about in my previous post.

I apologize if my previous comment erroneously prompted twoski to get sidetracked into consider adding an unnecessary wait/notify coordination.
 

Similar threads

Back
Top