Why can't I create 1 million threads?

  • Thread starter: camel-man
  • Tags: Threads

Discussion Overview

The discussion revolves around the limitations and behaviors of thread creation in Java applications, particularly focusing on why a program cannot create one million threads. Participants explore various factors influencing thread management, including operating system constraints, thread stack size, and the implications of excessive thread creation.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant notes that the operating system may impose undocumented limits on the number of threads a program can create or report.
  • Another participant suggests that creating a large number of threads could indicate poor application design, as it is uncommon for applications to require more than 100 threads simultaneously.
  • Concerns are raised about the costs associated with context switching, where the CPU spends more time switching between threads rather than executing tasks, potentially leading to thrashing.
  • Participants discuss the concept of thrashing, explaining that operating systems are designed to prevent excessive thread creation to avoid performance degradation.
  • One participant highlights that each thread requires its own execution stack, and with a default stack size of 1024k per thread in several 64-bit Java Virtual Machines, creating one million threads would require roughly a terabyte of memory, likely exceeding available resources.
  • Another participant points out that many threads may be I/O suspended, waiting for input, which could affect the active thread count at any given time.
  • There is mention of the operating system possibly maintaining a long queue of inactive threads, which may not be reflected in real-time thread counts.

Areas of Agreement / Disagreement

Participants express various viewpoints on the limitations of thread creation, with some agreeing on the implications of excessive threads leading to thrashing, while others propose alternative scenarios where many threads may not be active. The discussion remains unresolved regarding the exact reasons for the inability to create one million threads.

Contextual Notes

Participants note limitations related to stack size and memory requirements, as well as the potential for performance issues due to context switching and cache misses, but do not resolve these complexities.

camel-man
I have code written in Java, and I am varying the number of threads I create. The OS accounts for threads up to 20,000, and I can see in Task Manager that the number of running threads increases. So why, when I put in 1 million, does the program still run while the number of threads stays constant at the normal operating level? It never goes past the usual baseline of about 1,000 running threads. I don't understand.
 
You haven't specified which OS you're using, but there may be a (possibly undocumented) limit on the number of threads a single program can create - or on the number that can be reported by the OS.

It is uncommon for an application to spawn more than 100 threads at a time. And for most applications, creating that many threads would indicate a poor design.
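A common alternative to one thread per unit of work, sketched here as an illustration rather than anything proposed in the thread, is to submit the work as tasks to a small fixed pool of worker threads. The class name `PooledWork` and the pool size of 8 are arbitrary assumptions; the point is that a million tasks can be queued while only a handful of threads ever exist.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PooledWork {
    // Runs `tasks` tiny jobs on a fixed pool of `workers` threads and returns how many completed.
    public static long runTasks(int tasks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong completed = new AtomicLong();
        for (int i = 0; i < tasks; i++) {
            // Each job is queued; only `workers` threads are ever created to run them.
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // accept no new tasks; already-queued tasks still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the queue to drain
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(1_000_000, 8) + " tasks completed on 8 threads");
    }
}
```

The design choice is that the number of threads is tied to the hardware (a few per core) rather than to the amount of work.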
 
Just because the OS shows thread number 20000 doesn't mean there are that many threads; chances are most of the earlier ones finished long ago.

Switching is costly: having many threads can substantially slow down the computer, as the CPU spends more time switching between threads than doing the real job. That's why the number of concurrent threads can be limited, as .Scott suggested. The threads you tried to start may be sitting in a queue, waiting, or they may simply not have been started at all.
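To illustrate the point that a high thread number does not mean that many threads are alive at once, here is a minimal Java sketch (the class and method names are hypothetical) that starts many short-lived threads and records how many were executing their body at the same moment. Because each thread finishes almost immediately, the peak is usually far below the number started, although the exact value depends on the scheduler and the machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShortLivedThreads {
    // Starts `count` trivial threads; returns the peak number executing their body simultaneously.
    public static int peakConcurrent(int count) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> started = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                int now = running.incrementAndGet();   // this thread is now "in flight"
                peak.accumulateAndGet(now, Math::max); // remember the highest overlap seen
                running.decrementAndGet();             // work done; the thread is about to exit
            });
            t.start();
            started.add(t);
        }
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("started " + n + ", peak running at once: " + peakConcurrent(n));
    }
}
```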
 
Excessive numbers of threads result in "thrashing" which is what Borek described. Operating Systems are designed to not allow unlimited thrashing and so will not attempt to run unlimited numbers of threads. I'm saying the same thing Borek said, just using more formal terminology.
 
phinds said:
Excessive numbers of threads result in "thrashing" which is what Borek described. Operating Systems are designed to not allow unlimited thrashing and so will not attempt to run unlimited numbers of threads. I'm saying the same thing Borek said, just using more formal terminology.

Thanks, sometimes I am limited by my English.
 
phinds said:
Excessive numbers of threads result in "thrashing" which is what Borek described. Operating Systems are designed to not allow unlimited thrashing and so will not attempt to run unlimited numbers of threads. I'm saying the same thing Borek said, just using more formal terminology.

The processor state of each thread must be preserved as the scheduler switches threads in and out to do work. To do a switch, the current thread's registers have to be saved and the kernel registers loaded; the scheduler then picks the next thread to get some time, the kernel registers are saved, and the next thread's registers are loaded.

So as the number of threads increases, the computer spends more and more of its time exchanging registers instead of doing work.
 
SixNein said:
So as the number of threads increases, the computer spends more and more of its time exchanging registers instead of doing work.

Exactly. The formal name for this process is, as you said before you edited your post, "context switching", and when it is done to excess it is called "thrashing". The name comes from the manual separation of grain kernels from their stalks by slapping them back and forth on stones, a process known as thrashing. Why this term is used to describe something that doesn't get any work done, or actually PREVENTS work from getting done, is unclear, since thrashing grain does get work done. I think it's the back-and-forth motion that the term borrows to describe the computer process.
 
phinds said:
The name comes from the manual separation of grain kernels from their stalks by slapping them back and forth on stones, a process known as thrashing. Why this term is used to describe something that doesn't get any work done, or actually PREVENTS work from getting done, is unclear, since thrashing grain does get work done. I think it's the back-and-forth motion that the term borrows to describe the computer process.
"Thrashing" was originally applied to mainframe systems with disc drives, the kind of disc drives that resembled automatic washing machines in bulk and overall appearance. The thrashing could easily be seen and heard by the computer operator as the system caused the read/write heads to move back and forth between widely (several inches) separated cylinders.
 
phinds said:
Excessive numbers of threads result in "thrashing" which is what Borek described.
Not necessarily. For example, suppose each thread represents a remote connection of a person typing queries into some database. Almost all of the threads will be I/O suspended, waiting for slow human input. Only a small handful will be active, and most of those will be suspended as well, either waiting for the database or waiting for permission to access the database.
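D H's scenario can be sketched in Java by parking many threads on a `CountDownLatch`, which here stands in for slow human input (an illustrative assumption; a real server would block on socket reads instead). While they wait, the threads all exist but sit in the `WAITING` state and consume no CPU time, so they do not cause thrashing. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class MostlyIdleThreads {
    // Starts `count` threads that all block on one latch; returns how many were observed WAITING.
    public static int countBlocked(int count) {
        CountDownLatch input = new CountDownLatch(1); // stand-in for "the user typed something"
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    input.await(); // parks the thread; it uses no CPU while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        int blocked = 0;
        try {
            for (Thread t : threads) {
                while (t.getState() != Thread.State.WAITING) {
                    Thread.sleep(1); // give this thread time to reach await()
                }
                blocked++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            input.countDown(); // "input arrives": release every waiting thread
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println(countBlocked(500) + " threads were blocked waiting for input");
    }
}
```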

@camel-man, the reason you cannot create a million threads is stack size. Each thread has its own execution stack. The default with several Java Virtual Machines is 1024k per thread on a 64-bit processor. 1 million times 1024k is roughly a terabyte. You probably don't have that much disk space, period, let alone that much disk space set up as virtual memory.
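D H's arithmetic can be checked in a few lines of Java: 1,000,000 × 1024 KiB is 1,048,576,000,000 bytes, about 1.05 TB (0.95 TiB). The sketch also shows one way to request a smaller stack per thread: the four-argument `Thread` constructor accepts a stack size, but its Javadoc describes that value as a hint that some platforms ignore. The JVM-wide default can also be changed with the `-Xss` launcher option (for example `-Xss256k`). The 256 KiB figure and the class name `StackBudget` are arbitrary examples.

```java
public class StackBudget {
    // Total stack reservation in bytes for `threads` threads with `stackKiB` KiB of stack each.
    public static long totalStackBytes(long threads, long stackKiB) {
        return threads * stackKiB * 1024L;
    }

    // Runs a trivial task on a thread that requests `stackBytes` of stack (the JVM may ignore it).
    public static boolean runWithStack(long stackBytes) {
        boolean[] ran = {false};
        Thread t = new Thread(null, () -> ran[0] = true, "small-stack", stackBytes);
        t.start();
        try {
            t.join(); // join() also makes the write to ran[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ran[0];
    }

    public static void main(String[] args) {
        long bytes = totalStackBytes(1_000_000, 1024);
        System.out.printf("%,d bytes (about %.2f TB, %.2f TiB)%n",
                bytes, bytes / 1e12, bytes / (double) (1L << 40));
        System.out.println("small-stack thread ran: " + runWithStack(256 * 1024));
    }
}
```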
 
  • #10
D H said:
Not necessarily. For example, suppose each thread represents a remote connection of a person typing queries into some database. Almost all of the threads will be I/O suspended, waiting for slow human input. Only a small handful will be active, and most of those will be suspended as well, either waiting for the database or waiting for permission to access the database.

Good point.
 
  • #11
D H said:
Not necessarily. For example, suppose each thread represents a remote connection of a person typing queries into some database. Almost all of the threads will be I/O suspended, waiting for slow human input. Only a small handful will be active, and most of those will be suspended as well, either waiting for the database or waiting for permission to access the database.

@camel-man, the reason you cannot create a million threads is stack size. Each thread has its own execution stack. The default with several Java Virtual Machines is 1024k per thread on a 64-bit processor. 1 million times 1024k is roughly a terabyte. You probably don't have that much disk space, period, let alone that much disk space set up as virtual memory.

Good point, only the heap is shared in a multithreading environment.
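A small Java sketch of the stack/heap distinction (names are illustrative, not from the thread): each thread's loop counter and local total are private to that thread, conceptually living in its own stack frame, while the single `AtomicLong` object lives on the heap and is shared by all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class StackVsHeap {
    // Each of `threads` threads counts to `perThread` in a local variable, then adds its total
    // to one shared heap object; returns the shared total.
    public static long sharedTotal(int threads, int perThread) {
        AtomicLong shared = new AtomicLong(); // one heap object, reachable from every thread
        List<Thread> started = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                long local = 0; // private to this thread; no other thread can see it
                for (int i = 0; i < perThread; i++) {
                    local++;
                }
                shared.addAndGet(local); // the only point where threads touch shared state
            });
            worker.start();
            started.add(worker);
        }
        for (Thread worker : started) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(sharedTotal(4, 1_000_000)); // prints 4000000
    }
}
```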

There can also be a price to pay in CPU cache performance if the code is large enough, since those threads will trigger lots of cache misses.
 
  • #12
The OS may well have a long queue of a million inactive threads waiting their turn. The reported figure of 20,000 may be the number of threads loaded during the last sample period. With a 1 msec time slice per thread, that period would be 20 seconds. If the threads were all idle, it might take only 50 usec to switch to each one and check it, which gives a sample reporting period closer to 1 second.
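The sampling-period arithmetic above, written out under the post's assumption that each of the $N = 20\,000$ reported threads is visited once per sample:

```latex
T_{\text{sample}} = N \, t_{\text{slice}} = 20\,000 \times 1\ \text{ms} = 20\ \text{s},
\qquad
T_{\text{sample}} = 20\,000 \times 50\ \mu\text{s} = 1\,000\,000\ \mu\text{s} = 1\ \text{s}.
```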
 
  • #13
The pid of a process is just an incrementing integer. I get pids of 20000 or so fairly often, but most of the processes that had lower pids are long gone. I guess the OS just keeps incrementing until it's safe to start again at some lower number.
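The recycling behaviour described here can be sketched with a hypothetical, greatly simplified allocator (not the algorithm of any particular OS): it hands out increasing numbers, wraps back to a minimum after reaching a maximum, and skips numbers still held by live processes. The bounds 1 and 5 are tiny on purpose so the wrap is visible. The only real API used is `ProcessHandle.current().pid()` (Java 9 and later), which reports the JVM's own pid.

```java
import java.util.HashSet;
import java.util.Set;

public class PidWrapDemo {
    // Hypothetical, simplified pid allocator; real kernels differ in many details.
    public static final class PidAllocator {
        private final int minPid;
        private final int maxPid;
        private final Set<Integer> inUse = new HashSet<>();
        private int last;

        public PidAllocator(int minPid, int maxPid) {
            this.minPid = minPid;
            this.maxPid = maxPid;
            this.last = maxPid; // so the very first allocation wraps to minPid
        }

        // Returns the next free number after the last one handed out, wrapping at maxPid.
        public int allocate() {
            for (int tries = 0; tries <= maxPid - minPid; tries++) {
                last = (last >= maxPid) ? minPid : last + 1;
                if (inUse.add(last)) {
                    return last;
                }
            }
            throw new IllegalStateException("every pid is in use");
        }

        // Called when a process exits; its number can be handed out again after a wrap.
        public void release(int pid) {
            inUse.remove(pid);
        }
    }

    public static void main(String[] args) {
        System.out.println("this JVM's pid: " + ProcessHandle.current().pid());

        PidAllocator pids = new PidAllocator(1, 5);
        int daemon = pids.allocate();      // 1: a long-lived process
        for (int i = 0; i < 4; i++) {
            pids.release(pids.allocate()); // 2..5: short-lived processes that exit at once
        }
        int next = pids.allocate();        // wraps; 1 is still taken, so 2 is reused
        System.out.println("daemon=" + daemon + ", next after wrap=" + next); // daemon=1, next after wrap=2
    }
}
```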
 
