| Thread Closed |
Parallel Programming Languages |
Share Thread |
| Jan11-05, 07:18 PM | #1 |
|
|
Parallel Programming Languages
I was recently thinking about how GPUs can do enough calculations to make CPUs cry even though they run at lower clock speeds. My understanding is that this is because they can do so many operations in parallel because the operations are repetitions of many the same types of operations that are independent of each other.
I was recently thinking that it would be very useful if regular, old CPUs could be designed similarly...of course, they would need programming languages designed for their functionality to take full advantage of it. Suppose you could have a CPU with many physical processing paths for both integer and floating point operations. You could blaze through many different instructions in a single clock cycle. The key thing to remembers is that the operations being performed in parallel must be independent of each other: there can be no overlap between the inputs and outpus of the instructions. In other words, the instructions can't operate on the same data, and the output from one operation can't be the input of another operation. Actually, langauges like Java (without pointers) could probably be adapted very easily since one could write a program that builds dependency trees for the statements in the source programs. However, in program languages that use pointers, this would be impossible to get 100% correct, as far as I can tell, because you could not tell if two instructions are reading/writing the same memory areas at compile time. Of course, pointers are very useful. So, I started thinking about the possibility of the programmer specifying which computations are independent of each other (or conversely, specifying which are dependent, which would probably be easier both for programmer and compiler). Another possibility is assigning statements to separate "parallel groups" sort of similar to threads, but different from threads in that statements in one block of code could be assigned to different parallel groups, whereas threads are used at a much higher level. Of course, this could work for multi-processor systems as well as parallel processing within single CPU cores. (The programmer would probably take a different approach in each situation, though, given bus speed limitations inherent in SMP sytems.) Are there any languages out there that allow the programmer to specify characteristics like this in existence? If not, what do you think of my ideas for parallel processing? |
| Jan11-05, 07:52 PM | #2 |
|
|
Your ideas have been around for at least four decades.
The Cray machines, Connection Machines, and other supercomputers extensively used vector processing (now called SIMD, or single-instruction, multiple-data computing). Even your lowly Pentium has a number of indepedent functional blocks for things like adding and multiplying. It is the compiler's job to order the instructions so to make the best use of those functional units, and the registers and cache on the chip. The processor's control path(s) then handle(s) assigning operands to each functional unit when your program is run. The language itself is frankly irrelevant. Modern compilers (particularly those designed for large supercomputing machines) will automatically identify parallelizable blocks and run them independently. - Warren |
| Jan11-05, 07:57 PM | #3 |
|
|
SSE, SSE2, and MMX are all SIMD CPU extentions for x86 processors. Altivec is one of the better features of recent G4 processors, and in my opinion, one of the better (if not the best) SIMD implementations. UltraSparcs have a more MMX-like implemention, called VIS.
|
| Jan11-05, 10:30 PM | #4 |
|
|
Parallel Programming LanguagesHowever, once that level of optimization becomes necessary it's generally just easiest to hand-code the instructions, rather than jumping through a bunch of hoops to try to trick the compiler into doing what you want. Introducing special optimization mechanisms into a language generally isn't helpful when you really, really need the extra speed; you have to step outside the language anyway. And when you don't need that extra speed they just make it easier to prematurely optimize a program. |
| Jan11-05, 11:19 PM | #5 |
|
|
------------------------------------ Apparently, the P4 has 2 DDR integer addition units and an integer multiplication unit and only 1 FP arithmetic unit. This allows some possibility for parallelization, but I'm thinking more on the order of dozens of similar units for different types of operations. The supercomputers that you mentioned seem to be very much what I am thinking of. I'll have to look into SIMD (includine SSE, etc. that graphic7 mentioned). It would be nice to have that kind of funcationality in typical home CPUs, as we do with typical home GPUs. |
| Jan12-05, 07:27 AM | #6 |
|
|
There are a number of extensions for all the more common programming languages that let you do parallel and distributed processing. there's even a whole newsgroup devoted to parallel computing discussions.
Some of the parallel computing links I keep in my bookmarks: PVM - Parallel Virtual Machine, been around for at least 10 years. MPI - Message Passing Interface, also been around at least 10 years ORCA Supercomputing and Parallel Computing at Yahoo! Lately most of the focus seems to be on distributed computing applications (spreading the computing workload over separate networked CPUs) rather than conventional parallel computing (spreading the computing workload over multple CPUs in the same computer) |
| Jan12-05, 08:48 AM | #7 |
|
|
|
| Jan12-05, 07:54 PM | #8 |
|
|
I'm not saying that I would write a program like that, it was just for demonstration. A compiler needs to be robust...errors are not acceptable. It needs to be able to deal with any input that fits the specification of the language. If the compiler tries to read the code and incorrectly applies an optimization technique, it breaks the program.
As I said, that was just a simple example. There could be much more complex programs that introduce aliasing without you even realizing (and sometimes you are forced into it), especially when you are talking about interacting with different software systems, such as the OS or different commercial packages included in programs like physics engines, sound system software, etc. |
| Jan12-05, 08:12 PM | #9 |
|
|
The most basic kind of compiler parses its input into a tree, then repeatedly applies a set of transformation rules that manipulate the tree, gradually moving it closer to the structure of the target language. Finally, the compiler's back-end walks the intermediate tree and emits the target language.
Since each of the manipulation steps is simple and well-defined (i.e. their effects and side-effects can be mathematically proven to be closed), the entire compilation is mathematically guaranteed. Now, truth be told, real compilers will do more than simple tree-manipulation to achieve high-quality output, but nearly all compilers begin with this kind of processing. - Warren |
| Jan12-05, 08:14 PM | #10 |
|
|
And generally, the issues you're talking about are not relevant to compilation. How programs or processes share information is not particularly relevant to compiler-level optimization -- after all, one program is certainly not going to pass another a pointer.
- Warren |
| Jan12-05, 08:57 PM | #11 |
|
|
Suppose you passed a pointer a function, and the function then used that value plus some offset to access a memory area. Suppose also that there is a loop (either inside or outside the function) that starts with that pointer and goes through some number of memory cells that it wants to try to parallelize with the first operation. To be safe, it would have to just not parallelize it. Otherwise, it opens itself up to errors. |
| Jan12-05, 09:09 PM | #12 |
|
|
Dissident Dan,
Right... the compiler would not be able to parallelize it. Which, as master_coda said, means that the programmer must bear the burden of writing code that can be optimized. Some code can't be parallelized, even though it can be shown mathematically to be identical to another representation which can be parallelized. - Warren |
| Jan12-05, 09:18 PM | #13 |
|
|
Thanks, everyone, for all your responses.
I have some questions about compiler optimizations and SIMDs. Do the /G3,/G4, etc. MSVC++ compiler options (or similar optimizations for other compilers), (which tell it to optimize for P-Pro - PIII and P4 architectures, respectively) automatically cause the compiler to try to apply MMX, SSE, SSE2 optimizations? I know that you can also do something like "arch:SSE" in MSVC++. If my first question is true, is either option more advantageous than the other in any regards? Also, I read a little about explicitly programming SSE instructions. Do programmers generally or ever achieve speed advantages by doing this by hand over the compiler optimizations? |
| Jan12-05, 09:30 PM | #14 |
|
|
Most of the time if your code can't be easily rewritten into a form the compiler can optimize then there's a good reason; optimizing it will probably break it. |
| Jan12-05, 09:43 PM | #15 |
|
|
Dissident Dan,
In my experience the MS compiler is not very good at identifying opportunities to use SSE. I have done quite a bit of hand-coding to write matrix arithmetic and digital filters with SSE, and was never able to get the compiler to generate code that came anywhere close to being as efficient. And yes, a hand-coded asm program that uses SSE for a FIR digital filter, for example, will run about 6x as fast as a hand-coded asm program that uses the normal integer unit. The SSE3 instructions can do as many as eight integer multiplications in one clock cycle, IIRC. - Warren |
| Jan12-05, 09:53 PM | #16 |
|
|
Setting "arch:SSE" or "arch:SSE2" will override any /Gx flags you specify. The compiler will generate SSE code if you specify them, although it probably won't do it as often as it could. In fact, MSVC++ probably won't use them very often at all; I've heard that it's SIMD optimizations are very poor. Intel's compiler is supposed to be much better, but I don't know anyone personally who has any experience optimizing with it. I have seen people get better performance hand-optimizing SIMD code (the weak MSVC++ optimizations probably have something to do with that). It's probably not useful as often with a better compiler, but there's almost always a extra speed that can be cranked out by hand it you need to. |
| Jan12-05, 09:53 PM | #17 |
|
|
As vectors are often treated like pointers, this could be limiting to optimization. I suppose that features such as SSE offer the best opportunity for relatively easy, comprehensible code. |
| Thread Closed |
Similar discussions for: Parallel Programming Languages
|
||||
| Thread | Forum | Replies | ||
| qualities of an international language? | Biology, Chemistry & Other Homework | 8 | ||
| Programming Languages | Programming & Comp Sci | 19 | ||
| New ideas for programming languages? | Programming & Comp Sci | 2 | ||
| Programming languages for numerical work | Academic Guidance | 15 | ||
| Languages | Social Sciences | 7 | ||