I was recently thinking about how GPUs can do enough calculations to make CPUs cry even though they run at lower clock speeds. My understanding is that this is because they can do so many operations in parallel because the operations are repetitions of many the same types of operations that are independent of each other. I was recently thinking that it would be very useful if regular, old CPUs could be designed similarly...of course, they would need programming languages designed for their functionality to take full advantage of it. Suppose you could have a CPU with many physical processing paths for both integer and floating point operations. You could blaze through many different instructions in a single clock cycle. The key thing to remembers is that the operations being performed in parallel must be independent of each other: there can be no overlap between the inputs and outpus of the instructions. In other words, the instructions can't operate on the same data, and the output from one operation can't be the input of another operation. Actually, langauges like Java (without pointers) could probably be adapted very easily since one could write a program that builds dependency trees for the statements in the source programs. However, in program languages that use pointers, this would be impossible to get 100% correct, as far as I can tell, because you could not tell if two instructions are reading/writing the same memory areas at compile time. Of course, pointers are very useful. So, I started thinking about the possibility of the programmer specifying which computations are independent of each other (or conversely, specifying which are dependent, which would probably be easier both for programmer and compiler). Another possibility is assigning statements to separate "parallel groups" sort of similar to threads, but different from threads in that statements in one block of code could be assigned to different parallel groups, whereas threads are used at a much higher level. Of course, this could work for multi-processor systems as well as parallel processing within single CPU cores. (The programmer would probably take a different approach in each situation, though, given bus speed limitations inherent in SMP sytems.) Are there any languages out there that allow the programmer to specify characteristics like this in existence? If not, what do you think of my ideas for parallel processing?