RISC: The Power Behind Gaming & Embedded Markets

  • #1
FulhamFan3
RISC processors dominate the embedded market and gaming consoles. Even Microsoft ditched the x86 processor in favor of PowerPC for the Xbox 360. In high-performance computing, four of the top five fastest supercomputers are based on PowerPC.

So why has there been this aversion to RISC processors in the general market? Even Apple went to Intel. What makes RISC so attractive for gaming and yet not fit for our world of word processors and email?
 
  • #2
I wonder about this as well.

I don't see why a 128-bit PowerPC processor with 8 cores, all running at 3.2 GHz, wouldn't be better than the x86 processors we're using now. Is there something I'm missing?
 
  • #3
Both multi core and RISC require intelligent optimization in order to make effective use of the capability.

My opinion
The "advantage" of RISC is that all instructions are single cycle as opposed to complex multi-cycle instructions on a "standard" uP.
Not really an advantage at all as when that complex instruction now gets implemented in the compiler it burns more machine cycles than the hardware complex instruction.
With the advent of single cycle "complex" instructions implemented in the current generation of hardware, I would say the concept of RISC is well past its prime.
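
(As a rough sketch of the trade-off being described: the C below compiles anywhere, but the instruction sequences in the comments are hypothetical and depend entirely on the compiler, the optimization level, and the particular CISC or RISC instruction set.)

```c
/* Rough illustration only: the "CISC" and "RISC" sequences in the comments
 * are hypothetical; real output depends on the compiler and the ISA. */
void add_to_each(long *a, long n, long x)
{
    for (long i = 0; i < n; i++) {
        /* A CISC-style ISA with memory operands can encode this statement
         * as one multi-cycle instruction, roughly:
         *     add [a + i*8], x
         * A load/store RISC ISA needs several single-cycle instructions:
         *     load  r1, [a + i*8]
         *     add   r1, r1, x
         *     store r1, [a + i*8]
         * Fewer instructions versus fewer cycles per instruction is exactly
         * the trade-off being argued about here. */
        a[i] += x;
    }
}
```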

Multi-core is still a meaningful concept, but well beyond the capabilities of the current generation of compilers to make any meaningful use of it.
OSs can make some use of this feature.
The major difference here is true parallel execution of a linear job stream, as opposed to time slicing.
 
  • #4
NoTime said:
Both multi core and RISC require intelligent optimization in order to make effective use of the capability.

My opinion
The "advantage" of RISC is that all instructions are single cycle as opposed to complex multi-cycle instructions on a "standard" uP.
Not really an advantage at all as when that complex instruction now gets implemented in the compiler it burns more machine cycles than the hardware complex instruction.
With the advent of single cycle "complex" instructions implemented in the current generation of hardware, I would say the concept of RISC is well past its prime.

Multi-core is still a meaningful concept, but well beyond the capabilities of the current generation of compilers to make any meaningful use of it.
OSs can make some use of this feature.
The major difference here is true parallel execution of a linear job stream, as opposed to time slicing.

That's interesting, because I've consistently seen RISC processors with lower clock speeds than their CISC counterparts, and yet you say RISC burns more clock cycles to do the same work. Have I been duped this whole time?
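
(One way to reconcile the two observations is the usual execution-time identity: time = instruction count × cycles per instruction ÷ clock rate. A minimal sketch with invented numbers, just to show how a lower-clocked chip can keep up.)

```c
#include <stdio.h>

/* Execution time = instruction count * cycles per instruction / clock rate.
 * All figures below are made up purely to show how a lower-clocked RISC
 * part can finish a job in the same time as a higher-clocked CISC part. */
int main(void)
{
    double risc_insns = 1.3e9, risc_cpi = 1.0, risc_hz = 2.0e9;  /* hypothetical */
    double cisc_insns = 1.0e9, cisc_cpi = 2.6, cisc_hz = 4.0e9;  /* hypothetical */

    printf("RISC time: %.3f s\n", risc_insns * risc_cpi / risc_hz);  /* 0.650 s */
    printf("CISC time: %.3f s\n", cisc_insns * cisc_cpi / cisc_hz);  /* 0.650 s */
    return 0;
}
```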
 
  • #5
NoTime said:
Multi-core is still a meaningful concept, but well beyond the capabilities of the current generation of compilers to make any meaningful use of it.

I would dispute that. The makers of parallel machines with large numbers of processors have had compilers that can automatically optimize code over multiple CPUs for 20 years now (Sun + Cray Research + Silicon Graphics have developed a pretty good compiler system - though some of their competitors managed to FUBAR their own attempts at writing one).

IMO the big problem with the "multi core" concept is that the memory bandwidth to the chip is unlikely to keep pace with sticking more cores in the chip. There are also considerations like "cache thrashing", where multiple copies of memory contents are cached for different processors and then one processor updates the value. It's (fairly) easy to design something that handles that and works, but not so easy to design something that still works fast. That's part of the reason why a 32-processor SGI box costs rather a lot of money, but it really can run a single "real-world" application (not an artificial benchmark) on 32 processors, 31.9 times as fast as on a single processor.
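
(The "multiple cached copies, one writer" effect is easy to reproduce on any multi-core box. A minimal POSIX-threads sketch, assuming a 64-byte cache line; this illustrates the general problem and one common fix, not SGI's design.)

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    10000000L

/* Each thread increments its own counter.  With the pad removed, all four
 * counters land in one cache line, so every write invalidates the other
 * cores' cached copies and they spend their time ping-ponging the line.
 * Padding each counter out to its own (assumed 64-byte) cache line makes
 * the same code run far faster. */
struct padded { long value; char pad[64 - sizeof(long)]; };
static struct padded counters[NTHREADS];

static void *worker(void *arg)
{
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++)
        counters[id].value++;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter[0] = %ld\n", counters[0].value);
    return 0;
}
```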

The other problems are that not all applications are suitable for multi-processor optimization, and (the biggest problem) the proportion of working programmers and system designers who have real practical experience of this is very small - and from my own experience, you can't become an expert by going on a 2-day appreciation course, it takes more like 2 years to really get your head around it.

As you say, the easy way to use multiple processors is to run multiple independent tasks - but for most "personal" computing, that's not a useful option. Re-inventing the mainframe doesn't seem a very sensible way to make progress.
 
  • #6
AlephZero is correct. SGI, for example, has compilers that will automatically parallelize anything you throw at them. You don't need to learn how the architecture works -- just write normal, general-purpose C code, and the SGI compilers will parallelize it for you. I'm not qualified to say precisely how well this parallelization is performed, but it's hardly a new concept. Making good use of multiple processors is absolutely not "well beyond the capabilities" of current compilers... unless maybe you've only had experience with Visual Studio or something.
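
(For what it's worth, the "normal C" that auto-parallelizing compilers handle best is a loop whose iterations are provably independent. A sketch; the switches mentioned in the comment, such as MIPSpro's -apo and Sun Studio's -xautopar, are the usual way to enable the analysis, but check the compiler documentation for your version.)

```c
/* A loop with no cross-iteration dependences is the easy case for an
 * auto-parallelizing compiler: it can split the iteration space across
 * processors with no source changes (e.g. MIPSpro -apo, Sun Studio
 * -xautopar; exact flags vary by compiler and release). */
void scale_and_add(double *y, const double *x, double a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* By contrast, a loop-carried dependence like this one defeats simple
 * auto-parallelization, because iteration i needs the result of i - 1. */
void prefix_sum(double *y, int n)
{
    for (int i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```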

Personally, I'd be thrilled to see technologies like SGI's cache-coherent non-uniform memory access architecture (ccNUMA) make it onto desktop personal computers.

- Warren
 
  • #7
chroot said:
AlephZero is correct. SGI, for example, has compilers that will automatically parallelize anything you throw at them. You don't need to learn how the architecture works -- just write normal, general-purpose C code, and the SGI compilers will parallelize it for you.

Been there, done that... don't believe everything you read in the marketing brochures though. There is a good Freudian reason why "parallelize" is often pronounced "paralyze" :rolleyes:

As for how simple (or not) this technology really is, I've seen the source code of an SGI library routine to multiply two matrices stored in rectangular arrays. With all the options for different ways of optimizing the code depending on the size and shape of the matrices and the number of parallel threads, it was about 700 lines of source code in total - and I mean 700 lines of code, not 10 lines of code and 690 lines of comments. There's no way your average Joe Programmer is ever going to use this sort of hardware efficiently without that level of assistance.
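
(For context, the "textbook" core of such a routine really is only a few lines. The naive sketch below is mine, not the SGI library code; everything a production version adds, such as cache blocking, handling awkward matrix shapes, and splitting the work across threads, is where the other several hundred lines go.)

```c
/* Naive square matrix multiply, C = A * B, with row-major n x n arrays.
 * Correct but slow: no cache blocking, no threading, no shape handling. */
void matmul_naive(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```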
 
  • #8
AlephZero said:
I would dispute that. The makers of parallel machines with large numbers of processors have had compilers that can automatically optimize code over multiple CPUs for 20 years now (Sun + Cray Research + Silicon Graphics have developed a pretty good compiler system - though some of their competitors managed to FUBAR their own attempts at writing one).
While I haven't worked with something like SGI, I have worked on both multitasking OS code and parallel hardware processing.
My understanding of these compilers is that they are limited primarily to array processing without programmer assistance.
Are you saying they are capable of more than that?
If so, I'm suitably impressed.
AFAIK it just won't work with the mix of instructions found in the typical business/personal application program.
That may be a moot point anyway since the typical application is normally I/O bound and the only thing that will speed it up is an increase in I/O channel speed.

AlephZero said:
The other problems are that not all applications are suitable for multi-processor optimization, and (the biggest problem) the proportion of working programmers and system designers who have real practical experience of this is very small - and from my own experience, you can't become an expert by going on a 2-day appreciation course, it takes more like 2 years to really get your head around it.
Yep! There is a tremendous amount of headwork involved in this. You can't simply throw code at the problem and hope some of it sticks. It takes a thorough understanding of your intended results and how to segment the problem into pieces that can be completed independently.
The people that can do this are few and far between.

AlephZero said:
As you say, the easy way to use multiple processors is to run multiple independent tasks - but for most "personal" computing, that's not a useful option. Re-inventing the mainframe doesn't seem a very sensible way to make progress.
The only two "personal" computing concepts I can think of that can make real use of multi-core are relational databases and video rendering. Four if you count sorting and plain table lookup.
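
(Sorting is a good example of why those workloads split up naturally. A rough pthreads sketch, purely illustrative: sort the two halves on separate threads, then merge. Real parallel sorts use more threads and smarter partitioning.)

```c
#include <pthread.h>
#include <stdlib.h>

/* Sort two halves of the input on separate threads, then merge into out[].
 * Illustrative only; names and structure are this sketch's, not a library's. */
struct chunk { int *data; size_t n; };

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void *sort_chunk(void *arg)
{
    struct chunk *c = arg;
    qsort(c->data, c->n, sizeof(int), cmp_int);
    return NULL;
}

void parallel_sort(int *data, size_t n, int *out)
{
    struct chunk lo = { data, n / 2 };
    struct chunk hi = { data + n / 2, n - n / 2 };
    pthread_t t;

    pthread_create(&t, NULL, sort_chunk, &lo);   /* first half on a new thread */
    sort_chunk(&hi);                             /* second half on this thread */
    pthread_join(t, NULL);

    /* Merge the two sorted halves. */
    size_t i = 0, j = 0, k = 0;
    while (i < lo.n && j < hi.n)
        out[k++] = (lo.data[i] <= hi.data[j]) ? lo.data[i++] : hi.data[j++];
    while (i < lo.n) out[k++] = lo.data[i++];
    while (j < hi.n) out[k++] = hi.data[j++];
}
```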
 
  • #9
chroot said:
AlephZero is correct. SGI, for example, has compilers that will automatically parallelize anything you throw at it. You don't need to learn how the architecture works -- just write normal, general-purpose C code, and the SGI compilers will parellelize it for you.
Frankly, I'll see that when I believe it.

chroot said:
Personally, I'd be thrilled to see technologies like SGI's cache-coherent non-uniform memory access architecture (ccNUMA) make it onto desktop personal computers.
Maybe not, when cracking RSA keys becomes trivial :wink:
 
  • #10
The MIPSpro compilers and whatever they use for their Altix systems are by no means the only compilers today that do auto-parallelization of code. The Intel and Sun compilers both support OpenMP, and the Sun compilers (Sun Studio) support auto-parallelization. Sun Studio is available for free for both Solaris and Linux.
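
(For reference, OpenMP is the explicit route: the programmer marks the parallel loop rather than relying on the compiler's own analysis. A minimal sketch; it builds with any OpenMP-capable C compiler, e.g. -fopenmp on GCC or -xopenmp on Sun Studio.)

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

/* Explicit OpenMP parallelization of a reduction: each thread sums a chunk
 * of the array and the partial sums are combined at the end of the loop. */
int main(void)
{
    static double x[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        x[i] = 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```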
 
  • #11
NoTime said:
Frankly, I'll see that when I believe it.
Maybe not, when cracking RSA keys becomes trivial :wink:

Small ccNUMA systems have been around for quite some time. Any system Sun has sold from 2000 on that has a Fireplane interconnect is considered a ccNUMA system. My Sun Blade 1000 has a Fireplane interconnect and is thus a ccNUMA machine, yet it has no chance of ever cracking RSA keys in a trivial manner.
 
  • #12
graphic7 said:
...the Sun compilers (Sun Studio) support auto-parallelization. Sun Studio is available for free for both, Solaris and Linux.

Actually Sun wrote the parallelization code for the Cray Unix-based compilers, which was then ported to SGI when SGI bought Cray. The SGI and Solaris compilers are pretty much identical, which is good news for software developers, because fighting with different compilers when porting high performance code to different systems is a pain in the ****.
 
  • #13
NoTime said:
While I haven't worked with something like SGI, I have worked on both multitasking OS code and parallel hardware processing.
My understanding of these compilers is that they are limited primarily to array processing without programmer assistance.
Are you saying they are capable of more than that?
If so, I'm suitably impressed.
AFAIK it just won't work with the mix of instructions found in the typical business/personal application program.
That may be a moot point anyway since the typical application is normally I/O bound and the only thing that will speed it up is an increase in I/O channel speed.

I guess it depends on what you define as "array processing". They do quite a lot more than just "numerical operations on matrices". But you are right, the real key to getting good performance is designing the algorithm, not tweaking the code. That applies equally to parallel and non-parallel, though parallel makes a whole new set of options available.

Re business applications, database searching is definitely on the parallelizable applications list.

I agree it's fairly irrelevant for most personal applications (except games).

The SGI architecture is pretty good for I/O speed as well. The I/O bandwidth scales with the number of processors, and the OS doesn't force all I/O requests to squeeze through one serialized bottleneck. You don't get either of those by just putting multiple cores on one CPU chip.
 
  • #14
graphic7 said:
Small ccNUMA systems have been around for quite some time. Any system Sun has sold from 2000 on that has a Fireplane interconnect is considered a ccNUMA system. My Sun Blade 1000 has a Fireplane interconnect and is thus a ccNUMA machine, yet it has no chance of ever cracking RSA keys in a trivial manner.

Wait another 10 years :biggrin:
 

What is RISC and how does it relate to gaming and embedded markets?

RISC stands for Reduced Instruction Set Computer, which is a type of microprocessor architecture. It is characterized by its simpler and more streamlined instruction set, which allows for faster and more efficient processing. In gaming and embedded markets, RISC processors are often used due to their low power consumption, high performance, and small size.

What are the advantages of using RISC processors in gaming and embedded systems?

Some of the main advantages of RISC processors in gaming and embedded systems include lower power consumption, faster processing speeds, and smaller size. These features are especially beneficial in portable gaming devices and embedded systems where space and battery life are critical.

Are there any disadvantages to using RISC processors in gaming and embedded systems?

One potential disadvantage of RISC processors is their reduced instruction set, which means more complex operations must be expressed as longer sequences of simple instructions. In practice this is often offset by using multiple processors or cores in parallel, as is done in high-end gaming systems and advanced embedded devices.

How does RISC differ from CISC (Complex Instruction Set Computer) architecture?

The main difference between RISC and CISC architectures is their instruction sets. RISC processors have a smaller and simpler instruction set, while CISC processors have a larger and more complex one. This allows RISC processors to execute individual instructions more quickly, while CISC processors can accomplish more work in a single instruction, which may suit more complex tasks.

What is the future of RISC in gaming and embedded markets?

RISC processors are expected to continue to play a significant role in gaming and embedded markets. With the increasing demand for smaller and more efficient devices, RISC processors offer an ideal solution. Additionally, advancements in technology and design are constantly improving the capabilities of RISC processors, making them an attractive option for future gaming and embedded systems.
