
Teraflop GPUs, but no TFLOP CPUs?

  1. Dec 17, 2009 #1
    Today's high-end GPUs are capable of multiple teraflops of computing power. But even a Core i7 975 @ 4.0 GHz is only capable of about 60-70 gigaflops. Why don't CPUs have the raw computing power of GPUs? When will we see TFLOP processors in PCs? 2020? 2030?

    1 teraflop = 1,000,000,000,000 (one trillion) floating-point operations per second.
    1 gigaflop = 1,000,000,000 (one billion) floating-point operations per second.

    When will we see petaflop GPUs? What kind of estimated computing power will we have in our computers in the year 2050 or even 2100?
  3. Dec 18, 2009 #2


    "Why don't CPUs have the raw computing power of GPUs?"

    Would you rather use a general purpose processor for applications such as DSP or would you rather use a special purpose processor which has been optimized to perform a specific task? Would a general purpose processor core have the same "raw computing power" as a DSP core for DSP applications?
  4. Dec 18, 2009 #3
    High-end GPUs are capable of breaking into teraflops because they are massively parallel. It does not cost much to add more processing units to the chip (processing units are relatively compact), but their usefulness for most practical tasks goes down fast unless you also scale the cache, which you can't do without reducing transistor size or increasing power consumption.

    I recall that Intel was planning to launch a massively parallel chip called Larrabee. It was supposed to have up to 48 parallel cores, which would have let it get above 1 teraflop.
  5. Dec 18, 2009 #4
    Larrabee was also supposed to be x86-compatible, which would have vastly simplified programming for it.

    Intel recently canceled Larrabee, though. They were looking at a 2011-2012 release. With ATI's new 5xxx series cards, specifically the 5870, able to push 2.7 TFLOPS, and nVidia's offering, Fermi, looking to match or exceed those numbers, it would not be remotely economical to offer a 1 TFLOP product a couple of years down the road when parts exceeding its performance are out now.
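
    The figures quoted in this thread can be roughly reproduced from a simple peak-throughput formula: parallel units x clock x floating-point operations per unit per cycle. The sketch below is only a back-of-the-envelope estimate. The function name `peak_gflops` is made up for illustration, and the per-cycle figures are assumptions not stated in the thread: 4 double-precision FLOPs per core per cycle for the Core i7 (128-bit SSE add plus multiply), and 2 single-precision FLOPs per stream processor per cycle for the HD 5870 (one multiply-add), with 1600 stream processors at 850 MHz.

```python
def peak_gflops(units, clock_mhz, flops_per_cycle):
    """Theoretical peak in GFLOPS: units x clock (MHz) x FLOPs per unit per cycle."""
    return units * clock_mhz * flops_per_cycle / 1000

# Core i7 975 overclocked to 4.0 GHz: 4 cores.
# Assumption: 4 double-precision FLOPs per core per cycle (SSE add + multiply).
cpu = peak_gflops(units=4, clock_mhz=4000, flops_per_cycle=4)

# Radeon HD 5870: 1600 stream processors at 850 MHz.
# Assumption: 2 single-precision FLOPs per unit per cycle (one multiply-add).
gpu = peak_gflops(units=1600, clock_mhz=850, flops_per_cycle=2)

print(f"CPU peak: {cpu:.0f} GFLOPS")
print(f"GPU peak: {gpu:.0f} GFLOPS")
print(f"GPU/CPU ratio: {gpu / cpu:.1f}x")
```

    Under these assumptions the CPU lands inside the 60-70 gigaflop range from the first post and the GPU at about 2.7 TFLOPS. Note that this compares double precision on the CPU with single precision on the GPU, and that peak numbers say nothing about how much of that throughput a real program can sustain.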
  6. Dec 18, 2009 #5
    Teraflops are a measure of floating-point calculations, which, as a general rule, are not what CPUs are best at. In fact, back in the dark days, CPUs did not have specialized hardware for floating-point arithmetic at all; you had to buy an FPU to plug into your motherboard.

    Not surprisingly, the kind of massively parallel floating-point calculations needed for many computer graphics tasks are exactly what graphics chips are good at. They are a lot better at this than most CPUs, so a lot of scientific applications are being adapted to use GPU power, and in some cases a single computer with a few high-end graphics cards can perform calculations as fast as a supercomputing cluster.

    But the kind of computing your GPU is good at is probably not what you need to run your standard application quickly and smoothly.
  7. Dec 19, 2009 #6
    Just to add a simpler explanation: it's because CPUs have a much more complex instruction set than GPUs do. In other words, GPUs do only one thing, and they do that one thing very well. CPUs are designed to do all sorts of things, which means they have to take a hit in performance.
  8. Dec 23, 2009 #7
    CPUs are designed to do everything "kind of" well. GPUs are designed to do some select things really well. As it turns out, the tasks that require vast quantities of processing power happen to be the kind of tasks that suit a GPU, so it works out nicely. The supposed limitations of the GPU are largely inconsequential in practice. In fact, modern CPUs are becoming much more like GPUs, using non-uniform memory access and parallel construction. Simply put, CPUs are slow in parallel applications because they are largely today, and were entirely in the past, serial processors. The architecture has very few arithmetic elements and a lot of cache, with a lot of optimization elements for serial execution.
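
    The serial versus parallel distinction can be sketched in a few lines. This is only an illustration in plain Python, not how any real GPU or CPU is programmed, and the function names (`saxpy`, `saxpy_chunked`, `recurrence`) are chosen here for the example. In the first case every output element depends only on its own inputs, so the work can be cut into independent chunks and handed to as many arithmetic units as are available. In the second case each step needs the result of the previous one, so extra arithmetic units do not help without restructuring the algorithm.

```python
def saxpy(a, xs, ys):
    """Data-parallel: each output element depends only on its own inputs."""
    return [a * x + y for x, y in zip(xs, ys)]

def saxpy_chunked(a, xs, ys, workers):
    """Same work split into independent chunks, as parallel units would see it.

    The chunks are processed one after another here; the point is that they
    could run in any order, or all at once, and give the same answer.
    """
    size = max(1, (len(xs) + workers - 1) // workers)
    out = []
    for start in range(0, len(xs), size):
        out.extend(saxpy(a, xs[start:start + size], ys[start:start + size]))
    return out

def recurrence(a, xs, seed):
    """Serial: step i needs the result of step i - 1."""
    state = seed
    for x in xs:
        state = a * state + x
    return state

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 10.0, 10.0, 10.0, 10.0]

# Chunking does not change the result of the data-parallel computation.
print(saxpy(2.0, xs, ys) == saxpy_chunked(2.0, xs, ys, workers=2))
print(recurrence(2.0, [1.0, 2.0, 3.0], seed=0.0))
```

    Graphics and large matrix computations look like the first function, which is why adding arithmetic elements pays off on a GPU. Much ordinary application code looks more like the second.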

    As for when we will see TFLOP CPUs, I would wager a generation or two after Haswell, but I do think it's a largely irrelevant question. If you're looking for floating-point performance in CPUs, you're looking in the wrong place. If you want to reliably execute arbitrary threads, get a CPU. If you want to perform huge matrix computations, get a GPU. If a CPU is to become as fast as a GPU, it needs to become a GPU. This is why Intel's Larrabee met its inevitable demise. It was a bunch of CPUs pretending to be GPUs. Slap on an unscalable ring bus and you've got fail.

    If Intel wants to compete with vector processors, it needs to create a device which is a vector processor by its very nature. This requires non-uniformity and many more arithmetic elements at the expense of other architectural features. Intel will have to drop its fairy-tale programming model. A vector processor will never be programmed as if it were running a single-threaded application. I suppose the sooner developers come to grips with this, the sooner Intel will stop trying to push out impractical hardware. Fortunately, from what I've read, their next attempt will be somewhat closer to reality. Unfortunately, this means paying tribute to the devil and doing what Nvidia and ATI have been doing for years. For example, notice that the on-package GPU on their new processors is actually a GPU, not another CPU.


    As for computing power in 2050 or 2100: 50 AFLOPS (AwesomeFLOPS). Sorry, but that question has no serious answer :)

    On Larrabee's x86 compatibility: it's not x86 that makes programming easier. In fact, there's nothing easy about SSE optimization. Intel simply wanted to hide the parallel elements of the hardware from the programmer. They also apparently wanted to avoid making real floating-point vector hardware. Both of those goals led to inevitable failure.

    One real advantage Larrabee could have had, if Intel hadn't set unrealistic architectural goals, is unified memory addressing. This does not (and nothing ever will) remove the parallel element from developing parallel applications, but it would have made memory management much easier for the developer. That isn't generally desirable when you're developing performance-critical applications, but it would have been a convenience for certain problems, particularly when prototyping. Fermi also offers unified addressing, however, though only in CUDA, not for general programming.