Are GPUs Really Faster Than CPUs for Scientific Computing?

  • Thread starter: hotvette
  • Tags: GPU, Programming

Discussion Overview

The discussion centers on the performance comparison between GPUs and CPUs for scientific computing, exploring whether GPUs are indeed faster and what that implies for programming and application development. Topics include computational fluid dynamics (CFD), parallel processing, and the programming requirements for leveraging GPU capabilities.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Exploratory

Main Points Raised

  • Some participants assert that GPUs outperform CPUs in computation, citing benchmarks that show significant differences in performance metrics like gigaflops.
  • There is a question regarding the necessity of modifying existing code to utilize GPUs effectively, with some suggesting that specialized code is required unless the software is CUDA compatible.
  • Participants discuss the programming languages supported by CUDA, noting that it has evolved to include C++, Fortran, and unofficially other languages like Python.
  • Some argue that while GPUs excel in specific computational tasks, they may not be suitable for all types of algorithms, particularly those that do not align with their design.
  • Concerns are raised about the challenges of achieving theoretical peak performance on GPUs due to various limitations, including memory access patterns and data type optimizations.
  • There are mentions of alternative tools and libraries, such as MATLAB and Jacket, that can leverage GPU performance without requiring extensive programming in C or C++.
  • OpenCL is introduced as a simpler alternative to CUDA, with cross-platform capabilities that can run on both GPUs and multi-core CPUs.
  • A participant inquires about existing platforms for density functional theory calculations using OpenCL, indicating interest in practical applications of the discussed technologies.

Areas of Agreement / Disagreement

Participants express differing views on the ease of using GPUs for scientific computing, the necessity of code modification, and the types of applications that benefit from GPU acceleration. The discussion remains unresolved regarding the best practices for leveraging GPU capabilities and the extent of their advantages over CPUs.

Contextual Notes

Limitations include the dependency on specific programming languages and frameworks, the unresolved nature of performance bottlenecks, and the varying degrees of compatibility with existing software. The discussion reflects a range of experiences and opinions on the practicalities of GPU utilization in scientific computing.

Yes, GPUs outperform CPUs in computation.

I have an Nvidia GeForce 9800, which is one of the lowest-end graphics cards for gamers these days.

I ran benchmark software that comes with Nvidia's CUDA driver set. The benchmark pulled about 230 gigaflops (single precision) from the GPU, while the CPU pulls around 2 gigaflops or less.
 
Thanks. I also found the following link that explains a lot:

http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf

Here's what I don't understand. In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?

Does this mean I could buy a new Nvidia graphics card for my ~5-year-old desktop computer and turn it into a computing monster?
 
hotvette said:
Thanks. I also found the following link that explains a lot:

http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf

Here's what I don't understand. In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?

You have to modify the code, unless the software you are trying to run is CUDA-compatible.

The CUDA driver set gives you all the necessary tools and libraries to program in C right away. From what I understand, it's very easy to learn if you know C. In fact, the code that talks to the GPU follows the same syntax as C. I haven't actually learned to program the GPU yet, but it's on my to-do list.

Browse around the Nvidia site. They have lots of tutorials, video lectures, and material on parallel algorithms for matrix multiplication and n-body problems.
 
Ah, so limited to C. I guess you need a compiler that is specific to the GPU, thus the need for the SDK. I'm beginning to get the picture. Rats, I was hoping I could have some fun with the vast library of existing Fortran programs that are out there.

Anyway, I'll browse. Thanks.
 
It's not limited to C anymore. CUDA 3 officially supports C++ and Fortran. There are also unofficial bindings for C#, Python, and others, but those can get unpleasant. And yes, CPUs are pretty much done for, both in scientific and desktop computing.

GF100 processors have a programmable L2 cache, which makes them more suitable for native programming and code translation. Essentially, they work much more like a CPU from a programmer's perspective. They're also a boatload faster than the last generation, especially with 64-bit floats.
 
Parallel programming is hard even in the best of cases. GPUs are not the best of cases -- they put rather significant constraints on how you can store, access, and manipulate data. They are very good at doing things similar to their designed purpose, but to my knowledge it is difficult to use them efficiently for dissimilar tasks.
 
CUDA is easy to learn, but it's hard to achieve the "theoretical" peak performance. There are all sorts of quirks and bottlenecks that limit what you can do and how fast you can do it on a GPU. For example, there's no such thing as random memory access. Or rather, all access to global memory is serialized and carries significant penalties. Individual threads can only concurrently access what's called "shared memory", which is limited in pre-GF100 processors to 16 KB per multiprocessor (GeForce 9800 has 7 multiprocessors, GTX 285 has 15), or effectively less than 1 KB per thread. And even then there are limitations to what you can access without penalties.

On top of that, "theoretical" performance is always quoted in terms of 32-bit floats. Mainly because that's what GPUs are optimized for. If you want to work with 8-bit integers (maybe you're doing video processing?), modern CPUs will provide you with SIMD instructions that operate on 16 of those at once. On a GPU, you have to do everything byte-by-byte.

But there's hope. The GF100 family should be nicer than its predecessors for programming purposes (but, as everyone knows, Nvidia is having severe problems with yields, and no one knows if they will ship even 10,000 of those worldwide before the end of 2010). Eventually GPUs will gain SIMD instructions, adapt to all data types, and things will get better.

while the CPU pulls around 2 gigaflops or less.

Modern multi-core CPUs can pull on the order of 50 gigaflops.
 
  • #10
http://digitalblggr.blogspot.com/2010/04/cpu-gpu.html
 
  • #11
Leveraging GPUs does not always require C, C++, or Fortran; for some applications, MATLAB with Jacket from AccelerEyes can get a pretty good performance return: http://www.accelereyes.com
 
  • #12
There is also a math library for ATI GPUs:

http://developer.amd.com/gpu/acmlgpu/pages/default.aspx
 
  • #13
OpenCL is a little simpler than CUDA; it's more C-like, while a lot of CUDA feels closer to the OpenGL shader language and more assembler-ish.
OpenCL is also cross-platform: it will run on Nvidia and ATI cards, and will transparently fall back to multi-core CPUs if no GPU is available.

On Nvidia hardware it's basically translated into the same CUDA instructions by the CL compiler at run time.
 
  • #14
Does anyone know if there are any code/software/hardware platforms out there that will do DFT (density functional theory) calculations using OpenCL?
 
