Are GPUs Really Faster Than CPUs for Scientific Computing?

  • Thread starter: hotvette
  • Tags: GPU, Programming

Discussion Overview

The discussion centers on the performance comparison between GPUs and CPUs for scientific computing, exploring whether GPUs are indeed faster and what that implies for programming and application development. Topics include computational fluid dynamics (CFD), parallel processing, and the programming requirements for leveraging GPU capabilities.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Exploratory

Main Points Raised

  • Some participants assert that GPUs outperform CPUs in computation, citing benchmarks that show significant differences in performance metrics like gigaflops.
  • There is a question regarding the necessity of modifying existing code to utilize GPUs effectively, with some suggesting that specialized code is required unless the software is CUDA compatible.
  • Participants discuss the programming languages supported by CUDA, noting that it has evolved to include C++, Fortran, and unofficially other languages like Python.
  • Some argue that while GPUs excel in specific computational tasks, they may not be suitable for all types of algorithms, particularly those that do not align with their design.
  • Concerns are raised about the challenges of achieving theoretical peak performance on GPUs due to various limitations, including memory access patterns and data type optimizations.
  • There are mentions of alternative tools and libraries, such as MATLAB and Jacket, that can leverage GPU performance without requiring extensive programming in C or C++.
  • OpenCL is introduced as a simpler alternative to CUDA, with cross-platform capabilities that can run on both GPUs and multi-core CPUs.
  • A participant inquires about existing platforms for density functional theory calculations using OpenCL, indicating interest in practical applications of the discussed technologies.

Areas of Agreement / Disagreement

Participants express differing views on the ease of using GPUs for scientific computing, the necessity of code modification, and the types of applications that benefit from GPU acceleration. The discussion remains unresolved regarding the best practices for leveraging GPU capabilities and the extent of their advantages over CPUs.

Contextual Notes

Limitations include the dependency on specific programming languages and frameworks, the unresolved nature of performance bottlenecks, and the varying degrees of compatibility with existing software. The discussion reflects a range of experiences and opinions on the practicalities of GPU utilization in scientific computing.

Yes, GPUs outperform CPUs in computation.

I have an Nvidia GeForce 9800, which is one of the lowest-end graphics cards for gamers these days.

I ran benchmark software that comes with Nvidia's CUDA driver set. The benchmark pulled about 230 gigaflops (single precision) from the GPU, while the CPU pulls around 2 gigaflops or less.
 
Thanks. I also found the following link that explains a lot:

http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf

Here's what I don't understand. In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?

Does this mean I could buy a new Nvidia graphics card for my ~5-year-old desktop computer and turn it into a computing monster?
 
hotvette said:
Thanks. I also found the following link that explains a lot:

http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf

Here's what I don't understand. In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?

You have to modify the code, unless the software you are trying to run is CUDA-compatible.

The CUDA driver set gives you all the necessary tools and libraries to program in C right away. From what I understand, it's very easy to learn if you know C. In fact, the code that talks to the GPU follows the same syntax as C. I haven't actually learned to program the GPU yet, but it's on my to-do list.

Browse around the Nvidia site. They have lots of tutorials, video lectures, and material on parallel algorithms for matrix multiplication and n-body problems.
 
Ah, so limited to C. I guess you need a compiler that is specific to the GPU, thus the need for the SDK. I'm beginning to get the picture. Rats, I was hoping I could have some fun with the vast library of existing Fortran programs that are out there.

Anyway, I'll browse. Thanks.
 
It's not limited to C anymore. CUDA 3 officially supports C++ and Fortran. There are also unofficial bindings for C#, Python, and others, but those can get unpleasant. And yes, CPUs are pretty much done for, both in scientific and desktop computing.

GF100 processors have a programmable L2 cache, which makes them more suitable for native programming and code translation. Essentially, they work much more like a CPU from a programmer's perspective. They're also a boatload faster than the last generation, especially with 64-bit floats.
 
Parallel programming is hard even in the best of cases. GPUs are not the best of cases -- they put rather significant constraints on how you can store, access, and manipulate data. They are very good at doing things similar to their designed purpose, but to my knowledge it is difficult to use them efficiently for dissimilar tasks.
 
CUDA is easy to learn, but it's hard to achieve the "theoretical" peak performance. There are all sorts of quirks and bottlenecks that limit what you can do and how fast you can do it on a GPU. For example, there's no such thing as random memory access. Or rather, all access to global memory is serialized and carries significant penalties. Individual threads can only concurrently access what's called "shared memory", which is limited in pre-GF100 processors to 16 KB per multiprocessor (GeForce 9800 has 7 multiprocessors, GTX 285 has 15), or effectively less than 1 KB per thread. And even then there are limitations to what you can access without penalties.

On top of that, "theoretical" performance is always quoted in terms of 32-bit floats. Mainly because that's what GPUs are optimized for. If you want to work with 8-bit integers (maybe you're doing video processing?), modern CPUs will provide you with SIMD instructions that operate on 16 of those at once. On a GPU, you have to do everything byte-by-byte.

But there's hope. The GF100 family should be nicer than its predecessors for programming purposes (but, as everyone knows, Nvidia is having severe problems with yields, and no one knows if they will ship even 10,000 of those worldwide before the end of 2010). Eventually GPUs will gain SIMD instructions, adapt to all data types, and things will get better.

while the CPU pulls around 2 gigaflops or less.

Modern multi-core CPUs can pull on the order of 50 gigaflops.
 
  • #10
http://digitalblggr.blogspot.com/2010/04/cpu-gpu.html
 
  • #11
Leveraging GPUs does not always require C, C++, or Fortran; for some applications, MATLAB with Jacket from AccelerEyes can get a pretty good performance return: http://www.accelereyes.com
 
  • #12
There is also a math library for ATI GPUs:

http://developer.amd.com/gpu/acmlgpu/pages/default.aspx
 
  • #13
OpenCL is a little simpler than CUDA; it's more C-like, while a lot of CUDA feels closer to the OpenGL shader language and more assembler-ish.
OpenCL is also cross-platform: it will run on Nvidia and ATI cards, and will transparently fall back to multi-core CPUs if no GPU is available.

On Nvidia hardware it's basically translated into the same CUDA instructions by the CL compiler at run time.
 
  • #14
Does anyone know if there are any code/software/hardware platforms out there that will do DFT (density functional theory) calculations using OpenCL?
 
