GPU Programming

  • #1
hotvette
Homework Helper

Answers and Replies

  • #2
Yes, GPUs outperform CPUs in raw computation.

I have an Nvidia GeForce 9800, which is one of the lowest-end graphics cards for gamers these days.

I ran benchmark software that comes with Nvidia's CUDA driver set. The benchmark pulled about 230 gigaflops (single precision) from the GPU, while the CPU pulls around 2 gigaflops or less.
 
  • #3
hotvette
Homework Helper
Thanks. I also found the following link that explains a lot:

http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf

Here's what I don't understand. In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?

Does this mean I could buy a new Nvidia graphics card for my ~5 year old desktop computer and turn it into a computing monster?
 
  • #4
hotvette said: "In order to use the GPU for an intensive activity like computational fluid dynamics (CFD), do I need to write or use specialized code for parallel processing, or will existing codes with no modification run ~100x faster? Is it that simple?"
You have to modify the code, unless the software you are trying to run is already CUDA-compatible.

The CUDA driver set gives you all the necessary tools and libraries to program in C right away. From what I understand it's very easy to learn if you know C; in fact, the code that talks to the GPU follows the same syntax as C (a minimal sketch follows below). I haven't actually learned to program the GPU yet, but it's on my to-do list.

Browse around the Nvidia site. They have lots of tutorials, video lectures, and material on parallel algorithms for matrix multiplication and n-body problems.
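
To give a flavor of how close CUDA code is to plain C, here is a minimal vector-addition sketch against the standard CUDA runtime API (illustrative only; the vecAdd name, array size, and block size are arbitrary choices, not anything from this thread):

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Kernel: each GPU thread adds one pair of elements. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;               /* one million elements */
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);   /* launch on the GPU */

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);                   /* expect 3.0 */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Everything except the __global__ qualifier, the <<<blocks, threads>>> launch syntax, and the cuda* memory calls is ordinary C; nvcc compiles it (e.g. nvcc vecadd.cu -o vecadd).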
 
  • #5
hotvette
Homework Helper
Ah, so limited to C. I guess you need a compiler that is specific to the GPU, thus the need for the SDK. I'm beginning to get the picture. Rats, I was hoping I could have some fun with the vast library of existing Fortran programs that are out there.

Anyway, I'll browse. Thanks.
 
  • #6
ranger
Gold Member
  • #7
It's not limited to C anymore. CUDA 3 officially supports C++ and Fortran. There are also unofficial bindings for C#, Python, and others, but those can get unpleasant. And yes, CPUs are pretty much done for, both in scientific and desktop computing.

GF100 processors have a programmable L2 cache, which makes them more suitable for native programming and code translation. Essentially, they work much more like a CPU from a programmer's perspective. They're also a boatload faster than the last generation, especially with 64-bit floats.
 
  • #8
Hurkyl
Staff Emeritus
Science Advisor
Gold Member
Parallel programming is hard even in the best of cases. GPUs are not the best of cases -- they put rather significant constraints on how you can store, access, and manipulate data. They are very good at doing things similar to their designed purpose, but to my knowledge it is difficult to use them efficiently for dissimilar tasks.
 
  • #9
CUDA is easy to learn, but it's hard to achieve the "theoretical" peak performance. There are all sorts of quirks and bottlenecks that limit what you can do and how fast you can do it on a GPU. For example, there's no such thing as random memory access. Or rather, all access to global memory is serialized and carries significant penalties. Threads can only access fast on-chip "shared memory" concurrently, and on pre-GF100 processors that is limited to 16 kilobytes per multiprocessor (a GeForce 9800 has 7 multiprocessors, a GTX 285 has 15), or effectively less than 1 KB per thread. And even then there are limitations on what you can access without penalties.
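
To make the shared-memory point concrete, here is a minimal sketch (illustrative only; the blockSum name and the fixed 256-thread block size are arbitrary) of the usual pattern: each block does one coalesced read from global memory into __shared__ storage, then works entirely in that fast on-chip memory:

```c
/* Launch with 256 threads per block; one partial sum is written per block. */
__global__ void blockSum(const float *in, float *blockResults, int n)
{
    __shared__ float tile[256];            /* well under the 16 KB/multiprocessor limit */

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;    /* one coalesced global read per thread */
    __syncthreads();

    /* Tree reduction done entirely in shared memory. */
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockResults[blockIdx.x] = tile[0]; /* one global write per block */
}
```

The point is that global memory is touched exactly twice per block, and everything else happens in the small shared-memory tile.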

On top of that, "theoretical" performance is always quoted in terms of 32-bit floats, mainly because that's what GPUs are optimized for. If you want to work with 8-bit integers (maybe you're doing video processing?), modern CPUs provide SIMD instructions that operate on 16 of them at once. On a GPU, you have to do everything byte by byte.
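
For comparison, this is roughly what "16 of those at once" looks like on the CPU side (a sketch using SSE2 intrinsics; the function name is made up for the example):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Add two byte arrays 16 elements at a time, with a scalar tail. */
void add_bytes_sse(const uint8_t *a, const uint8_t *b, uint8_t *out, int n)
{
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi8(va, vb));
    }
    for (; i < n; ++i)
        out[i] = (uint8_t)(a[i] + b[i]);
}
```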

But there's hope. The GF100 family should be nicer than its predecessors for programming purposes (though, as everyone knows, NVIDIA is having severe problems with yields, and no one knows whether they will ship even 10,000 of those worldwide before the end of 2010). Eventually GPUs will gain SIMD, adapt to all data types, and things will get better.

while the CPU pulls around 2 gigaflops or less.
Modern multi-core CPUs can pull on the order of 50 gigaflops.
 
  • #11
Leveraging GPUs does not always require C, C++, or Fortran; for some applications, MATLAB with Jacket from AccelerEyes can get a pretty good performance return - http://www.accelereyes.com
 
  • #12
rcgldr
Homework Helper
There is also a math library for ATI GPUs:

http://developer.amd.com/gpu/acmlgpu/pages/default.aspx [Broken]
 
  • #13
mgb_phys
Science Advisor
Homework Helper
OpenCL is a little simpler than CUDA; it's more C-like, while a lot of CUDA is closer to the OpenGL shader language and more assembler-ish.
OpenCL is cross-platform: it will run on NVidia and ATI cards, and it will also run transparently on multi-core CPUs if no GPU is available.

On NVidia hardware it's basically translated into the same CUDA instructions by the CL compiler at run time.
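
For comparison with the CUDA kernel sketched earlier in the thread, the same vector add written as an OpenCL kernel looks like this (kernel source only; the host side, not shown, would hand this string to clCreateProgramWithSource and launch it with clEnqueueNDRangeKernel):

```c
/* OpenCL kernel source -- a C dialect; note how close the body is to the CUDA version. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const int n)
{
    int i = get_global_id(0);   /* OpenCL's equivalent of blockIdx*blockDim + threadIdx */
    if (i < n)
        c[i] = a[i] + b[i];
}
```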
 
  • #14
Does anyone know if there are any code/software/hardware platforms out there that will do DFT (density functional theory) calculations using OpenCL?
 
