# 16-core powerhouse required

natski
Hi all,

I am looking around for an Intel-based 16-core desktop (hyperthreading required). I am inspired by the Mac Pro, which gets 12 cores by using two Westmere Xeons, and was thinking two octo-core processors would do the job for me here. Googling around, the Xeon 7500 seems to be the only Intel 8-core processor I could find (notably, none of the Core i series seem to have 8 cores). Now, price really isn't an issue here, and since I have never built a machine myself before I would probably prefer to pick up a premade desktop. However, I cannot find any desktops loaded with two Xeon 7500 processors, which makes me wonder if the only option would indeed be to build it. In which case, I imagine I would need a very special motherboard, right?

natski

## Answers and Replies

NeoDevin
If cost isn't a problem, just call up a local computer shop and ask them if they can build one for you, or recommend someone else local who can do it.

dreiter
You will need a server motherboard with server memory and server CPUs. Everything else should be the same...

natski
Ok thanks for the help! I guess this is really looking like a custom-build job then...

NobodySpecial
If you just want it for number crunching, then a server board is going to have lots of expensive features that you don't need: ECC memory, RAID controllers, etc.

A dual i7 desktop board might be better.
Or, if the problem is more parallelizable, you might be able to build 4x single-CPU i7 machines in a cluster for the price of a dual-Xeon server.

Science Advisor
Gold Member
NobodySpecial said:
A dual i7 desktop board might be better.

I don't think you can do dual i7 CPUs, at least not LGA1366 ones.

MaximumPC.com said:
If you’re wondering why we didn’t just use two Core i7-980X chips—both versions are LGA1366 processors, after all—it’s because that’s impossible. The Xeon X5680 features two Quick Path Interfaces—one to communicate with the chipset and the other to talk to an additional CPU. A Core i7-980X intended for desktop use has the second QPI disabled at the factory to prevent its use in a multiprocessor setup.

dreiter
Mech_E is correct: in order to use more than one CPU in a computer, you need a server system.

Staff Emeritus
Science Advisor
dreiter said:
Mech_E is correct: in order to use more than one CPU in a computer, you need a server system.
The OP asked about multiple cores, not multiple independent CPUs. He wants to do threading (actually, "hyperthreading"), which I presume excludes multiple heavyweight processes.

natski, it sounds like you might be chasing after fine-grained parallelism. People are now using their GPUs for things other than making pretty pictures / playing video games. A link from NVIDIA: http://www.nvidia.com/object/GPU_Computing.html. A comparison of ATI versus NVIDIA for general purpose computing on graphics processing units (GPGPU): http://www.pcper.com/article.php?aid=745.

Science Advisor
Gold Member
The OP asked about multiple cores, not multiple independent CPUs. He wants to do threading (actually, "hyperthreading"), which I presume excludes multiple heavyweight processes.

The OP specifically mentions that he was eyeing a pair of Xeon 7500 (octo-core) CPUs but couldn't find a motherboard to support them. I'm not sure about the availability of the octo-core CPUs, though; he might be better off doing as you mention and looking into GPU computing...

dreiter
A comparison of ATI versus NVIDIA for general purpose computing on graphics processing units (GPGPU): http://www.pcper.com/article.php?aid=745.

That is a good tech article, but it is a little dated: AMD Stream now supports OpenCL 1.1, which is a big step for GPGPU computing...

natski
Thank-you for everyone's help.

The GPU idea sounds interesting, but I need some decent clock speeds (say > 2 GHz at least) along with multiple cores. Can GPUs really deliver high clock speeds in practice?

The tasks I will be running are typically a few dozen independent number-crunching tasks written in C or Fortran, executed for around 1-10 days. Memory is important to me as well, with each task sometimes requiring ~1 GB or more.

I have recently been considering the alternative approach of a cluster. What are the pros/cons of a cluster vs. multi-core? I am guessing that for independent tasks a cluster would perform just as well? As a newbie to clusters, I googled around and found some talk about building Mac mini clusters using Xgrid as a possible easy-ish solution; any thoughts on this? I guess they would be wired up using Ethernet, and then the master would run OS X Server and the others would run the standard OS X...?

NeoDevin
natski said:
I have recently been considering the alternative approach of a cluster. What are the pros/cons of a cluster vs. multi-core? I am guessing that for independent tasks a cluster would perform just as well? As a newbie to clusters, I googled around and found some talk about building Mac mini clusters using Xgrid as a possible easy-ish solution; any thoughts on this? I guess they would be wired up using Ethernet, and then the master would run OS X Server and the others would run the standard OS X...?

Clusters are independent computers linked by a master node. Because of this, it's easier to build large many-core clusters (though with AMD's 12-core processors this is becoming less of an issue, as up to 48 cores can be present in one machine). The downside is that the RAM is distributed rather than shared. This means that you need to buy RAM for each node separately, and if you're performing a task that only uses half the cores, you only get to use half the RAM. With multiple cores in a single machine the RAM is (almost always) shared, which means that any one core can access all of the RAM. On the other hand, because of the limited number of slots on each board, a single machine puts an upper limit on the total amount of RAM you can have.

Summary: If you can get the performance you need from a multi-core machine, that's probably the better and cheaper way to go.

natski
I see your point about the RAM constraints, but if each task requires ~1 GB then actually it shouldn't be an issue for a moderately upgraded mini. Also, from an economic point of view, the case for the mini is quite strong... let us say our requirement is 12 cores, for the Mac mini cluster...

6 x Mac mini = 6 x $650 = $3900
6 x 2 GB RAM = 12 GB RAM
6 x 320 GB disk = 1.92 TB disk
12 x 2.4 GHz

Whereas for the Mac Pro (which I understand is a very competitively priced 12-core workstation)...

1 x $5000 = $5000
1 x 6 GB RAM = 6 GB RAM
1 x 1 TB disk = 1 TB disk
12 x 2.66 GHz

So on paper the Mac mini looks the more economical choice. In fact, according to the buyer's guide it is due for an imminent update, which would likely leave the price largely unchanged but the specs much improved. The only caveat is that I assumed the base model for the Mac mini, whereas if every single node needs to be running OS X Server then this would bump the price to $950 per node, which is still actually cheaper, with more RAM and disk space.

NeoDevin
natski said:
1 x $5000 = $5000
1 x 6 GB RAM = 6 GB RAM
1 x 1 TB disk = 1 TB disk
12 x 2.66 GHz

Moral of the story: Don't get a Mac. A custom built machine from a local computer shop will likely be able to meet your needs for much cheaper. We were pricing out systems for work here, and this was the conclusion we came to.

NobodySpecial
If you have a task that parallelizes well — so that individual tasks don't need to communicate much or rely on the results of other tasks — then a cluster works well.
If you design your application using a cluster toolkit like MPI, then it will still work well on a single- (or dual-) CPU machine, but will transfer to a cluster with no changes.
MPI on a multicore machine is almost as efficient as something like Intel TBB, since it automatically uses shared memory to communicate rather than the network stack it would use on a cluster.

For large-memory tasks a cluster can be cheaper than a multi-core machine, because it is very cheap to fit 4 GB of RAM to a 4- or 8-core single desktop (and buy 4 of them) rather than buying a single dual-CPU server with 16 GB of RAM.

GPU solutions are amazingly fast IF you don't need much memory per task and the tasks are completely independent.

natski
NobodySpecial said:
MPI on a multicore machine is almost as efficient as something like Intel TBB, since it automatically uses shared memory to communicate rather than the network stack on a cluster.

What is TBB?

I am guessing, then, that coarray implementations of Fortran would run even faster than MPI on a cluster?