Why not more cores in today's CPUs?

  • Thread starter: Vanadium 50
SUMMARY

The discussion centers on the limits of increasing core counts in CPUs, with reference to Intel's i7 and Pentium architectures. By transistor count alone, a processor with about 200 Pentium-class cores could be built with existing technology, but memory bandwidth and heat dissipation make it impractical for consumer use. The thread cites Intel's 80-core research chip, which reached 1 teraflop at 3.16 GHz, and contrasts it with GPUs such as NVIDIA's Fermi architecture and ATI's Radeon 5870, which excel at parallel workloads. One participant argues that higher clock speeds can matter more than additional cores, since many programs do not benefit from multiple cores.

PREREQUISITES
  • Understanding of CPU architecture, specifically Intel i7 and Pentium series
  • Knowledge of parallel processing and its implications in computing
  • Familiarity with GPU architectures, particularly NVIDIA's Fermi
  • Basic concepts of memory bandwidth and heat dissipation in processors
NEXT STEPS
  • Research Intel's 80-core CPU prototype and its implications for future CPU designs
  • Explore the architecture and performance metrics of NVIDIA's Fermi GPU
  • Investigate the challenges of memory bandwidth in multi-core processors
  • Learn about the trade-offs between clock speed and core count in CPU performance
USEFUL FOR

Hardware engineers, computer architects, and technology enthusiasts interested in CPU and GPU performance optimization.

Vanadium 50
Perhaps someone can explain this to me.

An i7 has a transistor count of 731M. A Pentium has a transistor count of 3M. So in principle, Intel could build a 200 core processor today. Each core would be no "smarter" than a Pentium, but that's pretty much all you need.

Now, I recognize that this is extreme, and such a processor would be bottlenecked by memory bandwidth, but the basic idea seems sound - gain throughput by putting more, less capable, cores on the chip: 20 or 25 Pentium 4's seems not to be outside the realm of possibility.

Why don't we see this on the market?
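
The transistor arithmetic in this post can be checked with a short sketch. The i7 and Pentium counts are the ones quoted above; the Pentium 4 figure (about 42 million transistors for an early revision) is an assumption that does not appear in the thread. The result is only an upper bound, since it spends the whole budget on cores and leaves nothing for cache, interconnect, or I/O.

```python
# Transistor counts quoted in the post above.
I7_TRANSISTORS = 731_000_000
PENTIUM_TRANSISTORS = 3_000_000
# Assumption (not from the thread): roughly 42 million transistors for an
# early Pentium 4; later revisions used considerably more.
PENTIUM4_TRANSISTORS = 42_000_000


def cores_that_fit(budget: int, per_core: int) -> int:
    """Upper bound on core count if the entire budget went to cores alone."""
    if per_core <= 0:
        raise ValueError("per_core must be positive")
    return budget // per_core


if __name__ == "__main__":
    print("Pentium-class cores:",
          cores_that_fit(I7_TRANSISTORS, PENTIUM_TRANSISTORS))
    print("Pentium 4-class cores:",
          cores_that_fit(I7_TRANSISTORS, PENTIUM4_TRANSISTORS))
```

Under these numbers the budget covers a little over 200 Pentium-class cores, or somewhat fewer than 20 Pentium 4-class cores.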
 
I thought we did? I know our lab had a 35-processor "supercomputer" made up of Mac G4s running in parallel back in the day.
I think such a thing, while useful, is not useful at the consumer level, even though they DO exist (that's what supercomputers are).

Prior to May 2007 the CAS supercomputer was a linux beowulf cluster. In 2002 this became only the second machine in Australia to exceed 1 Tflops in performance and by 2004 further expansion provided a theoretical peak speed of 2 Tflops. The cluster comprised the following hardware:
200 Pentium 4 3.2 GHz nodes
32 Pentium 4 3.0 GHz nodes
90 Dual Pentium 4 2.2 GHz server class nodes.

So they do it, but it's expensive to build and cool. 32 processors is still a LOT of heat.
 
There are many parallel clusters. I'm talking about a single chip - allocating the feature count differently.

You do bring up a good point about heat, though.
 
There are too many bottlenecks on consumer-level motherboards and devices, plus the heat problem. Existing motherboards already carry ridiculously large cooling devices. As an aside, Mark Russinovich has shown Windows 7 running on a 256-processor computer (not a processor with 256 cores).
 
slider142 said:
running Windows 7 on a 256-processor computer

256 processors... I don't think my room is large enough for a Windows 7 machine, so I will stick to my XP.
 
Borek said:
256 processors... I don't think my room is large enough for a Windows 7 machine, so I will stick to my XP.

LoL. Seriously, Windows 7 is more efficient than XP at memory management and core management, so if you get a chance, try it out. Unlike Vista, it does not assume you have a PC with a fast graphics processor, and loses or replaces most of the background services that make Windows Vista respond like molasses to simple commands. If you're running it on a laptop, Windows 7 also has better power management (it is designed to run efficiently on netbooks, as opposed to XP which has no optimizations for such low resource devices).
I'm running 7 on an 8-year-old desktop PC and a 5-year-old laptop, and I can multitask much faster than on the snail that was Vista. I also get the additional user interface features and hardware management that are not present in XP.
 
Intel built an 80-core CPU, but it comes with a new set of problems:

http://news.cnet.com/Intel-shows-off-80-core-processor/2100-1006_3-6158181.html

Intel used 100 million transistors on the chip, which measures 275 millimeters squared. By comparison, its Core 2 Duo chip uses 291 million transistors and measures 143 millimeters squared. The chip was built using Intel's 65-nanometer manufacturing technology, but any likely product based on the design would probably use a future process based on smaller transistors. A chip the size of the current research chip is likely too large for cost-effective manufacturing.
Intel demonstrated the chip running an application created for solving differential equations. At 3.16GHz and with 0.95 volts applied to the processor, it can hit 1 teraflop of performance while consuming 62 watts of power. Intel constructed a special motherboard and cooling system for the demonstration in a San Francisco hotel.
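
The figures quoted from the article can be turned into per-core and per-watt numbers. This is a minimal sketch that uses only the quoted values (80 cores, 3.16 GHz, 1 teraflop, 62 watts); the function names are made up for illustration, and the results say nothing about sustained performance on real workloads.

```python
# Figures quoted in the article excerpt above.
CORES = 80
CLOCK_HZ = 3.16e9
PEAK_FLOPS = 1.0e12
POWER_WATTS = 62.0


def flops_per_core_per_cycle(peak_flops: float, cores: int, clock_hz: float) -> float:
    """Average floating-point operations each core completes per clock cycle."""
    return peak_flops / (cores * clock_hz)


def gflops_per_watt(peak_flops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return peak_flops / watts / 1e9


if __name__ == "__main__":
    per_cycle = flops_per_core_per_cycle(PEAK_FLOPS, CORES, CLOCK_HZ)
    efficiency = gflops_per_watt(PEAK_FLOPS, POWER_WATTS)
    print(f"{per_cycle:.2f} flops per core per cycle")
    print(f"{efficiency:.1f} GFLOPS per watt")
```

The quoted peak works out to roughly four floating-point operations per core per cycle, which suggests each simple core carries more than one floating-point pipeline.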
 
But GPU graphics cards can surpass that in computation:


The Next Generation CUDA Architecture, Code Named Fermi
The Soul of a Supercomputer in the Body of a GPU
The next generation CUDA architecture, code named “Fermi”, is the most advanced GPU computing architecture ever built. With over three billion transistors and featuring up to 512 CUDA cores, Fermi delivers supercomputing features and performance at 1/10th the cost and 1/20th the power of traditional CPU-only servers.

http://www.nvidia.com/object/fermi_architecture.html
 
Graphics cards jumped on the "massively parallel" bandwagon several years ago now. Take for example ATI's new top-of-the-line card, the Radeon 5870:

MaximumPC said:
This new chip is no shrinking violet in the numbers department. Every number associated with the new Radeon 5800 series is staggering: 2.15 billion transistors, 2.7 trillion floating-point operations a second [TFlops], more than 20 gigapixels per second throughput, 1,600 shader units

The card has 1,600 parallel processing units, albeit simple ones compared to a primary CPU. On software optimized for fully parallel operation, speed increases of two orders of magnitude are possible. ATI's definition of a "processing unit" differs somewhat from nVidia's, but suffice it to say that with a mere 512 processors the nVidia card will be pretty ridiculous.

Keep in mind that the FLOPS quoted for graphics cards are usually for single-precision calculations, while CPUs may be performing double-precision ones. This difference can make comparisons difficult.
 
  • #10
I'm no expert, but if you built a CPU out of 200 Pentium cores, its clock would be no faster than that of an individual Pentium chip. For that reason, I think higher clock speeds matter more than parallel cores. Since today's CPUs seem to have hit a limit at around 3.2 GHz, more cores were introduced to push performance past the clock limitation.

Hardware-wise, it is better to have a 3.1 GHz dual-core CPU than a 2.4 GHz quad-core, considering that many programs still don't benefit from a dual-core or quad-core CPU, let alone an N-core CPU.
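
The dual-core versus quad-core comparison can be made concrete with Amdahl's law, a standard model that this post does not mention by name. The sketch below assumes a program whose parallel fraction scales perfectly across cores and whose speed is otherwise proportional to clock rate; both are simplifications, and the function names are hypothetical.

```python
def relative_time(parallel_fraction: float, cores: int, clock_ghz: float) -> float:
    """Run time in arbitrary units under Amdahl's law.

    The serial part runs on one core; the parallel part is split evenly
    across all cores. Everything scales inversely with clock speed.
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / clock_ghz


def break_even_fraction(cores_a: int, clock_a: float,
                        cores_b: int, clock_b: float) -> float:
    """Parallel fraction p at which chips A and B take the same time.

    Solves (1 - p + p/cores_a)/clock_a == (1 - p + p/cores_b)/clock_b.
    """
    ka = 1.0 - 1.0 / cores_a
    kb = 1.0 - 1.0 / cores_b
    denominator = clock_a * kb - clock_b * ka
    if denominator == 0:
        raise ValueError("the two chips never cross over")
    return (clock_a - clock_b) / denominator


if __name__ == "__main__":
    # The two chips compared in the post: 3.1 GHz dual vs 2.4 GHz quad.
    p = break_even_fraction(2, 3.1, 4, 2.4)
    print(f"quad-core wins only above a parallel fraction of {p:.2f}")
    for fraction in (0.5, 0.9):
        dual = relative_time(fraction, 2, 3.1)
        quad = relative_time(fraction, 4, 2.4)
        print(f"p={fraction}: dual={dual:.3f}, quad={quad:.3f}")
```

Under this model the slower quad-core only pulls ahead once more than about 62% of the work is parallel, which is consistent with the point that poorly threaded programs favor the faster dual-core.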
 
  • #11
When will we see a CPU for a personal computer with TFLOP performance? I don't understand why graphics cards have so many more FLOPS than a CPU has.
 
  • #12
The_Absolute said:
I don't understand why graphics cards have so many more FLOPS than a CPU has.
Because they are highly parallelized. They can do a lot of FLOPS if you want the same floating-point operation applied to 6 or 256 memory locations at the same time. If you want a different operation applied in each case, they are slow.

It's like the difference between a printer, which is slow but can print each new character differently, and a printing press, which can print an entire page in the time it takes to print one character, but only if all the characters are preset.
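
The "same operation on many locations" point can be illustrated with a toy model of lockstep execution. This is a simplified sketch, not a description of any particular GPU: it assumes the hardware processes elements in fixed-size groups and needs one pass per distinct operation within a group. The group width of 32 and the operation names are hypothetical.

```python
from typing import Sequence


def lockstep_passes(ops: Sequence[str], width: int) -> int:
    """Count passes needed when `width` elements execute in lockstep.

    Each group of `width` elements needs one pass per distinct operation
    that appears in it, because all lanes must run the same instruction.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    passes = 0
    for start in range(0, len(ops), width):
        group = ops[start:start + width]
        passes += len(set(group))
    return passes


if __name__ == "__main__":
    n, width = 256, 32  # hypothetical element count and group width
    uniform = ["mul"] * n                      # printing press: all preset
    divergent = [f"op{i}" for i in range(n)]   # printer: each one different
    print("uniform passes:  ", lockstep_passes(uniform, width))
    print("divergent passes:", lockstep_passes(divergent, width))
```

With identical operations the 256 elements finish in 8 passes; with a different operation per element the same hardware needs 256 passes, one per element.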

The difficulty of putting more cores onto a single chip is the I/O (apart from how to program the thing). If they all share the same memory bus, then each core's access to RAM is 200x slower while it waits for its turn, making memory about as fast as disk access. To get around this you have to put a lot of cache onto the chip so that each core's data is already on board. This is what uses most of the i7's transistor count: it has a huge amount of on-chip cache.
The other alternative is to provide separate buses, but that means a chip with a lot more pins, which is tricky given that an i7 socket (LGA 1366) already has 1,366.
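
The shared-bus argument can be put into numbers with a rough sketch. It assumes that a single core with no cache would keep the bus fully busy on its own, so with n cores only 1/n of each core's accesses may reach RAM before the bus saturates. The bandwidth value used in the example is a placeholder, not a figure from this thread.

```python
def per_core_share(bus_bandwidth: float, cores: int) -> float:
    """Bandwidth each core gets if the bus is divided evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return bus_bandwidth / cores


def min_hit_rate(cores: int) -> float:
    """Cache hit rate needed so that `cores` cores do not saturate the bus.

    Assumes one uncached core alone would saturate the bus, so the miss
    rate of each core must be at most 1/cores.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 1.0 - 1.0 / cores


if __name__ == "__main__":
    cores = 200
    bus = 200.0  # placeholder bandwidth in arbitrary units
    print("share per core:", per_core_share(bus, cores))
    print(f"required hit rate: {min_hit_rate(cores):.3f}")
```

For 200 cores the required hit rate is 99.5%, which gives a sense of why so much of the transistor budget ends up as cache.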
 