Can Engineers Overcome the Impending Limitations of Moore's Law?

  • Thread starter: ElliotSmith
  • Tags: Law, Moore's law
Summary
The impending limitations of Moore's Law are highlighted by the approaching physical limits of transistor miniaturization, expected within the next decade. Engineers are currently focusing on energy efficiency rather than performance enhancements, as traditional methods to increase processing power face challenges like heat dissipation and quantum tunneling effects. While quantum computing presents a potential future direction, it remains in its infancy and may not address current consumer needs effectively. Improvements in CPU architecture and software efficiency are seen as more immediate solutions to existing computational demands. Ultimately, significant breakthroughs in technology will be necessary to overcome these limitations and drive future advancements in computing.
  • #31
ElliotSmith said:
The scientific limit on how small you can make a functionally viable transistor is fast approaching and should hit a stone wall within the next 10 years or less. How will electronic engineers and computer scientists compensate for this problem?...Are there any workarounds on the table being discussed and researched for this issue?...

There are four main limits re CPU performance:

(1) Clock speed scaling, tied to Dennard scaling, which broke down in the mid-2000s. This is what prevents much faster clock speeds: https://en.wikipedia.org/wiki/Dennard_scaling
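
As a back-of-the-envelope sketch of why (idealized Dennard assumptions, illustrative numbers only): dynamic switching power is
$$P = C V^2 f.$$
Shrink linear dimensions and voltage by a factor 1/k and raise frequency by k, and per-transistor power becomes
$$P' = \frac{C}{k}\left(\frac{V}{k}\right)^2 (k f) = \frac{P}{k^2},$$
while area also shrinks by 1/k^2, so power density stays constant. Once the supply voltage can no longer be reduced (leakage and threshold limits), the same substitution leaves per-transistor power roughly unchanged, and with k^2 times more transistors per unit area the power density climbs unless the clock frequency stops rising.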

(2) Hardware ILP (Instruction Level Parallelism) limits: A superscalar out-of-order CPU cannot execute more than approximately eight instructions in parallel. The latest CPUs (Haswell, IBM POWER8) are already at this limit. You cannot go beyond about an 8-wide CPU because of several issues: dependency checking, register renaming, etc. The cost of these tasks grows (at least) quadratically with issue width, and there's no way around that for a conventional out-of-order superscalar machine. There will likely never be a 16-wide superscalar out-of-order CPU.
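
To see where the quadratic growth comes from (a rough counting argument, not a description of any particular microarchitecture): in an issue group of width w, each instruction's source registers must be compared against the destination registers of every earlier instruction in the group, so with s sources per instruction the number of comparisons is roughly
$$N(w) \approx s \cdot \frac{w(w-1)}{2}.$$
With s = 2 that is about 56 comparisons at 8-wide but about 240 at 16-wide, i.e. roughly 4x the checking logic for 2x the width, before even counting rename ports and bypass paths.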

(3) Software ILP limits on existing code: Even given infinite superscalar resources, existing code will typically not have more than 8 independent instructions in any group. If the intrinsic parallelism isn't present in a single-threaded code path, nothing can be done. Newly written software and compilers can in theory generate higher-ILP code, but if the hardware tops out at 8-wide, there's no compelling reason to do so.
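
A toy illustration (hypothetical C kernels, not taken from any real workload): the first loop is one long dependence chain, so extra issue width buys nothing, while the second has fully independent iterations that a wide superscalar (or a vector unit) can overlap:

/* Toy illustration of intrinsic ILP limits (hypothetical kernels). */
#include <stddef.h>

/* Serial dependence chain: each iteration needs the previous result, so
 * (without reassociating the floating-point math) at most one of these adds
 * can be in flight at a time, no matter how wide the CPU is. */
double serial_sum(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];            /* depends on acc from the previous iteration */
    return acc;
}

/* Independent iterations: every element can be computed separately,
 * so a wide machine can overlap many of them per cycle. */
void scale(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i];        /* no dependence between iterations */
}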

(4) Multicore CPUs, limited by (a) Heat: the highest-end Intel Xeon E5-2699 v3 has 18 cores, but the clock speed of each core is constrained by the chip's TDP: https://en.wikipedia.org/wiki/Thermal_design_power
(b) Amdahl's Law: as core counts rise to 18 and beyond, even a tiny fraction of serialized code will "poison" the speedup and cap the improvement (a worked example follows this list): https://en.wikipedia.org/wiki/Amdahl's_law
(c) Coding practices: it's harder to write effective multi-threaded code, although newer software frameworks help somewhat.
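
Worked example for (b), with illustrative numbers: if a fraction p of the runtime parallelizes and the rest is serial, Amdahl's Law caps the speedup on n cores at
$$S(n) = \frac{1}{(1-p) + p/n}.$$
Even if 95% of the runtime parallelizes perfectly (p = 0.95), 18 cores give at most
$$S(18) = \frac{1}{0.05 + 0.95/18} \approx 9.7,$$
and no number of cores can ever push the speedup past 1/(1-p) = 20x.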

While transistor scaling will continue for a while, heat will increasingly limit how much of that functional capacity can be used at the same time. This is called the "dark silicon" problem: you can have lots of on-chip functionality, but it cannot all be active simultaneously. See the paper "Dark Silicon and the End of Multicore Scaling" (Esmaeilzadeh et al., ISCA 2011).

What can be done? There are several possibilities along different lines:

(1) Increasingly harness high transistor counts for specialized functional units. E.g., Intel Core CPUs since Sandy Bridge have had Quick Sync, a dedicated video transcoder: https://en.wikipedia.org/wiki/Intel_Quick_Sync_Video This is roughly 4-5x faster than doing the same transcoding in software on the CPU cores. Intel's Skylake CPU will have a greatly improved Quick Sync which handles many additional codecs. Given sufficient transistor budgets you can envision similar specialized units for diverse tasks; they could simply sit idle until called on, then deliver high performance in that narrow area. This general direction is known as integrated heterogeneous processing.

(2) Enhance the existing instruction set with specialized instructions for justifiable cases. E.g., Intel Haswell CPUs have 256-bit vector instructions and Skylake will have AVX-512 instructions. Partly due to these instructions, a Xeon E5-2699 v3 can do about 800 Linpack gigaflops, which is roughly 10,000 times faster than the original Cray-1. Obviously that requires vectorization of code, but that's a well-established practice.
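
For a concrete picture of what vectorized code looks like, here is a minimal sketch using AVX intrinsics; the saxpy-style kernel and its name are illustrative, and in practice a compiler will often auto-vectorize such a loop on its own:

/* Minimal sketch: explicit 256-bit vectorization with AVX intrinsics.
 * Illustrative kernel; build with something like: cc -mavx -c saxpy.c */
#include <immintrin.h>
#include <stddef.h>

void saxpy(float *y, const float *x, float a, size_t n)
{
    __m256 va = _mm256_set1_ps(a);                /* broadcast the scalar a */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                  /* 8 floats per 256-bit register */
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                            /* scalar cleanup for the tail */
        y[i] += a * x[i];
}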

(3) Use more aggressive architectural methods to squeeze out additional single-thread performance. Although most such techniques have already been exploited, a few remain, such as data speculation. Data speculation differs from control speculation, which is already used to predict branches. In theory data speculation could provide an additional 2x performance on single-threaded code, but it would require significant added complexity. See "Limits of Instruction Level Parallelism with Data Speculation": http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.9196&rep=rep1&type=pdf

(4) Use VLIW (Very Long Instruction Word) methods. This sidesteps the hardware limits on dependency checking, etc., by doing that work at compile time. In theory, a progressively wider CPU could be designed as technology improves, capable of running single-threaded code 32 or more instructions wide. Intel attempted this approach unsuccessfully with Itanium, and CPU architects still debate whether a fresh attempt would work. One group is actively pursuing bringing a VLIW-like CPU to the commercial market, the Mill CPU: http://millcomputing.com/ VLIW approaches require software to be recompiled or rewritten, but using conventional techniques and languages, not different paradigms like vectorization or multiple threads.
 
  • #32
phinds said:
Right. And Bill Gates was SURE that 640K would be all the memory anyone would ever need. It was just inconceivable that more could be required for a single person.

He actually never said that but it's a popular urban legend that he did.

Back on topic, Moore's Law seems to be reaching the end of its life now. We're moving to distributed systems and multicore machines, and Amdahl's Law is the new one to watch.

http://en.wikiquote.org/wiki/Bill_Gates

http://en.wikipedia.org/wiki/Amdahl's_law
 
  • #33
Carno Raar said:
He actually never said that but it's a popular urban legend that he did.
Either way, my point remains exactly the same.
 
  • #34
Amdahl's law is algorithm dependent. So it's not the same kind of thing as Moore's Law.
 
  • #35
SixNein said:
Amdahl's law is algorithm dependent. So it's not the same kind of thing as Moore's Law.

It's an appropriate answer for the OP's question.

"The scientific limit on to how small you can make a functionally viable transistor is very fast approaching and should hit a stone wall within the next 10 years or less. How will electronic engineers and computer scientists compensate for this problem?"

A valid answer is that we spin up more cloud instances and learn to write concurrent code. Right now, Amdahl and Moore are limiting factors in the growth of large computer systems. Moore will doubtless become less important in the near future, while we're only just starting to get our heads around concurrency issues. I say concurrency, not parallelism, as I don't yet have access to properly parallel hardware ... :-)
 
  • #36
Carno Raar said:
I say concurrency not parallelism as I don't yet have access to properly parallel hardware ... :)
If you don't have parallel hardware, concurrency is just sequential but with extra overhead. That is, if you have a single-thread process in a single CPU and you make it multi-threaded but still on the single CPU, all you have done is add thread overhead.
 
  • #37
phinds said:
If you don't have parallel hardware, concurrency is just sequential but with extra overhead. That is, if you have a single-thread process in a single CPU and you make it multi-threaded but still on the single CPU, all you have done is add thread overhead.
Managing multiple downloads + many other use-cases.

"Take this list of URLs and download them all". You don't want to sit there doing nothing while your 1st and only download times out.

Edit: Yes, you can implement this single-threaded with async I/O and a non-blocking downloader, but that's a bit awkward, and most libraries implement non-blocking downloads with threads anyway.
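
A minimal sketch of that idea in C with POSIX threads (the "download" is just a sleep standing in for network wait time, and the URLs are illustrative): even on a single core the waits overlap, so the wall-clock time is roughly the longest wait rather than the sum of them.

/* Minimal sketch: I/O-bound concurrency with POSIX threads.
 * The "download" is simulated with sleep() standing in for network wait,
 * so even a single core finishes in ~2 s instead of ~6 s.
 * Build (assumption): cc -pthread downloads.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *fetch(void *arg)
{
    const char *url = arg;
    printf("starting %s\n", url);
    sleep(2);                           /* pretend we're blocked on the network */
    printf("finished %s\n", url);
    return NULL;
}

int main(void)
{
    const char *urls[] = {              /* illustrative URLs only */
        "http://example.com/a",
        "http://example.com/b",
        "http://example.com/c",
    };
    enum { NURLS = sizeof urls / sizeof urls[0] };
    pthread_t tid[NURLS];

    for (int i = 0; i < NURLS; ++i)     /* one thread per pending download */
        pthread_create(&tid[i], NULL, fetch, (void *)urls[i]);
    for (int i = 0; i < NURLS; ++i)     /* wait for all of them to finish */
        pthread_join(tid[i], NULL);

    return 0;
}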
 
  • #38
Carno Raar said:
Managing multiple downloads + many other use-cases.
Good point. Thanks. I had not thought about I/O bound processes.
 
