What Data Is Stored in CPU Cache and How Does It Impact Performance?

AI Thread Summary
CPU cache stores frequently accessed data and instructions to enhance performance, significantly speeding up operations compared to accessing RAM. The cache is smaller in size but crucial for optimizing loops and repetitive tasks, as it holds variables like loop counters and pointers. Different caching algorithms prioritize which data to store based on usage patterns, helping to minimize cache misses and improve efficiency. While programmers typically do not need to manage cache directly, understanding its impact on performance is essential, especially in time-critical applications. Effective cache management can lead to significant performance gains, particularly in data-intensive tasks.
fluidistic
Hello,
I wonder what kind of data usually goes into the CPU cache. Generally the size of the cache is in kB or MB, in contrast with RAM, which is in GB.
I understand that the CPU accesses the RAM much more slowly than the cache, so it's better for it to find the data in the cache instead of in the RAM. But the cache is so small in size that I wonder how it can be that useful.
Is it possible to store a small .txt file in the cache? If so, accessing the .txt should be much faster than accessing it if it were in the RAM; is there an easy way to benchmark the time it takes to "read" this .txt file and thus determine whether the file is indeed in the cache?
Thank you.
 
One of the main speed advantages of cache is for programs with loops. If you are executing the same few or few dozens of instructions over and over in a loop, your program executes noticeably faster if the instructions are in the cache than if they are in main store.
 
Last edited:
The CPU cache is used for holding variables that the CPU requires access to very frequently.
As Phinds said, this would typically be things such as loop counters, or constant pointers to frequently addressed items in main memory.
At the level of files there is a different kind of cache, whereby some or even all of a file held on disk is copied into RAM.
Accessing the data that way is significantly faster than continually reading from the disk.
 
rootone said:
The CPU cache is used for holding variables that the CPU requires access to very frequently.
As Phinds said, this would typically be things such as loop counters, or constant pointers to frequently addressed items in main memory.
At the level of files there is a different kind of cache, whereby some or even all of a file held on disk is copied into RAM.
Accessing the data that way is significantly faster than continually reading from the disk.
But it is NOT just data. Not just things like loop counters, it is the program code as well as data. It does no good to keep a loop counter in cache if you have to fetch all of the instructions in the loop out of main store every time through the loop.
 
There is something called the memory hierarchy in computer architecture. You have the hard drive, the RAM, then the cache. As you go up, each memory level gets smaller, but faster and more costly per storage unit. With the proper use of these different levels, you get something called virtual memory, which has the size of the hard drive but is accessed faster than if the CPU accessed the hard drive directly. The CPU accesses the cache faster than the RAM, and the RAM faster than the hard drive. There are different algorithms for deciding what data to fetch into each upper level of this hierarchy. For example, one method assumes that any given instruction is likely to be executed more than once, so it is kept in the cache. Another assumes that adjacent instructions are likely to be executed, so a whole block of instructions is fetched from the RAM into the cache. I vaguely remember this stuff from my bachelor's degree, but I believe the concepts I described are somewhat correct.
 
It should be mentioned that cache is generally for temporary storage. There are different levels of cache, L3, L2, L1 of increasing speeds. L3 feeds L2 feeds L1. L1 is on the microprocessor chip. L2 and L3 can be in the CPU module or on the motherboard. The computer can look ahead and start pulling data in from the slower memory to faster memory before it is needed. So the contents of the different levels of cache are always changing. This is all carefully thought out. Messing with it can cause unexpected and counter-intuitive results.

EDIT: Changed all misspelled "cash" to "cache"
 
Last edited:
FactChecker said:
It should be mentioned that cash is generally for temporary storage.
Well, I don't know about that. I NEVER use MY money for temporary storage. :smile:
 
  • Like
Likes FactChecker
phinds said:
Well, I don't know about that. I NEVER use MY money for temporary storage. :smile:
AAARRRRRGGGGGHHHHH! Ok. I'm a moron at spelling. I'll correct it.
 
  • #10
FactChecker said:
Messing with it can cause unexpected and counter-intuitive results.
I bet it can. I doubt if anyone other than an OS developer could do much about it anyway though.
 
  • Like
Likes FactChecker
  • #11
rootone said:
I bet it can. I doubt if anyone other than an OS developer could do much about it anyway though.
I don't think cache management is part of the O.S.; it is part of the on-board processing, the "computer within the computer" as it were. I'm not 100% sure that's always the case.
 
  • #12
FactChecker said:
AAARRRRRGGGGGHHHHH! Ok. I'm a moron at spelling. I'll correct it.
I, of course, niver misspolle anythung.
 
  • Like
Likes FactChecker and Ibix
  • #13
I think cache management is part of the architecture of the CPU.
 
  • #14
S_David said:
I think cache management is part of the architecture of the CPU.
Most CPUs provide basic cache management on their own, but there are also instructions for explicit cache management (so the code can fetch data into the cache which will be needed in the future), and by organizing the data usage to fit the cache structure, the programmer and the compiler (maybe even the OS) can affect the cache efficiency.
 
  • Like
Likes FactChecker
  • #15
Most of the posts here appear to be from computer scientists, focussing on the caching of instructions. As someone who has simulated big systems, let me tell you that proper cache management is also very important for data. For instance, accessing arrays following their storage order in memory is very important to avoid cache misses, where the data needed is not in cache and must be fetched from main RAM. This is because, as previously said, one of the main assumptions for caching algorithms is that the next requested part of memory is most likely near the previously accessed part. Cache misses can severely affect a program's efficiency!
 
Last edited:
  • Like
Likes EngWiPy and FactChecker
  • #17
DrClaude said:
Most of the posts here appear to be from computer scientists, focussing on the caching of instructions. As someone who has simulated big systems, let me tell you that proper cache management is also very important for data. For instance, accessing arrays following their storage order in memory is very important to avoid cache misses, where the data needed is not in cache and must be fetched from main RAM. This is because, as previously said, one of the main assumptions for caching algorithms is that the next requested part of memory is most likely near the previously accessed part. Cache misses can severely affect a program's efficiency!
I have seen work in time-critical applications where the cache activity was monitored and successfully optimized. My only first-hand attempt was when I naively tried to optimize and got bad results. In that effort, a large amount of data was being looped through repeatedly. Trying to be smart, I alternated looping forward and backward so that the data last looped through would be the first looped through in the next iteration. To my surprise, that was significantly slower. It taught me a lesson and after that I left it to the compiler.
 
  • #18
FactChecker said:
I have seen work in time-critical applications where the cache activity was monitored and successfully optimized.
This is called 'code profiling'. There are various tools to monitor cache usage, branch prediction and many other things.
Also, many CPUs (and GPUs) have HW support for this.
 
  • Like
Likes FactChecker
  • #19
phinds said:
Well, I don't know about that. I NEVER use MY money for temporary storage. :smile:
I've been known to temporarily transcribe a phone number to cache on a dollar bill. There is also a cache of cash in my car for tolls, parking, burgers and such.
 
  • Like
Likes phinds
  • #20
S_David said:
I think cache management is part of the architecture of the CPU.
Yes, that is exactly what I had just said.

Rive said:
Most CPUs provide basic cache management on their own, but there are also instructions for explicit cache management (so the code can fetch data into the cache which will be needed in the future), and by organizing the data usage to fit the cache structure, the programmer and the compiler (maybe even the OS) can affect the cache efficiency.
Ah, good to know. I was not aware of that but it makes sense.
 
  • #21
Somewhat OT, but I once worked with a guy who pronounced cache as "caché" (i.e., as two syllables, with the accent on the second syllable). He most likely was confusing the pronunciation of this French word with another French word, cachet, which is pronounced the way he was pronouncing cache.
 
  • #22
Mark44 said:
Somewhat OT, but I once worked with a guy who pronounced cache as "caché" (i.e., as two syllables, with the accent on the second syllable). He most likely was confusing the pronunciation of this French word with another French word, cachet, which is pronounced the way he was pronouncing cache.
Yes, he had to have been. Cachet has been absorbed into, and is used in, English, but I can't see it ever being confused with cache by anyone who understands the two words.
 
  • #23
phinds said:
Well, I don't know about that. I NEVER use MY money for temporary storage. :smile:
But if you cache money, is that the same as "cash money"? :oldbiggrin:
 
  • #24
Mark44 said:
But if you cache money, is that the same as "cash money"? :oldbiggrin:
Nah, if you cache money, it's then called "dormant assets"
 
  • #25
I'm afraid that my spelling error has hijacked this thread.
 
  • #26
FactChecker said:
I'm afraid that my spelling error has hijacked this thread.
Oh ... what were we talking about? :smile:
 
  • #27
To bring things back on track, here is a figure from C. W. Ueberhuber, Numerical Computation: Methods, Software, and Analysis (Springer, Berlin, 1997):

fft004.jpg

There is significant increase in computing time when the data no longer fit in the cache.
 
  • Like
Likes EngWiPy
  • #28
You should not normally worry about the cache as a programmer. Do that once you start profiling your software, and update your algorithm if there are problems.

You cannot access the cache directly in any language other than assembly, and even then, you have no assurance of what actually happens. CPUs may not even have certain caches, so usually even the operating system is agnostic to the on-chip cache. The only way I know to even hint to the computer that you want something stored in cache is to use the PREFETCH assembly instruction in IA-32. But again, it's only a hint, not a guarantee.

Storing a small text file would likely not be very useful, unless you plan on accessing the parts of it trillions of times, in which case your hardware will have likely already placed it in cache for you.
 
  • Like
Likes FactChecker
  • #29
newjerseyrunner said:
You should never worry about the cache as a programmer. Do that once you start profiling your software and update your algorithm if there are problems.

It depends on what type of programmer you are.:biggrin: Embedded/systems programming in C at the kernel/driver level, even on devices like the 4-core RPi3 ARM8 with cache, requires very close attention to cache issues that affect the processor pipeline, branch execution times, memory barriers for cache coherency, and such.

http://www.geeksforgeeks.org/branch-prediction-macros-in-gcc/
https://www.kernel.org/doc/Documentation/memory-barriers.txt
 
  • Like
Likes FactChecker
  • #30
newjerseyrunner said:
You should never worry about the cache as a programmer.
Except when it becomes problematic to tell your customers to buy bigger/stronger HW...
 
  • #31
I don't remember studying anything about cache in the OS course. It was taught in the computer architecture course.
 
  • #32
Cache in the news, in a bad way. While the effects of caching are usually transparent at the user level and higher OS levels it's possible to exploit cache operation in computer attacks even from JavaScript.
https://www.vusec.net/projects/anc/
 
  • #33
nsaspook said:
It depends on what type of programmer you are.:biggrin: Embedded/systems programming in C at the kernel/driver level, even on devices like the 4-core RPi3 ARM8 with cache, requires very close attention to cache issues that affect the processor pipeline, branch execution times, memory barriers for cache coherency, and such.
I agree. But that being said, I think that it is a subject for an advanced, specialized programmer. A typical programmer would have a long time to learn about cache before anyone expects him to deal with these issues. And they would probably do it with frequent consultation with the manufacturer.
 
Last edited:
  • #34
FactChecker said:
A typical programmer would have a long time to learn about cash before anyone expects him to deal with these issues.
Here we go again...
 
  • Like
Likes nsaspook and jbriggs444
  • #35
Mark44 said:
Here we go again...
:oldlaugh:
 
  • #36
FactChecker said:
And they would probably do it with frequent consultation with the manufacturer.
Common folk have to rely on the various 'performance optimization guide' documents, like this. Almost all CPU (and GPU) manufacturers provide similar documents.
You have to have some serious background to get (real) personal attention.

Apart from professionals, there are many amateurs who try their hand in this field. It's just a matter of hitting performance bottlenecks, and even without access to professional programming materials that's surprisingly frequent.
 
  • #37
Rive said:
Apart from professionals, there are many amateurs who try their hand in this field. It's just a matter of hitting performance bottlenecks, and even without access to professional programming materials that's surprisingly frequent.
Ok. I'll buy that. My experience was in an unusual environment.
 
  • #38
phinds said:
But it is NOT just data. Not just things like loop counters, it is the program code as well as data.

True, but this takes care of itself. When an instruction is executed, the odds are very high that the next instruction executed is the next instruction in memory, and that was read into the cache at the same time the instruction in question was loaded. The data cache is what the programmer needs to think about.

newjerseyrunner said:
You cannot access the cache in any language other than assembly

Not true. In the Intel Knights Landing, there is high speed memory (called MCDRAM) that is used as a cache between the main memory and the chip. The programmer can let it cache automatically, or she can use a variation on malloc to allocate memory directly on the cache, thus pinning those objects into the fast memory.

In general, one can do cache management indirectly by careful placement of objects - one can calculate c = a + b in such a way that when one of a or b is read into the cache, the other is as well.
 
  • #39
Vanadium 50 said:
True, but this takes care of itself. When an instruction is executed, the odds are very high that the next instruction executed is the next instruction in memory, and that was read into the cache at the same time the instruction in question was loaded.
But there are important clever exceptions. For instance, they usually assume that the last instruction of a loop is followed by the first instruction of the loop, because there are usually several iterations before the loop is done. It is very hard to do better than the automatic optimization; it's usually best to respect it and work with it. Unfortunately, in a lot of safety-critical work, the optimization must be kept at very low levels or turned off completely.
 
  • #40
phinds said:
I don't think cache management is part of the O.S. it is part of the on-board processing, the "computer within the computer" as it were. I'm not 100% sure that that's always the case.
Yes, for most processors, cache is primarily a processor feature that operates with little or no direct encouragement from the software.

Here are some situations where the knowledge of the cache performance is important:
1) Compiler-writing. This is perhaps the most important.
2) Debugging when using hardware debugging tools. The host is the processor where the debugging software is running. The target is the processor being debugged. When the host and target are the same processor, the caching can be invisible. But when they are not, the target may have asynchronous debugging features. Without awareness of the caching, the debugging environment can often produce perplexing situations.
3) Multicore environments. When you have several processors on a single chip that share memory, you will be provided with machine-level instructions such as "instruction sync" and "data sync" that force the cache to become synced with memory. You may also have mechanisms (such as supplemental address spaces) for accessing memory without caching.
4) If instruction timing becomes critical, you will need to consider caching - and that can be impractical. What you really need to do is make the instruction timing non-critical.

So getting back to the part of the original question:
fluidistic said:
Is it possible to store a small .txt file into the cache?
Kind of, but not really.
If you read a text file into RAM and begin searching through it, it will be pulled into cache memory. If it's less than half the size of the cache, it is likely to be pulled in in its entirety.

But it gets pulled in implicitly, not because of explicit instructions. And if you continuously interrupt the process with other memory-intensive procedures, it may never be wholly included in cache.
 
  • #41
Last edited by a moderator:
  • Like
Likes FactChecker
  • #42
The primitive video game 'Space Invaders' could be done in one kilobyte.
At the time that was very impressive.
 
  • #43
rootone said:
The primitive video game 'space invaders' could be done in one kilobyte.

1/8 kilobyte on the Atari 2600. Not as nice as the arcade version but still impressive what you can do with 128 bytes.

BoB
 
  • #44
rbelli1 said:
https://software.intel.com/en-us/bl...dram-high-bandwidth-memory-on-knights-landing
In what world is 16GB a small amount of RAM? It's not an impressively large amount but still quite a lot.

Swing that at me in a few years and it will probably be a whole different story.

BoB
Interesting. It looks like the MCDRAM is Level-3 memory that can be used entirely as cache, entirely as addressable memory, or split between the two, depending on the BIOS settings. https://colfaxresearch.com/knl-mcdram/ has examples of how to use it in each case. So it can be directly controlled by the programmer as addressable memory and be faster than Level-3 cache.
 
  • #45
rbelli1 said:
1/8 kilobyte on the Atari 2600. Not as nice as the arcade version but still impressive what you can do with 128 bytes.

BoB
Atari yeah, things like CPU directly addressing video RAM.
What, er?, video RAM?
 
  • #46
rootone said:
CPU directly addressing video RAM

No video RAM on the 2600 and contemporary machines. They directly addressed the beam.

Direct VRAM access was standard on all CGA, EGA, VGA, and XGA systems, and also on most systems of that era of all brands. It was mapped into the normal address space. Some systems used that ability to access more colors than were possible with datasheet operation.

FactChecker said:
programmer as addressable memory and be faster than Level-3 cache
16GB of close DRAM is certainly a performance opportunity. Bump that to SRAM and you can fly.

BoB
 
  • #47
I think this link would be really appropriate as an answer to the first question, as the thread seems to have 'wandered'. Look for a discussion of data locality.

https://www.akkadia.org/drepper/cpumemory.pdf

This is a bit old, but still very relevant.
 
  • Like
Likes FactChecker
  • #48
rbelli1 said:
In what world is 16GB a small amount of RAM?

In a world where it is shared by 256 processes.
 
  • #49
Vanadium 50 said:
In a world where it is shared by 256 processes.

I just looked at the Intel Xeon Phi series. I had no idea anything like that existed.

BoB
 