Cache Controller: Exploring Its Mechanics and Functionality

  • #1
PainterGuy
Hi,

I have always thought that the BIOS of any computing device is firmware that serves as an initial set of instructions which launches the main program, such as Windows.

Please have a look at this attachment. The following text is from it.

"The concept of cache memory is based on the idea that computer programs tend to get instructions or data from one area of main memory before moving to another area. Basically, the cache controller “guesses” which area of the slow dynamic memory the CPU (central-processing unit) will need next and moves it to the cache memory so that it is ready when needed. If the cache controller guesses right, the data are immediately available to the microprocessor. If the cache controller guesses wrong, the CPU must go to the main memory and wait much longer for the correct instructions or data. Fortunately, the cache controller is right most of the time."

When I read the text above for the first time, I got the impression that a cache controller also has separate 'guiding' firmware, i.e. an algorithm, which tells it how to 'guess'. Now I think I was wrong. A cache controller has an added ability to access several required memory locations at a faster rate; it is as if it had ten hands instead of two, and each hand could reach a different memory location very quickly.

To get a better understanding, I tried to see how a GPU works. I have boldfaced the sentence which, I think, is the gist.

"A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard. In certain CPUs, they are embedded on the CPU die.[1]" [5]

Here, the architecture of a GPU helps algorithms process data more efficiently than a general-purpose CPU can. I believe an artificial intelligence accelerator similarly helps artificial intelligence algorithms process data more efficiently.

I'm still a little confused about the working of a cache controller, e.g. how it guesses. Could you please help me with this? Also, in the block diagram FIGURE 11-16 there is a double-sided arrow between the cache controller and the microprocessor. What does it mean? That the microprocessor and the cache controller interact with each other? If so, in what way, and how?

Thank you.

Helpful links:
1: https://en.wikipedia.org/wiki/AI_accelerator
2: https://www.quora.com/Does-a-smartphone-have-a-BIOS-chip
3: https://www.sciencedirect.com/topics/engineering/cache-controller
4: https://www.datacenterknowledge.com...ge-nvidia-s-ai-chip-dominance-facebook-s-help
5: https://en.wikipedia.org/wiki/Graphics_processing_unit
 

Attachments

  • cache_contoller.jpg (77.9 KB)
  • #2
The cache controllers (in the CPU) work with hardwired logic; there is no real firmware there.

That text from the attachment is a bit misleading. The real concept is based on the statistical observation that a localized area of memory is usually used more than once as a program runs. Take a program loop: the memory occupied by the loop's instructions will be re-read on every iteration. Something similar goes for data: usually there are small, frequently used data areas (though when it comes to a sequential read of a GB-long data field, no cache will help - not much, anyway).
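The loop example can be made concrete with a tiny sketch (the addresses below are invented for illustration): a short loop touches the same few instruction addresses over and over, and that reuse is exactly what a cache exploits.

```python
# A 3-instruction loop body at invented addresses 0x100..0x102, run 5 times:
# 15 fetches in total, but only 3 distinct addresses are ever touched,
# so after the first pass every fetch can be served from the cache.
loop_body = [0x100, 0x101, 0x102]
fetch_trace = loop_body * 5

print(len(fetch_trace))       # total fetches: 15
print(len(set(fetch_trace)))  # distinct addresses: 3
```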
If those frequently used areas are moved to a fast memory (access delay of 5-15 CPU cycles instead of the 100+ cycles of main memory), then the program will run faster.
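The speed-up can be estimated with the standard average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. A minimal sketch using cycle counts in the range quoted above and an assumed 90% hit rate (both numbers are for illustration only):

```python
# Average memory access time (AMAT) = hit_time + miss_rate * miss_penalty.
# The cycle counts and hit rate are assumptions, not real CPU specs.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

no_cache = 100                       # every access pays the main-memory delay
with_cache = amat(10, 0.10, 100)     # 10-cycle hits, 10% of accesses miss
print(no_cache, with_cache)          # 100 vs 20.0 cycles on average
```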

The actual algorithm will depend on the CPU type. The process is called 'prefetch'. The 'guess' part is actually not that big; if you want to see a difficult guess based on logic, check out branch prediction instead.
 
  • #3
Thank you!

Rive said:
The cache controllers (in the CPU) work with hardwired logic; there is no real firmware there.

That text from the attachment is a bit misleading. The real concept is based on the statistical observation that a localized area of memory is usually used more than once as a program runs. Take a program loop: the memory occupied by the loop's instructions will be re-read on every iteration. Something similar goes for data: usually there are small, frequently used data areas (though when it comes to a sequential read of a GB-long data field, no cache will help - not much, anyway).

Okay, so there is no firmware involved and it's all hardwired logic. Hardwired logic is also used in a carry-lookahead adder (that's the closest example I can think of).

In the block diagram FIGURE 11-16 there is a double-sided arrow between the cache controller and the microprocessor. I believe it means that the cache controller and the microprocessor also interact directly with each other. It's really difficult for a person like me to comprehend the guessing job of a cache controller. I don't think a cache controller can just grab the current address from the address bus and fetch all the data surrounding or related to that address, expecting that the microprocessor is going to need a certain chunk of data. Or the microprocessor might guide the cache controller itself, since it can communicate with it directly, to fetch the most likely data. The only problem is that a microprocessor by itself is a bare set of transistors without any intelligence of its own. It's the operating system which endows it with intelligence, right? I understand that at the very bottom level an operating system (OS) also gets translated into assembly-language instructions, which are a kind of 'hardware' command to activate certain transistors, etc. So it might be possible that when a certain program is written, including the OS, it is written in such a way as to help the microprocessor coordinate with the cache controller to speed things up.

I tried to read the Wikipedia article on branch predictors without much success. The following text is from that article; I have boldfaced the portion which looks important to me.

"Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.[7]

The early implementations of SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential address.

Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions.

A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not taken.

Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. The Intel Pentium 4 accepts branch prediction hints, but this feature was abandoned in later Intel processors.[8]"
https://en.wikipedia.org/wiki/Branch_predictor
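The "backward taken, forward not taken" heuristic quoted above can be written down in a few lines; the addresses here are made-up values for illustration:

```python
# Static branch prediction: predict "taken" if the branch jumps backward,
# i.e. its target address is lower than the branch's own address.
# Backward branches are usually loop bottoms, which are taken most of the time.
def predict_taken(branch_addr, target_addr):
    return target_addr < branch_addr

print(predict_taken(0x2000, 0x1F00))  # backward branch (a loop) -> True
print(predict_taken(0x2000, 0x2040))  # forward branch -> False
```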

Rive said:
The actual algorithm will depend on the CPU type. The process is called 'prefetch'. The 'guess' part is actually not that big; if you want to see a difficult guess based on logic, check out branch prediction instead.

You also used the word "algorithm", and the word "algorithm" often suggests some kind of software/program, but in this case, as you said, there is no firmware involved.

My apologies for the layman's terminology and the verbosity.

Thanks a lot for your help and time!
 
  • #4
A very simple example of the cache process:

  • Cache Controller is monitoring CPU memory read requests
  • CPU accesses memory for an instruction
  • Cache Controller checks to see if that address content is in the Cache
  • If "No"
    • Read that memory location
    • return that content to CPU
    • Read content of next 16 memory locations into Cache
  • If "Yes"
    • return that content to CPU

Note that the Cache Controller and CPU can operate concurrently without interfering with each other. This means the Cache Controller can read the next memory locations above while the CPU is processing the data it was just given.
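The steps above can be sketched as a toy simulation. A plain dictionary stands in for the cache and a list for main memory; no real controller works like this, but the hit/miss/prefetch logic follows the steps:

```python
# Toy cache controller following the steps above: on a miss, return the
# requested word to the CPU and read the next 16 locations into the cache.
class ToyCache:
    def __init__(self, memory):
        self.memory = memory      # backing "main memory" (a list of words)
        self.cache = {}           # address -> word

    def read(self, addr):
        if addr in self.cache:                  # "Yes": hit, serve from cache
            return self.cache[addr], "hit"
        word = self.memory[addr]                # "No": read that memory location
        self.cache[addr] = word
        for a in range(addr + 1, min(addr + 17, len(self.memory))):
            self.cache[a] = self.memory[a]      # read next 16 locations into cache
        return word, "miss"

mem = list(range(100))
c = ToyCache(mem)
print(c.read(5))    # (5, 'miss') - first access goes to main memory
print(c.read(6))    # (6, 'hit')  - prefetched by the previous miss
```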

Hope this helps.

Cheers,
Tom

p.s. Of course, when the CPU writes to memory, the Cache Controller also has to monitor those writes to possibly update its own copy.
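The p.s. can be sketched too. In a simple write-through scheme, a CPU write updates main memory and, if that address happens to be cached, the cached copy as well; this is only one of several real policies (write-back with invalidation is another) and is shown purely as an illustration:

```python
# Write-through sketch: a CPU write updates main memory, and if the
# address is currently cached, the cached copy is updated as well.
def write(cache, memory, addr, value):
    memory[addr] = value
    if addr in cache:
        cache[addr] = value   # keep the cached copy consistent

mem = [0] * 8
cache = {3: 0}                # address 3 happens to be cached
write(cache, mem, 3, 42)
print(mem[3], cache[3])       # 42 42 - both copies updated
```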
 
  • #5
PainterGuy said:
I tried to read the Wikipedia article on branch predictors without much success.
Good for you. It takes real effort to understand difficult topics. You're doing exactly the right thing.

Very helpful replies from others. This is PF at its best. :bow:
 
  • #6
Thank you for the helpful reply!

I wanted to clarify a few points, so if you don't mind, please help me.

Tom.G said:
A very simple example of the cache process:

  • Cache Controller is monitoring CPU memory read requests
  • CPU accesses memory for an instruction
  • Cache Controller checks to see if that address content is in the Cache

I'll just try to make a few changes to see if I understand it correctly, though I realize it's like splitting hairs.

  • CPU accesses memory for an instruction/Data
Tom.G said:
  • If "No"
    • Read that memory location
    • return that content to CPU
    • Read content of next 16 memory locations into Cache
  • If "Yes"
    • return that content to CPU

Would the CPU wait on the cache controller to fetch the required instruction/data from memory? Wouldn't it fetch it itself? But, yes, the cache controller would read the next 16 memory locations into the cache, which the CPU might find handy on the next read cycles.

Thanks a lot!
 
  • #7
Cache design is a black hole. I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt). And I still had questions.

You can learn how a given computer's cache system works in great detail, but buy a different motherboard/CPU combination, and all bets of the details are off.

For example, one of the ways companies such as Acer have been able to sell laptops at lower cost is to skimp on the cache subsystem. Cache memory specs are seldom available to the purchasing public, although if you dig you can possibly find out the L1 and L2 cache sizes (if they've even got two levels). It wouldn't matter much for a typical desktop user, perhaps, but it would affect results for gamers or heavy computational programs.
 
  • #8
harborsparrow said:
You can learn how a given computer's cache system works in great detail, but buy a different motherboard/CPU combination, and all bets of the details are off.
Well, the last time I had cache on my motherboard was in the P1 era: back then it was common to have an L2 cache on the motherboard. The most fun part was with a late K6-III, which also had an L2 cache inside, making the MoBo cache actually an L3.
That was a long time ago, and it was the last one. Since then, the cache is whatever is inside the CPU.

PainterGuy said:
You also used the word "algorithm", and the word "algorithm" often suggests some kind of software/program, but in this case, as you said, there is no firmware involved.
This usage of 'algorithm' might not conform to mathematical accuracy, but it is still common. Anything that can be described as a bunch of events whose exact sequence depends on external variables is an 'algorithm', regardless of the underlying HW or SW.

PainterGuy said:
I tried to read the Wikipedia article on branch predictors without much success.
Don't take that too seriously. Branch prediction was brought up as an example of what it looks like when the CPU is seriously guessing something... Compared to that, the cache prefetch mechanism is quite dumb, with far less guessing involved.

PainterGuy said:
Would the CPU wait on the cache controller to fetch the required instruction/data from memory? Wouldn't it fetch it itself?
Originally it was a parallel process: the cache got the data off the bus at the same time as the CPU (in case of a cache miss). Later on the cache became part of the CPU, and by now the execution part of the CPU can't even 'see' the memory directly anymore: everything is done by the relevant subsection of the CPU, which manages both the cache and the memory.

PainterGuy said:
I don't think a cache controller can just grab the current address from the address bus and fetch all the data surrounding or related to that address, expecting that the microprocessor is going to need a certain chunk of data.

But it does...

For the PC world it started here, with the 82385, which provided cache control for early 80386 CPUs. Check page 8, the section about 'Read misses': it does nothing but fetch the full line which caused the cache miss and hope that will do. And it did: a 386 with cache was quite a beast compared to one without...

And here is what it became. Read page 82, section 2.5.4.2; it is about some Intel 'Core' family CPUs. Some algorithms, but still not much of what you would call real 'guessing'... So just don't overthink it.
 
  • #9
PainterGuy said:
So it might be possible that when a certain program is written, including the OS, it is written in such a way as to help the microprocessor coordinate with the cache controller to speed things up.
In modern CPUs there are usually instructions which can modify cache behavior and trigger certain functionality, but it is hard to use them efficiently. For most programmers they are just a kind of 'eye candy'. Compilers use them regularly as far as I know, but still, most cache management is done by HW.

There was once one CPU family in the PC world which might have had some possibility of real SW-based cache management. The Transmeta Crusoe had a SW-based run-time translation layer built in to execute x86 code on a VLIW core. (Most modern CPUs do something similar in HW anyway.) Since for this CPU family there was an always-running software layer present to provide translation and optimization (!), maybe it had some cache-management functionality too.
The CPU family was discontinued after two members due to the lack of real success.
Due to the lack of documentation, I know no details.
 
  • #10
harborsparrow said:
Cache design is a black hole. I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt).
:DD That's a great comment.

It's not impossible to do cache controller logic using microcode. However, it would probably make the function too slow, and thus it isn't used.

But the point is that the boundary between hardware and software can be fuzzy; almost arbitrary. Don't assume that you can't implement an algorithm with hard wired logic.
 
  • #11
Thank you so much! You guys are so helpful.

harborsparrow said:
I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt). And I still had questions.

Well, I'd say one needs a PhD to get started working on cache controllers. Companies like AMD and Intel need professionals who can modify and improve the cache system, so one first needs full command and understanding of present cache technology. I think a PhD and/or years of experience is a must.

Rive said:
This usage of 'algorithm' might not conform to mathematical accuracy, but it is still common. Anything that can be described as a bunch of events whose exact sequence depends on external variables is an 'algorithm', regardless of the underlying HW or SW.

I understand it now. Thanks a lot for all the detailed replies and references.

anorlunda said:
It's not impossible to do cache controller logic using microcode. However, it would probably make the function too slow, and thus it isn't used.

Agreed.

Once again, thanks a lot for the time and patience!
 

1. What is a cache controller?

A cache controller is a hardware component of a computer system that manages the data flow between the CPU and the cache memory. It is responsible for improving the overall performance of the system by storing frequently used data and instructions in the cache memory for faster access.

2. How does a cache controller work?

A cache controller works by monitoring the data requests made by the CPU and determining if the requested data is already present in the cache memory. If it is, the cache controller retrieves the data from the cache memory, which is much faster than retrieving it from the main memory. If the data is not present in the cache memory, the cache controller retrieves it from the main memory and stores it in the cache for future use.

3. What are the benefits of using a cache controller?

The main benefit of using a cache controller is improved system performance. By storing frequently used data and instructions in the cache memory, the CPU can access them much faster, reducing the time it takes to process instructions and increasing overall system speed. Additionally, a cache controller can also help reduce power consumption and heat generation, as it reduces the need to access data from the main memory.

4. Are there different types of cache controllers?

Yes, there are different types of cache controllers, including on-chip cache controllers, off-chip cache controllers, and controller-less cache. On-chip cache controllers are integrated into the CPU, while off-chip cache controllers are separate components that are connected to the CPU. Controller-less cache is managed by the CPU itself without a separate controller.

5. Can a cache controller improve the performance of all computer systems?

No, a cache controller may not necessarily improve the performance of all computer systems. It is most effective in systems that have a high demand for data processing and have a significant amount of frequently used data and instructions. Additionally, the effectiveness of a cache controller also depends on its design and the type of memory used for the cache.
