Cache controller

  • Thread starter PainterGuy
Hi,

I have always thought that the BIOS program of any computing device, which is firmware, serves as an initial set of instructions that launches the main program, such as Windows.

Please have a look at this attachment. The following text is from the attachment.

"The concept of cache memory is based on the idea that computer programs tend to get instructions or data from one area of main memory before moving to another area. Basically, the cache controller “guesses” which area of the slow dynamic memory the CPU (central-processing unit) will need next and moves it to the cache memory so that it is ready when needed. If the cache controller guesses right, the data are immediately available to the microprocessor. If the cache controller guesses wrong, the CPU must go to the main memory and wait much longer for the correct instructions or data. Fortunately, the cache controller is right most of the time."

When I read the text above the first time, I got the impression that a cache controller also has separate 'guiding' firmware, i.e. an algorithm, which tells it how to 'guess'. Now I think I was wrong. A cache controller has an added ability to access several required memory locations at a faster rate. It's as if it had 10 hands instead of two, and each hand could reach a different memory location very quickly.

To get a better understanding, I tried to see how a GPU works. I have boldfaced the sentence that, I think, is the gist.

"A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard. In certain CPUs, they are embedded on the CPU die.[1]" [5]

Here, the architecture of a GPU helps algorithms process data more efficiently than a general-purpose CPU can. I believe an artificial intelligence accelerator similarly helps artificial intelligence algorithms process data more efficiently.

I'm still a little confused about how a cache controller works, e.g. how it guesses. Could you please help me with this? Also, in the block diagram FIGURE 11-16 there is a double-sided arrow between the cache controller and the microprocessor. What does it mean? That the microprocessor and the cache controller interact with each other? But in what way, and how?

Thank you.


Helpful links:
1: https://en.wikipedia.org/wiki/AI_accelerator
2: https://www.quora.com/Does-a-smartphone-have-a-BIOS-chip
3: https://www.sciencedirect.com/topics/engineering/cache-controller
4: https://www.datacenterknowledge.com/machine-learning/intel-steps-its-challenge-nvidia-s-ai-chip-dominance-facebook-s-help
5: https://en.wikipedia.org/wiki/Graphics_processing_unit
 

1,211
589
The cache controllers (in the CPU) work with hard-wired logic; there is no real firmware there.

That text from the attachment is a bit misleading. The real concept is based on the statistical observation that a localized area of memory is usually used more than once as a program runs. Take a program loop: the memory occupied by the loop's instructions is re-read every time the loop goes around. Something similar goes for data: there are usually small, frequently used data areas (but for a sequential read of a GB-long data field, no cache will help, or not much, anyway).
If those frequently used areas are moved to a fast memory (an access delay of 5-15 CPU cycles instead of 100+ cycles for main memory), then the program will run faster.
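As a rough illustration of loops versus one long sequential read, here is a small Python sketch of an LRU cache. The 10-cycle hit and 100-cycle miss figures are assumptions picked from the ranges quoted above, memory is counted in whole blocks, and a miss is simply charged the main-memory latency; real CPUs are far more complicated.

```python
from collections import OrderedDict

HIT_CYCLES = 10    # assumed cache latency (within the 5-15 cycle range above)
MISS_CYCLES = 100  # assumed main-memory latency ("100+ cycles")

def simulate(trace, capacity=64):
    """Return (hits, misses, total_cycles) for an LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = misses = cycles = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
            hits += 1
            cycles += HIT_CYCLES
        else:
            misses += 1
            cycles += MISS_CYCLES          # simplification: miss costs main-memory time
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits, misses, cycles

# A loop whose body occupies 8 memory blocks, executed 100 times.
loop_trace = [b for _ in range(100) for b in range(8)]
# A single sequential pass over 800 distinct blocks: nothing is ever reused.
scan_trace = list(range(800))

for name, trace in [("loop", loop_trace), ("scan", scan_trace)]:
    hits, misses, cycles = simulate(trace)
    print(f"{name}: hits={hits} misses={misses} avg cycles/access={cycles / len(trace):.1f}")
```

Under these assumptions the loop averages about 10.9 cycles per access, while the one-pass scan never reuses a block and pays the full 100 cycles every time, just as if there were no cache.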

The actual algorithm depends on the CPU type. The process is called 'prefetch'. There isn't really much 'guessing' involved; if you want to see difficult guessing based on logic, you should check out branch prediction instead.
 
PainterGuy
Thank you!

The cache controllers (in the CPU) work with hard-wired logic; there is no real firmware there.

That text from the attachment is a bit misleading. The real concept is based on the statistical observation that a localized area of memory is usually used more than once as a program runs. Take a program loop: the memory occupied by the loop's instructions is re-read every time the loop goes around. Something similar goes for data: there are usually small, frequently used data areas (but for a sequential read of a GB-long data field, no cache will help, or not much, anyway).
Okay. So there is no firmware involved, and it's all hard-wired logic. Wired logic is also used in a carry-lookahead adder (that's the closest example I can think of).
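To make "wired logic that carries out an algorithm" concrete, here is a sketch of a 4-bit carry-lookahead adder written as the boolean equations a gate network would implement. Python is used only to check the equations; the function name `cla_add4` is made up for this example, and in real hardware every carry below is produced in parallel by gates, not step by step.

```python
def cla_add4(a, b, c0=0):
    """Add two 4-bit numbers using carry-lookahead equations; return (sum, carry_out)."""
    ai = [(a >> i) & 1 for i in range(4)]
    bi = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(ai, bi)]  # generate: this bit position creates a carry
    p = [x ^ y for x, y in zip(ai, bi)]  # propagate: this bit position passes a carry on
    # Each carry is a flat sum-of-products of g, p and c0. Nothing ripples from
    # one stage to the next, which is why the gate version is fast.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # sum bit = a XOR b XOR carry-in
    return s, c4

print(cla_add4(0b1011, 0b0110))  # 11 + 6 = 17 -> (1, 1): sum bits 0001, carry out 1
```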

In the block diagram FIGURE 11-16 there is a double-sided arrow between the cache controller and the microprocessor. I believe it means that the cache controller and the microprocessor also interact directly with each other. It's really difficult for a person like me to comprehend the guessing job of a cache controller. I don't think that a cache controller can just grab the current address from the address bus and fetch all the data surrounding or related to that address, expecting that the microprocessor is going to need a certain chunk of data. Or the microprocessor might guide the cache controller itself, since it can communicate directly with it, to fetch the most likely needed data. But the problem is that a microprocessor by itself is just a bare set of transistors without any intelligence of its own. It's the operating system that endows it with intelligence, right? I understand that at the very bottom level, an operating system (OS) also gets translated into assembly language instructions, which are a kind of 'hardware' command that activates certain transistors etc. So it might be possible that when a program, including the OS, is written, it is written in a way that helps the microprocessor coordinate with the cache controller to speed things up.

I tried to read the Wikipedia article on branch predictors without much success. The following text is from that article; I have boldfaced the portion that looks important to me.

"Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.[7]

The early implementations of SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential address.

Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions.

A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not taken.

Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. The Intel Pentium 4 accepts branch prediction hints, but this feature was abandoned in later Intel processors.
[8]"
https://en.wikipedia.org/wiki/Branch_predictor
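The two static schemes in the quoted text can be compared with a tiny Python sketch. The branch and target addresses are hypothetical, and the loop is assumed to run 10 times: "always not taken" (the early SPARC/MIPS approach) gets only the final exit right, while "backward taken, forward not taken" gets every iteration right except the exit.

```python
def always_not_taken(branch_addr, target_addr):
    """Single-direction static prediction: always fetch the next sequential instruction."""
    return False

def backward_taken(branch_addr, target_addr):
    """BTFN: predict taken only if the target address is lower than the branch's own."""
    return target_addr < branch_addr

def accuracy(predictor, branch_addr, target_addr, outcomes):
    """Fraction of actual outcomes (True = taken) that the predictor got right."""
    correct = sum(predictor(branch_addr, target_addr) == taken for taken in outcomes)
    return correct / len(outcomes)

# Loop-closing branch at hypothetical address 0x40 jumping back to 0x10:
# taken 9 times, then not taken once when the loop exits.
outcomes = [True] * 9 + [False]
print(accuracy(always_not_taken, 0x40, 0x10, outcomes))  # 0.1
print(accuracy(backward_taken, 0x40, 0x10, outcomes))    # 0.9
```

Neither predictor looks at history; they decide from the branch instruction alone, which is what makes them "static".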

The actual algorithm depends on the CPU type. The process is called 'prefetch'. There isn't really much 'guessing' involved; if you want to see difficult guessing based on logic, you should check out branch prediction instead.
You also used the word "algorithm", and the word algorithm often suggests some kind of software or program, but in this case, as you said, there is no firmware involved.

My apologies for the layman's terminology and verbosity.

Thanks a lot for your help and time!
 

Tom.G

A very simple example of the cache process:

  • Cache Controller is monitoring CPU memory read requests
  • CPU accesses memory for an instruction
  • Cache Controller checks to see if that address content is in the Cache
  • If "No"
    • Read that memory location
    • return that content to CPU
    • Read content of next 16 memory locations into Cache
  • If "Yes"
    • return that content to CPU

Note that the Cache Controller and CPU can operate concurrently without interfering with each other. This means the Cache Controller can read the next memory locations mentioned above while the CPU is processing the data it was just given.

Hope this helps.

Cheers,
Tom

p.s. Of course, when the CPU writes to memory, the Cache Controller also has to monitor those writes so it can update its own copy if needed.
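One simple way to handle those writes is a write-through policy: every write goes straight to main memory, and the cache updates its copy if it holds that address. The Python sketch below assumes that policy (many real caches use write-back instead, where a write stays in the cache until the line is evicted); the class name is hypothetical.

```python
class WriteThroughCache:
    """Hypothetical write-through cache without write-allocate."""

    def __init__(self, memory):
        self.memory = memory  # main memory modeled as a list of words
        self.cache = {}       # address -> cached copy of that word

    def read(self, addr):
        if addr not in self.cache:           # miss: fill from main memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.memory[addr] = value            # main memory always receives the write
        if addr in self.cache:               # otherwise the cached copy would be stale
            self.cache[addr] = value

mem = [0] * 8
c = WriteThroughCache(mem)
c.read(3)        # address 3 is now cached
c.write(3, 42)   # updates memory and the cached copy
c.write(5, 7)    # address 5 is not cached, so only memory changes
```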
 

anorlunda

I tried to read the Wikipedia article on branch predictors without much success.
Good for you. It takes real effort to understand difficult topics. You're doing exactly the right thing.

Very helpful replies from others. This is PF at its best.
 
PainterGuy
Thank you for the helpful reply!

I wanted to clarify a few points, so if you don't mind, please help me.

A very simple example of the cache process:

  • Cache Controller is monitoring CPU memory read requests
  • CPU accesses memory for an instruction
  • Cache Controller checks to see if that address content is in the Cache
I'll just try to make a few changes to see if I understand it correctly, though I realize it's like splitting hairs.

  • CPU accesses memory for an instruction/Data
  • If "No"
    • Read that memory location
    • return that content to CPU
    • Read content of next 16 memory locations into Cache
  • If "Yes"
    • return that content to CPU
Would the CPU wait for the cache controller to fetch the required instruction/data from memory for it? Wouldn't it fetch it itself? But, yes, the cache controller would read the next 16 memory locations into the cache, which the CPU might find handy in the next read cycles.

Thanks a lot!
 

harborsparrow

Cache design is a black hole. I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt). And I still had questions.

You can learn how a given computer's cache system works in great detail, but buy a different motherboard/CPU combination, and all bets on the details are off.

For example, one of the ways companies such as Acer have been able to sell laptops at lower cost is to skimp on the cache subsystem. Cache memory specs are seldom available to the purchasing public, although if you dig, you can possibly find out the L1 cache size and L2 cache size (if they've even got 2 layers). It wouldn't matter for a typical desktop user, perhaps, but would affect results for gamers or heavy computational programs.
 
1,211
589
You can learn how a given computer's cache system works in great detail, but buy a different motherboard/CPU combination, and all bets on the details are off.
Well, the last time I had cache on my motherboard was in the P1 era: at that time it was common to have the L2 cache on the motherboard. The most fun part was with a late K6-III, which also had an L2 cache inside, making the motherboard cache effectively an L3.
That was a long time ago, and it was the last one. From then on, cache has been what's inside the CPU.

You also used the word "algorithm", and the word algorithm often suggests some kind of software or program, but in this case, as you said, there is no firmware involved.
This usage of 'algorithm' might not conform to mathematical rigor, but it is still common. Anything that can be described as a bunch of events whose exact sequence depends on external variables is an 'algorithm', regardless of the underlying HW or SW.

I tried to read the Wikipedia article on branch predictors without much success.
Don't take that too seriously. Branch prediction was brought up as an example of what it looks like when the CPU seriously guesses something... Compared to that, the cache prefetch mechanism is quite dumb, with far less guessing.

Would the CPU wait for the cache controller to fetch the required instruction/data from memory for it? Wouldn't it fetch it itself?
Originally it was a parallel process: on a cache miss, the cache took the data off the bus at the same time as the CPU. Later on the cache became part of the CPU, and by now the execution part of the CPU can't even 'see' the memory directly anymore: everything is done by the relevant subsection of the CPU, which manages both the cache and the memory.

I don't think that a cache controller can just grab the current address from the address bus and fetch all the data surrounding or related to that address, expecting that the microprocessor is going to need a certain chunk of data.
But it does...

In the PC world it started here, with the 82385, which provided cache control for early 80386 CPUs. Check page 8, the section about 'Read misses': it does nothing more than fetch the full line that caused the cache miss and hope that will do. And it did: a 386 with cache was quite a beast compared to one without...
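A read miss that fetches "the full line" works on aligned blocks: the address is split into a tag, a line index and an offset within the line, and the whole line containing the missed byte is loaded. The Python sketch below is a generic direct-mapped cache with made-up parameters (32-byte lines, 1024 lines, 32 KB total); it is not the 82385's actual organization, for which see its data sheet.

```python
LINE_SIZE = 32    # bytes per line (hypothetical)
NUM_LINES = 1024  # direct-mapped: each address maps to exactly one line slot

def split_address(addr):
    """Split a byte address into (tag, line index, offset within the line)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return tag, index, offset

class DirectMappedCache:
    def __init__(self, memory: bytes):
        self.memory = memory
        self.tags = [None] * NUM_LINES    # which memory block each slot holds
        self.lines = [b""] * NUM_LINES    # the cached bytes of that block
        self.misses = 0

    def read_byte(self, addr):
        tag, index, offset = split_address(addr)
        if self.tags[index] != tag:       # read miss: fetch the whole line
            self.misses += 1
            start = addr - offset         # align down to the line boundary
            self.lines[index] = self.memory[start:start + LINE_SIZE]
            self.tags[index] = tag
        return self.lines[index][offset]

mem = bytes(range(256)) * 512             # 128 KB of made-up data
cache = DirectMappedCache(mem)
data = [cache.read_byte(a) for a in range(64)]
print(cache.misses)                       # 2: one miss per 32-byte line
```

Two addresses 32 KB apart (0 and 32768 here) compete for the same slot and evict each other, one reason real designs also use set-associative caches.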

And here is what it became. Read page 82, section 2.5.4.2; it is about some Intel 'Core' family CPUs. There are some algorithms, but still not much of what you would call real 'guessing'... So just don't overthink it.
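One widely used kind of hardware prefetcher is a stride prefetcher: it remembers the last address each load instruction touched and, once it sees the same non-zero stride twice in a row, fetches the next address in the sequence ahead of time. The Python sketch below is a generic, hypothetical version of that idea, not a transcription of the Intel section mentioned above; real designs typically add confidence counters, limited table sizes and page-boundary checks.

```python
class StridePrefetcher:
    """Hypothetical per-instruction stride detector."""

    def __init__(self):
        # load instruction address (pc) -> (last address it read, last stride seen)
        self.table = {}

    def observe(self, pc, addr):
        """Record a load by instruction `pc`; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                prefetch = addr + stride  # same stride twice in a row: fetch ahead
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, None)  # first time we see this instruction
        return prefetch

pf = StridePrefetcher()
# One load instruction (hypothetical pc 0x400) walking an array in 64-byte steps.
print([pf.observe(0x400, a) for a in (1000, 1064, 1128, 1192)])
```

The first two accesses only train the table; from the third one on, the next element is requested before the program asks for it. An irregular access pattern never triggers a prefetch, which is about as much "guessing" as this mechanism does.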
 
1,211
589
So it might be possible that when a program, including the OS, is written, it is written in a way that helps the microprocessor coordinate with the cache controller to speed things up.
Modern CPUs usually have instructions that can modify cache behavior and trigger certain functionality, but they are hard to use efficiently. For most programmers they are just a kind of 'eye candy'. As far as I know, compilers use them regularly, but most cache management is still done by HW.

There was once a CPU family in the PC world that might have had some real SW-based cache management. The Transmeta Crusoe had a built-in SW-based run-time translation layer to execute x86 code on a VLIW core. (Most modern CPUs do something similar in HW, anyway.) Since this CPU family had an always-running software layer for translation and optimization (!), maybe it had some cache management functionality too.
The CPU family was discontinued after two members due to a lack of real success.
Due to the lack of documentation, I don't know the details.
 

anorlunda

Cache design is a black hole. I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt).
:DD That's a great comment.


-----
It's not impossible to implement cache controller logic using microcode. However, it would probably make the function too slow, and thus it isn't used.

But the point is that the boundary between hardware and software can be fuzzy, almost arbitrary. Don't assume that you can't implement an algorithm with hard-wired logic.
 
PainterGuy
Thank you so much! You guys are so helpful.

I once took a graduate semester at Columbia on cache design alone (and it made my brain hurt). And I still had questions.
Well, I'd say that one needs a PhD to get started working on cache controllers. Companies like AMD and Intel need professionals who can modify and improve the cache system, so one first needs a full command of how present cache technology works; I think a PhD and/or years of experience is a must.

This usage of 'algorithm' might not conform to mathematical rigor, but it is still common. Anything that can be described as a bunch of events whose exact sequence depends on external variables is an 'algorithm', regardless of the underlying HW or SW.
I understand it now. Thanks a lot for all the detailed replies and references.

It's not impossible to implement cache controller logic using microcode. However, it would probably make the function too slow, and thus it isn't used.
Agreed.

Once again, thanks a lot for the time and patience!
 

Physics Forums Values

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving