Mark44
Mentor
- 38,065
- 10,580
willem2 said:The cpu can have dozens of instructions waiting for other instructions or memory in the pipeline at the same time. The cpu will have no problems starting a multiply, a subtraction, a load (2 loads for the newest types) and a store in one clock cycle, even if these belong to different iterations of the loop.
We have some control over this, which is what optimizations using loop unrolling or loop unwinding are about.EngWiPy said:OK, I see. But we have no control over this. I mean this is a hardware design architecture how the CPU executes different instructions, because they are done at different units. But what if you are executing the same function on independent data, but using the same instructions?
The processor executes instructions in one or more pipelines, and tries to guess what the next instruction will be. If it guesses correctly, everything is fine, since that instruction is in the pipeline. If it guesses wrong, it has to flush the pipeline, which takes several clock cycles to refill.
Loops such as for or while loops can be problematic, as are branches such as if and if ... else.
Here's a simple example from this wiki article: https://en.wikipedia.org/wiki/Loop_unrolling
C:
int x;
for (x = 0; x < 100; x++)
{
delete(x);
}
The same loop, after unrolling:
C:
int x;
for (x = 0; x < 100; x += 5 )
{
delete(x);
delete(x + 1);
delete(x + 2);
delete(x + 3);
delete(x + 4);
}