Instruction scheduling in multiple-issue RISCV-processor

bremenfallturm · Dec 27, 2024

Hello!

(this problem considers a multiple-issue RISCV-processor)

My attempt at a solution is identical to the solution key (screenshoted below), but I scheduled the sw instruction in slot 1 and the bne instruction in slot 2:

Since it is not given in the question that slot 1 and slot 2 are dedicated for different kinds of instructions (for example ALU operations), I would assume that it is completely arbitrary that the branch instruction is in slot 1 and the sw instruction is in slot 2? Am I correct or have I missed something?

Bonus question: If we look at clock cycle 5, then the bne and the sw instuctions will execute in parallell. I would assume that when the bne evaluates to true, it will update the PC to jump to L1:. From what I am used to, this will happen in the Execute stage or in the Decode stage, i.e. before the result from the sw would be written to memory. A properly functional multiple-issue processor would handle this, but I am a little confused about how this process works. Does the whole pipeline need to stall, and wait until the sw result is ready before the branch is performed? This does not match what I've seen in other questons where they also ask me to calculate IPC, considering control hazards. But if the branch is performed in slot 1 before slot 2 is finished, how does the processor handle the fact that flushing the pipeline in the decode stage, would, from what I assume, also flush the sw instruction, ending up in it not saving the data?

pbuk · Dec 27, 2024

bremenfallturm said:

Since it is not given in the question that slot 1 and slot 2 are dedicated for different kinds of instructions (for example ALU operations), I would assume that it is completely arbitrary that the branch instruction is in slot 1 and the sw instruction is in slot 2? Am I correct or have I missed something?

This is implementation dependent: do your course materials refer to a specific implementation, or define this specifically?

bremenfallturm said:

Bonus question: If we look at clock cycle 5, then the bne and the sw instuctions will execute in parallell. I would assume that when the bne evaluates to true, it will update the PC to jump to L1:. From what I am used to, this will happen in the Execute stage or in the Decode stage, i.e. before the result from the sw would be written to memory.

This is implementation dependent, however one would normally expect that any instruction updating the Program Counter would not take effect until the fetch at the start of the next full cycle i.e. following completion of the full current fetch-decode-execute-memory-writeback cycle so in this question the PC is updated in the execute stage of Cycle 5 and the result from the sw is written to memory in the writeback stage of Cycle 5. At the start of Cycle 6 ##\equiv ## Cycle 1 the next instruction is fetched from [PC] = L1.

bremenfallturm · Dec 30, 2024

pbuk said:

This is implementation dependent: do your course materials refer to a specific implementation, or define this specifically?

No, it is not mentioned anywhere. In other assignments in the course where no restriction on what each slot can do has been done, branch instruction has sometimes ended up in slot 2. But the exercise question doesn't define it and we've not gone deeper than talked about "multiple-issue processors" as a general term. So no specific implementation.

pbuk said:

in this question the PC is updated in the execute stage of Cycle 5 and the result from the sw is written to memory in the writeback stage of Cycle 5

I see. I think something quite crucial which is missing from my conceptual understanding is how long a full F-D-E-M-W cycle takes. So assuming no stalls, going from F->W takes 1 cycle, and hence, the next instruction will not be fetched until all stages in cycle 5 has been executed?

pbuk · Jan 1, 2025

bremenfallturm said:

No, it is not mentioned anywhere. In other assignments in the course where no restriction on what each slot can do has been done, branch instruction has sometimes ended up in slot 2.

Then I think you can assume that your answer with the BNE in slot 2 is also correct.

bremenfallturm said:

I see. I think something quite crucial which is missing from my conceptual understanding is how long a full F-D-E-M-W cycle takes.

I think you are looking for tricks where there are none: a full cycle takes exactly one cycle.

bremenfallturm said:

So assuming no stalls, going from F->W takes 1 cycle, and hence, the next instruction will not be fetched until all stages in cycle 5 has been executed?

Yes.

bremenfallturm · Jan 2, 2025

Thanks for your help! Just one last question to make sure I understand, since I read more about pipelining and CPI!

pbuk said:

Yes.

From my understanding what I have read online (after posting here), in a F-D-E-M-W cycle, I've understood it as it takes 5 cycles from one particular instruction to go from the F to the W stage. For example, executing "addi t0, s0, 3" would in cycle 1 be in F, cycle 2 be in D, and so on (assuming no stalls).

Wikipedia writes this about CPI:

> With pipelining, a new instruction is fetched every clock cycle by exploiting instruction-level parallelism, therefore, since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1)

(source: https://en.wikipedia.org/wiki/Cycles_per_instruction)

If I understand your answer correctly, you say that it instead takes 1 cycle for an instruction to go through F-D-E-M-W. Did I misunderstand? (I also understand that this is implementation-dependent!)

bremenfallturm · Jan 8, 2025

Follow-up question: why can not sw t1, 0(s) and ori t1, t0, 7 be in the same slot?

I would assume that the result of ori could be forwarded to the beginning of the M stage of the sw instruction? Trying to illustate this with a little arrow:
...........| ...........
F D E V M W #sw
F D E . M W # ori

Or can forwarding not happen within the same cycle on multiple-issue processors?

pbuk · Jan 16, 2025

bremenfallturm said:

If I understand your answer correctly, you say that it instead takes 1 cycle for an instruction to go through F-D-E-M-W. Did I misunderstand? (I also understand that this is implementation-dependent!)

Sorry, my mistake. It is confusing using the word 'cycle' to mean both the five stages of the pipeline as in

bremenfallturm said:

F-D-E-M-W cycle

and a clock cycle. From the beginning of F to the end of W the 5 stages of the pipeline do indeed take 5 (clock) cycles.

pbuk · Jan 16, 2025

bremenfallturm said:

Follow-up question: why can not sw t1, 0(s) and ori t1, t0, 7 be in the same slot?

I would assume that the result of ori could be forwarded to the beginning of the M stage of the sw instruction? Trying to illustate this with a little arrow:
...........| ...........
F D E V M W #sw
F D E . M W # ori

Or can forwarding not happen within the same cycle on multiple-issue processors?

No, at the end of the E(xecute) stage the ori result is hidden inside the ALU of Core 2: it is not written to r1 until the W(riteback) stage.

Instruction scheduling in multiple-issue RISCV-processor

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Who May Find This Useful

Similar threads

Bending Stress and Shear Stress

Four L-shaped members: Mechanical Analysis Problem

Engineering Joint and Marginal Distributions of a Randomly Selected Test Answer

Engineering Full bridge circuit with inductor and resistor

Engineering AM-AM and AM-PM graph generation in LTSpice

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight