Instruction scheduling in multiple-issue RISCV-processor

  • Context: Comp Sci 
  • Thread starter Thread starter bremenfallturm
  • Start date Start date
  • Tags Tags
    Instruction Scheduling
Click For Summary
SUMMARY

This discussion focuses on instruction scheduling in a multiple-issue RISC-V processor, specifically regarding the placement of the 'sw' and 'bne' instructions in execution slots. The participants clarify that the assignment does not restrict instruction types to specific slots, allowing for flexibility in scheduling. It is established that the Program Counter (PC) updates during the Execute stage of Cycle 5, while the 'sw' instruction writes to memory in the Writeback stage of the same cycle. The conversation emphasizes the importance of understanding the full Fetch-Decode-Execute-Memory-Writeback (F-D-E-M-W) cycle, which takes five clock cycles to complete.

PREREQUISITES
  • Understanding of RISC-V architecture and instruction types
  • Familiarity with multiple-issue processor concepts
  • Knowledge of pipeline stages: Fetch, Decode, Execute, Memory, Writeback
  • Comprehension of control hazards and instruction-level parallelism
NEXT STEPS
  • Study RISC-V instruction scheduling techniques
  • Learn about control hazards and their impact on pipeline performance
  • Explore the concept of forwarding in multiple-issue processors
  • Investigate the implications of instruction-level parallelism on CPI (Cycles Per Instruction)
USEFUL FOR

Computer architecture students, RISC-V developers, and engineers working on optimizing multiple-issue processor designs will benefit from this discussion.

bremenfallturm
Messages
81
Reaction score
13
Homework Statement
Exercise: Assume we have a VLIW processor
that can issue two instructions in the same clock
cycle. For the following code, schedule the
instructions into issue slots. Assume that no
hazards occurs.
Code:
addi s1, zero, 10
L1: lw t0, 0(s2)
ori t1, t0, 7
sw t1, 0(s2)
addi s2, s2, 4
addi s1, s1, -1
bne s1, zero, L1
Relevant Equations
N/A
Hello!

(this problem considers a multiple-issue RISCV-processor)

My attempt at a solution is identical to the solution key (screenshoted below), but I scheduled the sw instruction in slot 1 and the bne instruction in slot 2:
1735316769239.png

Since it is not given in the question that slot 1 and slot 2 are dedicated for different kinds of instructions (for example ALU operations), I would assume that it is completely arbitrary that the branch instruction is in slot 1 and the sw instruction is in slot 2? Am I correct or have I missed something?

Bonus question: If we look at clock cycle 5, then the bne and the sw instuctions will execute in parallell. I would assume that when the bne evaluates to true, it will update the PC to jump to L1:. From what I am used to, this will happen in the Execute stage or in the Decode stage, i.e. before the result from the sw would be written to memory. A properly functional multiple-issue processor would handle this, but I am a little confused about how this process works. Does the whole pipeline need to stall, and wait until the sw result is ready before the branch is performed? This does not match what I've seen in other questons where they also ask me to calculate IPC, considering control hazards. But if the branch is performed in slot 1 before slot 2 is finished, how does the processor handle the fact that flushing the pipeline in the decode stage, would, from what I assume, also flush the sw instruction, ending up in it not saving the data?
 
Physics news on Phys.org
bremenfallturm said:
Since it is not given in the question that slot 1 and slot 2 are dedicated for different kinds of instructions (for example ALU operations), I would assume that it is completely arbitrary that the branch instruction is in slot 1 and the sw instruction is in slot 2? Am I correct or have I missed something?
This is implementation dependent: do your course materials refer to a specific implementation, or define this specifically?

bremenfallturm said:
Bonus question: If we look at clock cycle 5, then the bne and the sw instuctions will execute in parallell. I would assume that when the bne evaluates to true, it will update the PC to jump to L1:. From what I am used to, this will happen in the Execute stage or in the Decode stage, i.e. before the result from the sw would be written to memory.
This is implementation dependent, however one would normally expect that any instruction updating the Program Counter would not take effect until the fetch at the start of the next full cycle i.e. following completion of the full current fetch-decode-execute-memory-writeback cycle so in this question the PC is updated in the execute stage of Cycle 5 and the result from the sw is written to memory in the writeback stage of Cycle 5. At the start of Cycle 6 ##\equiv ## Cycle 1 the next instruction is fetched from [PC] = L1.
 
Last edited:
  • Like
Likes   Reactions: FactChecker and berkeman
pbuk said:
This is implementation dependent: do your course materials refer to a specific implementation, or define this specifically?
No, it is not mentioned anywhere. In other assignments in the course where no restriction on what each slot can do has been done, branch instruction has sometimes ended up in slot 2. But the exercise question doesn't define it and we've not gone deeper than talked about "multiple-issue processors" as a general term. So no specific implementation.
pbuk said:
in this question the PC is updated in the execute stage of Cycle 5 and the result from the sw is written to memory in the writeback stage of Cycle 5
I see. I think something quite crucial which is missing from my conceptual understanding is how long a full F-D-E-M-W cycle takes. So assuming no stalls, going from F->W takes 1 cycle, and hence, the next instruction will not be fetched until all stages in cycle 5 has been executed?
 
bremenfallturm said:
No, it is not mentioned anywhere. In other assignments in the course where no restriction on what each slot can do has been done, branch instruction has sometimes ended up in slot 2.
Then I think you can assume that your answer with the BNE in slot 2 is also correct.

bremenfallturm said:
I see. I think something quite crucial which is missing from my conceptual understanding is how long a full F-D-E-M-W cycle takes.
I think you are looking for tricks where there are none: a full cycle takes exactly one cycle.

bremenfallturm said:
So assuming no stalls, going from F->W takes 1 cycle, and hence, the next instruction will not be fetched until all stages in cycle 5 has been executed?
Yes.
 
Thanks for your help! Just one last question to make sure I understand, since I read more about pipelining and CPI!
pbuk said:
Yes.
From my understanding what I have read online (after posting here), in a F-D-E-M-W cycle, I've understood it as it takes 5 cycles from one particular instruction to go from the F to the W stage. For example, executing "addi t0, s0, 3" would in cycle 1 be in F, cycle 2 be in D, and so on (assuming no stalls).

Wikipedia writes this about CPI:

> With pipelining, a new instruction is fetched every clock cycle by exploiting instruction-level parallelism, therefore, since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1)

(source: https://en.wikipedia.org/wiki/Cycles_per_instruction)


If I understand your answer correctly, you say that it instead takes 1 cycle for an instruction to go through F-D-E-M-W. Did I misunderstand? (I also understand that this is implementation-dependent!)
 
Follow-up question: why can not sw t1, 0(s) and ori t1, t0, 7 be in the same slot?

I would assume that the result of ori could be forwarded to the beginning of the M stage of the sw instruction? Trying to illustate this with a little arrow:
...........| ...........
F D E V M W #sw
F D E . M W # ori


Or can forwarding not happen within the same cycle on multiple-issue processors?
 
Last edited:
bremenfallturm said:
If I understand your answer correctly, you say that it instead takes 1 cycle for an instruction to go through F-D-E-M-W. Did I misunderstand? (I also understand that this is implementation-dependent!)
Sorry, my mistake. It is confusing using the word 'cycle' to mean both the five stages of the pipeline as in
bremenfallturm said:
F-D-E-M-W cycle
and a clock cycle. From the beginning of F to the end of W the 5 stages of the pipeline do indeed take 5 (clock) cycles.
 
bremenfallturm said:
Follow-up question: why can not sw t1, 0(s) and ori t1, t0, 7 be in the same slot?

I would assume that the result of ori could be forwarded to the beginning of the M stage of the sw instruction? Trying to illustate this with a little arrow:
...........| ...........
F D E V M W #sw
F D E . M W # ori


Or can forwarding not happen within the same cycle on multiple-issue processors?
No, at the end of the E(xecute) stage the ori result is hidden inside the ALU of Core 2: it is not written to r1 until the W(riteback) stage.
 

Similar threads

  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 2 ·
Replies
2
Views
5K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
Replies
8
Views
2K
Replies
6
Views
4K
  • · Replies 14 ·
Replies
14
Views
5K
  • · Replies 1 ·
Replies
1
Views
13K
  • · Replies 12 ·
Replies
12
Views
4K