Core design

Fetch

Instruction Fetch (rrv64_fetch) is the first pipeline stage in RRV64. This block is responsible for initiating requests for instruction data by sending requests to the instruction buffer and loop buffer. If one of the two buffers hits, the instruction data will be available in the next cycle. Otherwise, the instruction buffer will send a request to I-Cache to obtain the instruction data. Such process will take several cycles of delay. The IF module is also responsible for generating the address of the next instruction. It receives PC requests from other pipeline stages and arbitrates using a fixed priority scheme. The modules that act as PC sources are listed below, from the highest priority to the lowest.

rrv64_csr: Sends PC on exceptions, interrupts and trap return instructions.

rrv64_execute: Sends PC when a branch instruction taken.

rrv_mem_access: Sends PC when completing a fence.i instruction and when some of CSR registers have been modified. For fences, the PC request is delayed until all fetches before the fence instruction are completed and I-Cache is flushed. This is in case of any self-modifying code. For CSR modifications, delaying the PC request ensures that the CSR operation will use the correct values.

rrv_fetch: Sends PC for the normal case (next PC=PC+4 or PC+2 for compressed instructions), immediate jumps and register jumps.

Interfaces

if2ic/ic2if: These interfaces are used for sending PC fetch requests from IF to instruction buffer and loop buffer. This interface uses an enable signal to send requests. This enable signal is held high until a response is received. There are 2 signals in if2ic interface:

  1. pc: The address of the requested instruction.

  2. valid: If this request is valid.

On the response side (ic2if), the main signals are:

  1. inst/rvc_inst: The instruction data.

  2. valid: Whether this response is valid.

  3. is_rvc: Whether the instruction is RVC or not.

  4. excp_cause: Contains the exception cause of the instruction, if any.

  5. excp_valid: Whether this instruction was found to have an exception.

if2id: This interface contains all the data that is passed from IF to ID. It works using a valid/ready handshake. There are 2 signals in this interface.

  1. inst: The instruction data.

  2. pc: The PC of the instruction.

cs2if_npc/ma2if_npc/ex2if_npc/id2if_npc: These interfaces are used for sending PC redirection request to IF. They work using a valid/ready handshake. There are 2 signals in these interfaces.

  1. pc: The new value of the PC register.

  2. valid: Whether the request is valid.

Decode

Decode (ID) is the second stage in RRV64’s pipeline. It receives instruction data from the IF stage and hold it if necessary, expands C-extension instructions, decodes instruction data to set the control signals, and sends read requests to the regfile. When encountering an illegal instruction, the decoder will generate an exception signal, which will be handled when the current instruction reaches the MA stage.

The RRV64 implements the standard compressed extension to the RISC-V architecture, which allows for 16-bit, in addition to the normal 32-bit instruction size. To handle this new size of instructions, ID contains a submodule that takes the 16-bit instructions and expand it to its 32-bit equivalent. This module acts as the first layer of decoding.

After ID has the final instruction data, either the expanded compressed instruction, or the initial instruction data, it will begin to decode the instruction to determine how to set the control signals that will be used throughout the pipeline. In the RTL, you can find a case statement that will call different functions depending on the instruction’s opcode, funct7 field, funct5 field, etc. These functions will output the appropriate control signals. If the instruction needs to read the register, ID will asynchronously read the registers in rrv64_regfile (IRF). Since IRF doesn’t contain a real entry for x0, ID will instead substitute this read with a hardwired 0 signal.

If ID decodes its current instruction as a JAL instruction, it will calculate the destination address and send a redirect request to the IF stage. If it is a fence_i, mret, or a csr operation on the PMP related registers, the ID will stall the IF stage until the instruction is retired.

There is a Regfile Scoreboard in this stage. Its purpose is to track which registers still have pending writes. This is used to resolve data hazards. When ID decodes that its instruction will eventually write to the regfile, it indexes into the scoreboard using rd (the index of the destination register) and marks that entry, to signal that there is a pending write, and thus a possible data hazard. When that instruction eventually writes to the regfile, that scoreboard entry is cleared. If ID has an instruction and with one, or both, of its source registers indicating pending writes, it will use the data pushed forward from EX stage or wait for the data retrieved from the memory.

Interfaces

id2irf: This interface is for requesting the data in the IRF. There are 4 signals.

  1. rs1_addr: The address of source register 1.

  2. rs2_addr: The address of source register 2.

  3. rs1_re: Control signal. High when read to rs1_addr is valid.

  4. rs2_re: Control signal. High when read to rs2_addr is valid.

id2ex: This interface contains all the data passed from ID to EX. It works on a valid/ready handshake. There are 6 signals in this interface.

  1. pc: The PC of the instruction.

  2. inst: The instruction data.

  3. rs1_addr: The address of source register 1.

  4. rs2_addr: The address of source register 2.

  5. is_rvc: Signals whether this instruction is RVC, used to calculate npc in EX and MA stage, if needed.

ex2id_bps/ma2id_bps: These interfaces are used for data forwarding: send the execution result of the EX/MA stage back to the EX stage to solve data hazard. There are 4 signals in this interface.

  1. valid_addr: Indicating whether the address of register accessing or memory accessing is valid.

  2. valid_data: Indicating whether the data of register accessing or memory accessing is valid.

  3. addr: The address of register accessing or memory accessing. Used to compare with the address to be accessed by the instruction in the ID stage.

  4. data: The data in register accessing or memory accessing.

Execute

The execute stage is responsible for calculations and sending memory requests to the LSU. This stage consists of an arithmetic and logic unit (ALU), a pair of multi-cycle multiplier and divider, a branch address calculation unit and a load/store address calculation unit.

ALU: The ALU is responsible for additions, subtractions, shifts, data comparisons (for branches and slt instructions), and bit-wise logical operations (AND, OR, XOR). The ALU is fed with the operands as well as the operation type. The logic in ALU is purely combinational.

Multiplier: The multiplier is used for multiplications. It is fed the operands as well as the multiplication type. The start_pulse input of the multiplier is set to 1 for 1 cycle to trigger the multiplication operation. The complete output is set to 1 when the multiplication is done. For multiplications where only the lower 64 bits of the result are needed, the calculation completes in the same cycle the start_pulse is set to 1. For multiplications where the upper 64 bits of the result are needed, the calculation completes in 3 cycles.

Divider: The divider is used for division operations. The divider is fed with the operands as well and the division type. The divider triggers the calculation when start_pulse input is set to 1. The complete output is set to 1 when DIV is done. DIV takes 17 cycles to accomplish a division operation.

The target address of the branch and the address of load/store instructions are calculated by the branch address calculation unit. For a branch instruction, if the branch is taken, a flush signal will be sent to IF and ID to “flush” the instructions in those stage, and a redirection signal will be sent to IF and the value of PC will change accordingly. For load/store instruction, the memory access request will be sent to D-Cache, so if D-Cache hits, we can the get the memory access result at MA stage in the next cycle.

Interfaces

ex2ma: This interface contains all the data passed from EX to MA. It works on a valid/ready handshake. There are 6 signals in this interface.

  1. pc: The PC of the instruction.

  2. inst: The instruction data.

  3. ex_out: The result of EX’s calculation.

  4. rd_addr: The address of destination register 1, if any.

  5. csr_addr: The address of csr register, if any.

  6. is_rvc: Whether this instruction is RVC.

ex2dc: This is the interface between EX and D-Cache, used for sending memory requests. It uses a valid/ready handshake. There are 5 signals in this interface.

  1. rw: 1 if the request is a write, 0 if it is a read.

  2. mask: The byte mask for Store operation.

  3. addr: The memory request address.

  4. wdata: The write data of the memory request.

  5. width: The width of the operand of Load/Store operation.

Memory Access

This stage is responsible for receiving memory responses from D-Cache, interfacing with rrv_csr (CSR), sending redirection requests to IF in certain cases, and committing instructions and writing data to Register Files.

For load and store instructions, MA will receive memory responses from D-Cache. Only 1 memory response is accepted per instruction. Loads will respond with the data read from memory, while stores will respond with 0 data. The data will be pushed forward to the ID stage through the bypass network to solve possible data hazard.

For CSR instructions, the MA stage will read and write the CSR Registers.

For fence or those csr operations on the PMP related registers, MA will send a npc signal to the IF stage to release the stall state of the IF, ID and EX stages.

For instructions with destination register and without any exceptions, it is at MA stage that the result will write to the regfile. Regfile writes are synchronous.

Interfaces

dc2ma: This interface is the memory response interface between D-Cache and MA. There are 4 signals in this interface.

  1. rdata: The read data requested by load instructions.

  2. excp_valid: Signals whether the memory access operation cause an exception (e.g. violated a PMP check).

  3. excp_cause: Contains the exception cause of the instruction, if any.

  4. valid: Whether the response is valid.

ma2cs/ma2cs_ctrl: These interfaces are used by MA for sending read/write requests to CSR. The ma2cs_ctrl is for controlling transactions with CSR. In ma2cs_ctrl, there are 3 signals in this interface:

  1. csr_op: CSR operation type. It can be set to RRV64_CSR_OP_RW (read and write), RRV64_CSR_OP_RS (read and set), RRV64_CSR_OP_RC (read and clear) and CSR_OP_NONE if MA does not have a request to CSR.

  2. ret_type: Return instruction type (mret or uret). It will be set to RET_TYPE_NONE if the instruction is not either of the ret type instructions mentioned.

  3. is_wfi: Set to 1 if the instruction is a WFI instruction.

For ma2cs, there are 5 signals in this interface:

  1. pc: PC of the current instruction. Used mainly for exception handling.

  2. csr_addr: Request CSR address.

  3. csr_wdata: Data used for do some calculation with data in CSR, the calculation result will be written back to the CSR.

  4. rs1_addr: rs1 address of the instruction. Used for checking if the CSR operation should be considered a write.

  5. mem_addr: Memory address of the load or store instruction. Used for updating the MTVAL CSR on load/store PMP exceptions.

ma2irf: This interface is used by MA to send regfile writes to IRF. Writes will be validated using an active high write enable signal. Including the enable signal, there are 3 signals in this interface:

  1. rd: Write data.

  2. rd_addr: Regfile write address.

  3. rd_we: Write enable.

Instruction Buffer

The instruction buffer is mainly used to prefetch instructions from L1 Cache. In addition to the instruction requested by the IF, the instruction buffer also fetches the instructions of the next two cache lines. If the execution flow is sequential, or there is a forward jump whose span is less than two cache lines, the instruction buffer will hit and return the instruction data within one cycle since we have already fetch it before. When a branch or jump instruction is taken and the instruction corresponding to the destination address is not currently in instruction buffer, the instruction buffer will be flushed and send a request to ICache.

Loop Buffer

Loop buffer is a high speed D-Cache type memory that is used for holding up to 64 of the most recently fetched instructions. It is maintained by the IF stage of the pipeline. If a branch instruction is taken, we can first check the loop buffer to see if the instruction exists. If the loop buffer hits, the instruction data will be returned to IF within a cycle. If not, the loop buffer will wait for the instruction data be fetched from instruction buffer or L1 Cache and use this instruction to replace the oldest instruction in loop buffer.

Address Translation

To support an operating system, RRV64 features full hardware support for address translation via a Memory Management Unit (MMU). It has separate configurable data and instruction TLBs. The TLBs are fully set-associative memories. On each instruction and data access, they are checked for a valid address translation. If none exists, RRV64’s hardware PTW queries the main memory for a valid address translation. The replacement strategy of TLB entries is Pseudo Least Recently Used (LRU).

Both instruction cache and data cache are virtually indexed and physically tagged and fully parametrizable. The address is split into page offset (lower 12 bit) and virtual page number (bit 12 up to 39). The page offset is used to index into the cache while the virtual page number is simultaneously used for address translation through the TLB. In case of a TLB miss the pipeline is stalled until the translation is valid.

Exception Handling

Exceptions can occur throughout the pipeline and are hence linked to a particular instruction. The first exception can occur during instruction fetch when the PTW detects an illegal TLB entry or the address is not aligned. During decoding, exceptions can occur when the decoder detects an illegal instruction. As soon as an exception has occurred, the corresponding instruction is marked and auxiliary information is saved. Such excepting instruction will be handled by the exception handler at the MA stage.

Interrupts are asynchronous exceptions, in RRV64, they are synchronized to a particular instruction. Like exception, the interrupt signal will be processed in the MA stage.

Privileged Extensions

The privileged specification defines more CSRs governing the execution mode of the hart. The base supervisor ISA defines an additional interrupt stack for supervisor mode interrupts as well as a restricted view of machine mode CSRs. Accesses to these registers are restricted to the same or a higher privilege level.

CSR accesses are executed in the MA stage. Furthermore, a CSR access can have side-effects on subsequent instructions which are already in the pipeline e.g. altering the address translation infrastructure. This makes it necessary to completely flush the pipeline on such accesses.