eggy/eifueo

Fork 1

eggy e02bd8cc46 ece222: add pipelining

2023-11-07 12:57:16 -05:00

4.7 KiB

Raw Blame History

ECE 222: Digital Computers

Exceptions

In ARM, anything that interrupts the normal control flow of a program is an exception.

An interrupt from an interrupt request (IRQ) occurs when a peripheral wants to interrupt the current flow
A fault indicates a CPU error (e.g., division by zero) and returns to the faulty instruction
A trap runs the interrupt handler and returns to the next instruction

Exceptions are handled by running an exception handler then returning to the original line.

Vector table

A vector table is an array of handler addresses. Each index contains a number (a “vector”) and a priority.

Exception handling

First, in hardware: If the exception priority is higher than the current operating priority, the exception is immediately handled.

the current context is pushed to the stack
the operating mode is set to privileged
the operating priority is set to the exception priority
the program counter is set to the address of the exception (vector_table[exception_num])

Next, the handler runs, and it should:

preserve the any R4-R11 it modifies
clear the interrupt request (IRQ)
restore R4-R11
return with BX LR

Finally, in hardware:

the previous context is restored
the previous operating priority and mode are restored

!!! warning Interrupts can interrupt other interrupts, if their priority is sufficiently high!

!!! example How to interrupt-driven I/O:

**Write the ISR:** Assuming that the IRQ bit is cleared if `R0` is read:

```asm
ISR     PUSH    {R4-R11}    ; save previous state onto stack
        LDR     R3, [R0]    ; clear the IRQ by reading from it
        POP     {R4-R11}    ; restore state
        BX      LR          ; return to original address
```

**Store the interrupt handler in the vector table:** Assuming that the vector number is `22` and the vector table starts 16 addresses after the 0x00:

```asm
MOV32   R0, #ISR    ; handler address
MOV     R1, #38 * 4 ; offset: (16 + 22) * 4 bytes per address
STR     R1, [R0]    ; save address to table
```

**Enable interrupt requests:**

```asm
MOV32   R0, #ADDRESS_INTERRUPT_ENABLE
MOV     R1, #1
STR     R1, [R0]    ; enable interrupts
```

Processor design

Comparing the complex instruction set computer architecture to the reduced instruction set computer architecture:

Task	CISC	RISC
ALU operands can come from?	registers, memory	registers (load/store)
Addressing mode	complex	simple
Binary size	small	large (~30% larger)
Instruction size	variable	fixed
Pipelining	difficult	simple

Operation encoding

The R-format is used for operations of the form ADD Rd, Rn, Rm:

\[\underbrace{\text{op-code}}_\text{11 b}\ \ \overbrace{\text{Rm}}^\text{5 b}\ \ \underbrace{\text{shift amount}}_\text{6 b}\ \ \overbrace{\text{Rn}}^\text{5 b}\ \ \underbrace{\text{Rd}}_\text{5 b}\]

The D-format is used for operations of the form LDR Rt, [Rn, #offset]:

\[\underbrace{\text{op-code}}_\text{11 b}\ \ \overbrace{\text{offset}}^\text{9 b}\ \ 00\ \ \overbrace{\text{Rn}}^\text{5 b}\ \ \underbrace{\text{Rt}}_\text{5 b}\]

The CB-format is used for operations of the form CBZ Rt, LABEL:

\[\underbrace{\text{op-code}}_\text{8 b}\ \ \overbrace{\text{offset}}^\text{19 b}\ \ \underbrace{\text{Rt}}_\text{5 b}\]

Instruction data path

To execute an instruction, the following steps are observed:

Instruction fetch (IF)

fetch the instruction from instruction memory
increment the instruction address (PC += 4), latchedd into PC register at the end of the CPU cycle

Instruction decode (ID)

decode fields like the op-code, offset
read recoded registers

Execute (EX)

ALU calculates ADD, SUB, etc, as well as addresses for LDR/STR, sets zero status for CBZ
branch adder calculates any branch target addresses

Memory (ME)

if memory needs to be reached, either Write or Read must be asserted to prepare for it
write to memory

Writeback (WB)

write results to registers from memory, the ALU, or another register

Performance

Each step in the instruction data path has a varying time, so the clock period must be at least as long as the slowest step.

Performance is usually compared by comparing the execution times of standard benchmarks, such that:

\[\text{time}=n_{instructions}\times\underbrace{\frac{\text{cycles}}{\text{instruction}}}_\text{CPI}\times\frac{\text{seconds}}{\text{cycle}}\]

Pipelining

Pipelining changes the granularity of a clock cycle to be per step, instead of per-instruction. This allows multiple instructions to be processed concurrently.

(Source: Wikimedia Commons)

4.7 KiB Raw Blame History