Performance is usually compared using the execution times of standard benchmarks.
$$\text{time}=n_{instructions}\times\underbrace{\frac{\text{cycles}}{\text{instruction}}}_\text{CPI}\times\frac{\text{seconds}}{\text{cycle}}$$
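As a worked check of the formula above (the numbers here are illustrative, not from the notes):

```python
# time = instruction count x CPI x clock period
# Illustrative numbers: 10^9 instructions, CPI of 1.5, 2 GHz clock.
n_instructions = 1_000_000_000
cpi = 1.5                       # average cycles per instruction
clock_hz = 2_000_000_000        # 2 GHz, so seconds/cycle = 1/clock_hz

time_s = n_instructions * cpi * (1 / clock_hz)
print(time_s)  # 0.75 seconds
```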
## Pipelining
Pipelining changes the granularity of a clock cycle to be per step instead of per instruction. This allows multiple instructions to be processed concurrently.
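The cycle-count benefit can be sketched numerically. This assumes an ideal pipeline with no hazards; the formula is the standard one for a $k$-stage pipeline, not stated explicitly in the notes:

```python
def pipelined_cycles(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to fill the pipeline;
    # each subsequent instruction completes one cycle later.
    return n_stages + (n_instructions - 1)

n, k = 100, 5
unpipelined = n * k                  # one instruction at a time: 500 cycles
pipelined = pipelined_cycles(n, k)   # 104 cycles
print(unpipelined / pipelined)       # speedup approaches k for large n
```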
### Data forwarding
If data needs to be used from a prior operation, a pipeline stall would normally be required to remove the hazard and wait for the desired result (a **read-after-write** data hazard). However, a processor can mitigate this hazard by allowing the stalled instruction to read from the prior instruction's result instead.
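A minimal sketch of the forwarding decision, assuming a classic five-stage pipeline with EX/MEM and MEM/WB latches (the field names here are illustrative, not from the notes):

```python
def forward_a(id_ex_rs, ex_mem, mem_wb):
    """Select the source for an ALU operand.

    ex_mem / mem_wb model the pipeline latches: 'rd' is the destination
    register and 'reg_write' is whether the instruction writes a register.
    Returns which stage's result the operand should be taken from.
    """
    # Check EX/MEM first: the most recent producing instruction wins.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == id_ex_rs:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == id_ex_rs:
        return "MEM/WB"
    return "register file"

# add x5, x1, x2 followed by sub x6, x5, x3: forward x5 from EX/MEM
print(forward_a(5, {"reg_write": True, "rd": 5},
                   {"reg_write": False, "rd": 0}))  # EX/MEM
```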
### Load hazards
If a value produced in the memory access stage (e.g., by a load) is required in the next instruction's EX stage, a stall is required for the dependent instruction. This can be detected in the ID stage by testing whether the current instruction sets the memory read flag and the next instruction reads the destination register.
A processor **stalls** by disabling the PC and IF/ID write to prevent fetching the next instruction. Additionally, it sets the control in ID/EX to 0 to insert a no-op in the pipeline.
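The detection rule and stall mechanism above can be sketched as follows (latch and signal names follow the standard five-stage pipeline and are illustrative):

```python
def load_use_hazard(id_ex, if_id):
    """True if the instruction in EX is a load whose destination register
    is read by the instruction currently in ID (a load-use hazard)."""
    return id_ex["mem_read"] and id_ex["rd"] in (if_id["rs1"], if_id["rs2"])

def maybe_stall(id_ex, if_id, ctrl):
    # Stall: freeze the PC and the IF/ID latch, and zero the ID/EX
    # control signals so a bubble (no-op) flows down the pipeline.
    if load_use_hazard(id_ex, if_id):
        ctrl["pc_write"] = False
        ctrl["if_id_write"] = False
        ctrl["id_ex_control"] = 0
    return ctrl

# lw x5, 0(x1) followed by add x6, x5, x2 -> must stall one cycle
ctrl = maybe_stall({"mem_read": True, "rd": 5},
                   {"rs1": 5, "rs2": 2},
                   {"pc_write": True, "if_id_write": True, "id_ex_control": 1})
print(ctrl["pc_write"])  # False
```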
## Memory
### Static RAM (SRAM)
- retains data as long as power is supplied
- compared to DRAM, it is faster but more expensive, so it is used for cache
- To **read**: set word line = 1, turning on transistors, then read the **bit line**'s voltage
- To **write**: set word line = 1, turning on transistors, then drive the **bit line**'s voltage
<img src="https://2.bp.blogspot.com/-dCCrTGB-c6U/T1zaY5TG1oI/AAAAAAAAAu8/MutoYbjglvs/s640/SRAM.gif" width=500 />
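The read/write procedure can be sketched as a toy behavioural model. This is only a sketch: the cross-coupled inverters and differential bit lines of a real 6T cell are abstracted to a single stored bit:

```python
class SRAMCell:
    """Behavioural model of one SRAM cell: the cross-coupled inverter
    pair is abstracted to a single stored bit."""
    def __init__(self):
        self.stored = 0
        self.word_line = 0

    def read(self):
        self.word_line = 1      # turn on the access transistors
        bit_line = self.stored  # cell drives the bit line; read its voltage
        self.word_line = 0
        return bit_line

    def write(self, value):
        self.word_line = 1      # turn on the access transistors
        self.stored = value     # driven bit line overpowers the cell
        self.word_line = 0

cell = SRAMCell()
cell.write(1)
print(cell.read())  # 1
```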
### Dynamic RAM (DRAM)
- DRAM capacitors lose their charge over time, so they must be periodically **refreshed**
- Roughly 5x slower than SRAM, but cheaper, so it is used for main memory
- To **read**: precharge the bit line to $V_{DD}/2$, then set word line = 1, then sense and amplify the voltage change on the bit line. This also writes back the value.
- To **write**: along the bit line, drive $V_{DD}$ to charge the capacitor (write a $1$) or $GND$ to discharge it (write a $0$).
<img src="https://www.electronics-notes.com/images/ram-dynamic-dram-basic-cell-01.svg" width=500 />
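A similar toy model shows why DRAM reads are destructive and why refresh is needed. Leakage is simplified to a fixed retention countdown, and the `RETENTION` constant is illustrative:

```python
class DRAMCell:
    """Capacitor abstracted to a charge level that leaks over time."""
    RETENTION = 3   # ticks before a stored 1 leaks away (illustrative)

    def __init__(self):
        self.charge = 0
        self.age = 0

    def tick(self):                 # time passes; the capacitor leaks
        self.age += 1
        if self.age >= self.RETENTION:
            self.charge = 0

    def write(self, value):
        self.charge = value
        self.age = 0

    def read(self):
        value = self.charge
        self.charge = 0             # sensing drains the capacitor...
        self.write(value)           # ...so the value must be written back
        return value

    def refresh(self):              # a refresh is just a read (+ write-back)
        self.write(self.read())

cell = DRAMCell()
cell.write(1)
cell.tick(); cell.refresh()            # refreshed in time: value survives
cell.tick(); cell.tick(); cell.tick()  # no refresh: charge leaks away
print(cell.read())  # 0
```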
### Large DRAM chips
Each bit cell is placed into a symmetric 2D matrix to avoid linear searching. Assuming each address selects one byte (8 bits), with one extra bit to select between row and column:
$$\text{\# addr bits} = \log_2(2\times\text{\# bytes})$$
The matrix stores a total of eight times the number of bytes (8 bits per byte), so each edge length is the square root of the total number of bits.
$$
\text{\# bits} = \text{\# bytes}\times\frac{\pu{8 bits}}{\pu{1 byte}} \\
\text{matrix length}=\sqrt{\text{\# bits}}
$$
!!! example
    A 16 Mib DRAM chip stores 2 MiB, or $2\times1024^2$ bytes. Thus the bits are arranged in a $\sqrt{2\times1024^2\times8}=2^{12}$ by $2^{12}$ matrix, where each row holds $2^9$ 8-bit words.
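The arithmetic in the example can be checked directly:

```python
import math

bytes_stored = 2 * 1024**2    # 2 MiB
bits = bytes_stored * 8       # 16 Mib of bit cells
edge = math.isqrt(bits)       # edge length of the square matrix

# One extra address bit selects between row and column.
addr_bits = int(math.log2(2 * bytes_stored))

print(edge)       # 4096 = 2**12
print(edge // 8)  # 512 = 2**9 8-bit words per row
print(addr_bits)  # 22
```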