ECEN 5593: Advanced Computer Architecture Midterm Topic Review

Below is a list of topics that may be included in the midterm examination on Thursday, October 20.

Disclaimer: While I have tried to make the list below complete, I do not guarantee that it is complete or exhaustive. Also note that I emphasize certain important things to know for some topics. This does not mean that these are the only important topics, nor does it guarantee that these items will be on the exam. Technically, all material covered in lecture (unless I specifically stated otherwise), in Chapters 1 through 5.5 in the book, and in the required readings is fair game for the exam. This includes all the equations, but most of them can be derived in a rather straightforward manner. Also, note that I believe in understanding more than memorizing (though some memorizing is important), so don't feel obligated to commit every formula and example to memory. Concentrate on the important concepts behind the examples and understand how the formulas were derived, especially what approximations are being made. Also, I find the Fallacies and Pitfalls section in each chapter to be enlightening and a good aid for reinforcing concepts.

I. Computer Architecture Basics
  a. Assembly/Machine language
    i. The relationship between C-language source code, assembly language and machine code. (Really Important!)
    ii. Stack manipulation in assembly code
    iii. Register issues in calling conventions
    iv. Encoding of machine instructions into instruction words (instruction formats, their purpose, RISC vs. CISC, etc.)
      1. Note that I won't ask you to encode an instruction into machine code from rote memory
  b. Performance Analysis Basics
    i. Amdahl's Law and its implications
      1. Make the common case fast
      2. Optimizing the uncommon case buys very little unless the uncommon case is uncommonly slow
      3. etc.
    ii. Bottlenecks and their implications
    iii. Basic Bayesian performance estimation formulae
      1. For a given operation, if p is the fraction of time that something happens during the operation, c is its cost, and C is the cost when it doesn't happen, then the total cost of the operation is p*c + (1-p)*C.
      2. Many formulas in the book can be derived using this principle

      3. Sometimes the principle is applied but p and C are divided up differently.
    iv. Trade-off analysis: Cost, Power, Complexity, Performance
  c. Pipelining (Really important!)
    i. Why is pipelining good?
      1. How does it affect clock rate?
      2. How does it affect throughput?
      3. How does it affect latency?
      4. How does one evaluate trade-offs between these three things? (Latency is the tricky one.)
    ii. Pipelining as a means to extract parallelism
    iii. Implications of pipelining
      1. Data hazards
        a. RAW, WAR, WAW
        b. How are they handled in the basic 5-stage pipe?
      2. Structural hazards
        a. What are they?
        b. How can they be mitigated?
      3. Control hazards
    iv. Stall conditions in the 5-stage pipeline
      1. Forwarding vs. stalling for RAW hazards
      2. Branch prediction for control hazards
    v. Controlling the 5-stage pipeline
      1. Scoreboard scheduling of the pipeline in decode
      2. Handling variable-latency instructions, in particular memory ops
    vi. Basic understanding of how structure size and sequential operations lead to slower clock rates and increased power consumption.
      1. E.g., large memories are slower
      2. Many ports on a memory make it slower, especially write ports
      3. Associative memories are slower than direct-mapped memories, with fully associative structures being the slowest and most complex
II. Exploiting ILP Dynamically at Run-time
  a. What is instruction-level parallelism?
  b. Branch prediction strategies for control hazards
    i. Trade-offs in the number of bits per predictor
    ii. One-level vs. two-level predictors
    iii. Common two-level schemes
      1. PAg, GAg, PAp, GAp
      2. Effects of aliasing, both positive and negative
      3. Performance trends outlined in the Yale N. Patt paper
    iv. Branch-target buffer
      1. Role of the BTB
      2. Implications of aliasing

      3. Jump-register instructions and the BTB
    v. Implications of when branches are predicted (Fetch vs. Decode)
    vi. Understanding of the Alpha 21264 way prediction scheme and how it allows them to predict branches in the decode stage with no penalty in the common case
    vii. Tournament branch prediction basics
      1. A "predictor predictor" selects which predictor to use.
      2. Each predictor is tuned to target different types of branch behavior; they could even be standard two-level predictors
  c. Tomasulo's Machine
    i. Why is Out-of-Order (OoO) execution good?
    ii. Basic Tomasulo's machine structures
      1. Reservation Stations
      2. Common-data Bus
      3. Rename table (integrated into the register file or otherwise)
    iii. Tomasulo's machine and data hazards (Really important)
      1. How does Tomasulo's machine handle RAW hazards?
      2. How does Tomasulo's machine avoid WAW hazards?
      3. How does Tomasulo's machine eliminate WAR hazards?
    iv. Memory and out-of-order execution
      1. Why is ordering memory operations a problem with Tomasulo's machine, while register-register operations are ok?
      2. Load-store unit design
        a. How to keep loads and stores in order
    v. Speculation and Exceptions and Tomasulo's machine
      1. Why does the base Tomasulo's machine not deliver precise exceptions?
      2. The relationship between precise exceptions and recovery from misspeculation
      3. The role of the reorder buffer in preserving in-order precise exception semantics.
      4. The future-file/shadow-file register value recovery scheme
      5. Handling memory operations during speculation
        a. Suppressing writes until commit
        b. Store-buffering to allow load-store bypass
        c. What is data-speculation and why is it needed for performance?
      6. Recovering at commit versus recovering at execute, or decode, or any other stage for that matter (note that recovering in certain stages might not make sense)
        a. Can the future/shadow-file scheme recover at any time before commit?
    vi. Superscalar Tomasulo's machine
      1. What structures need to be altered to allow multiple instructions to be fetched and issued to the reservation stations in a cycle?
        a. Which structures need extra ports, and what kind of ports?
        b. What additional long global wires are needed?
        c. What special bypasses are needed? (Think rename-bypass-like things.)
      2. What structures are needed to allow multiple instructions to write back in a given cycle?
        a. Answer structure sizing questions above
      3. What structures are needed or need to be modified to allow multiple instructions to commit in a cycle?
        a. Answer structure sizing questions above
    vii. Complexity and delay in Tomasulo's machine
      1. Why is the CDB a bottleneck in Tomasulo's machine?
  d. Modern Superscalar OoO machine design: the Alpha 21264
    i. Basic pipeline stages
      1. IF, REN, S, REG, EX*, M*, WB, COM
      2. What is the role of each stage?
      3. What hardware is accessed in each stage?
      4. Why are the stages in this order? In particular, why is the register read stage after the scheduling stage?
    ii. Detailed Execution of the Machine
      1. Operations of each stage
        a. In each stage, what is the sequence of operations performed for a non-superscalar design?
        b. What are the data dependences between these operations?
      2. Physical-register-based renaming
      3. What are the modifications required when going to superscalar execution?
        a. How many extra ports/wires per instruction are needed to make the machine superscalar?
        b. What extra bypasses are needed when more than one instruction can pass through a stage in a given cycle?
          i. Consider the rename stage in particular.
        c. What is the complexity of each stage in terms of the number of instructions per stage?
          i. E.g., linear, quadratic (n-squared), factorial, etc.
      4. Speculation
        a. How is speculation handled with physical-register-based renaming?
          i. Shadow-file scheme
          ii. Array of rename tables
        b. Resolving speculation at execute instead of commit
          i. How is it done?

          ii. What extra hardware is needed?
        c. Consider what can be speculated
          i. So far we have branch direction and target, load-store conflicts, cache hits, i-cache way prediction (21264) and i-cache line prediction (21264).
          ii. Consider what makes sense to speculate (i.e., could lead to performance improvement)
            1. In particular, when must a prediction occur to be beneficial? (E.g., target prediction must occur in fetch to be useful in the 5-stage machine.)
          iii. Consider the hardware required to detect misspeculation and recover
III. Memory Subsystem
  a. The book is pretty good, so I have less detail in this section. Make sure you read the text.
  b. Caching basics
    i. Cache size vs. block size vs. associativity
    ii. Trade-off between each cache parameter and clock speed
    iii. What is a blocking vs. non-blocking cache?
      1. You need to know the difference.
      2. Details regarding non-blocking caches are not required
    iv. Latency versus bandwidth trade-offs in main memory
    v. Access size vs. cache line size.
  c. Cache write policy
    i. Write-back vs. write-through
    ii. Allocate on write vs. no-allocate on write
    iii. Allocate on write with partial writes to the line
  d. Eviction policies
    i. Optimal eviction policy
    ii. LRU
    iii. Random
    iv. Others?
  e. Cache misses
    i. The 3 C's: compulsory misses, capacity misses, conflict misses
    ii. How to vary cache parameters to address each of these
    iii. How way prediction can reduce miss rate in pseudo-associative caches (recall that the 21264 does this for the i-cache)
  f. Techniques to reduce miss penalty
    i. Victim Caching
    ii. Multiple levels of cache
    iii. Critical Word First and Early Restart
    iv. Giving read misses priority over write misses
  g. Parallelism and miss penalty
    i. How does OoO affect miss penalty?
    ii. The role of OoO in reducing the effect of long-latency ops.
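
As a study aid, the weighted-cost principle from I.b.iii (total cost = p*c + (1-p)*C) can be sketched in a few lines of Python. The function names and the numbers below are illustrative assumptions, not taken from the book or the lectures. The sketch shows how Amdahl's Law and the usual average memory access time formula both fall out of the same expression once p, c, and C are chosen appropriately.

```python
def expected_cost(p, c, C):
    """Weighted-average cost p*c + (1-p)*C.

    p: fraction of the time the event happens (0..1)
    c: cost when the event happens
    C: cost when it does not happen
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return p * c + (1.0 - p) * C


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law derived from expected_cost.

    Normalize the old execution time to 1. The enhanced fraction now
    costs 1/speedup_enhanced per unit of old time; the rest still costs 1.
    """
    new_time = expected_cost(fraction_enhanced, 1.0 / speedup_enhanced, 1.0)
    return 1.0 / new_time


def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT derived from expected_cost.

    A miss costs hit_time + miss_penalty, a hit costs hit_time, which
    simplifies to hit_time + miss_rate * miss_penalty.
    """
    return expected_cost(miss_rate, hit_time + miss_penalty, hit_time)


# Illustrative numbers only (hypothetical workloads).
print(round(amdahl_speedup(0.9, 10), 2))   # common case sped up 10x -> 5.26
print(round(amdahl_speedup(0.1, 10), 2))   # uncommon case sped up 10x -> 1.1
print(round(average_memory_access_time(1, 0.05, 100), 2))  # -> 6.0
```

The two Amdahl calls illustrate "make the common case fast": the same 10x enhancement gives roughly a 5.3x overall speedup when it applies to 90% of the time but only about 1.1x when it applies to 10%.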
