COSC 243 (Computer Architecture), Lecture 13: Computer Architecture 2


Overview

This Lecture: Architectural topics
- CISC
- RISC
- Multi-core processors
Source: lecture notes

Next Lecture: Operating systems

Moore's Law

CISC

What is the best thing to do with all those transistors?
- Add extra instructions?
- Make the CPU do more (integrated cache, etc.)?
- Pipelines?
We call these Complex Instruction Set Computers.

High Level Languages

- As the cost of a computer dropped, the relative cost of software went up
- As computers became more ubiquitous, the need to port software from one machine to another increased
- As the complexity of software went up, the need to use high level languages increased
- Programs today are almost always written in high level languages
- As time went on, languages became higher level: you could do more in the same number of lines of code

The Semantic Gap

A semantic gap appeared: programming languages became disconnected from CPU architecture
- This is part of the purpose of high level languages
New instructions were added to the CPU, but:
- They were not being used by programmers, who wrote in high level languages
- They were not being used by the compilers: it wasn't worthwhile re-writing the compiler for each release of a CPU
The new instructions were being ignored. We need a CPU optimized for high level language use.

Some research

Compiled high level programs:
- Do a lot of branching and procedure calls
- Mostly operate on a small number of local variables
In fact:
- Almost a third of the CPU's time is spent making procedure calls and branches
- Most functions have fewer than 6 local variables
- Most memory access is due to procedure calls
Can we use these observations to speed up the CPU?

RISC

Reduced Instruction Set Computers. Three design principles:
- Large number of registers
  - This reduces the number of memory accesses
- Careful design of the pipeline for conditional branches
  - Better handling of if statements and procedure calls
- Simplified (reduced) instruction set
  - Each instruction does less
  - Fewer addressing modes
  - Often just as many instructions as in a CISC CPU: reduced complexity does not mean a reduced number of instructions

RISC

Characteristics:
- One instruction per cycle (all instructions take the same time)
  - This keeps the pipeline simple
- Register to register operations
  - All memory access is via dedicated load and store instructions
- Simple addressing modes
- Simple instruction formats
  - Fixed instruction length
  - Aligned on machine word boundaries (for fast CPU load)

RISC Register Windows

Most memory access is because of procedure calls:
- Local variables on the stack
- Function parameters on the stack
Some RISC processors move the stack into the CPU as a special bank of registers.
This means that we don't need to spend time on memory access when:
- Writing parameters onto the stack
- Accessing local variables on the stack

CISC vs. RISC

RISC requires more program instructions than CISC
- RISC instructions are simplified
- Have fewer addressing modes
- Each takes less memory space to store
CISC does more per instruction
- But the control unit is more complex (and so slower)
- The microcode is more complex (and so slower)
- The microcode is often a RISC program!

CISC vs. RISC

CISC: minimise instructions per program, at the cost of more cycles per instruction
RISC: minimise cycles per instruction, at the cost of more instructions per program
Both attack the same equation:

  time/program = time/cycle × cycles/instruction × instructions/program
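A quick worked example of this trade-off in C, with entirely made-up numbers for a hypothetical CISC and a hypothetical RISC machine running the same program:

    #include <stdio.h>

    int main(void) {
        /* time/program = time/cycle * cycles/instruction * instructions/program.
           All numbers below are hypothetical, for illustration only. */
        double cycle_time = 1.0 / 2e9;            /* 2 GHz clock on both machines */
        double cisc = cycle_time * 4.0 * 1e9;     /* CPI 4, 1e9 instructions */
        double risc = cycle_time * 1.2 * 2e9;     /* CPI 1.2, but 2e9 instructions */
        printf("CISC: %.2f s  RISC: %.2f s\n", cisc, risc);
        return 0;
    }

With these invented numbers the RISC machine wins (1.20 s vs. 2.00 s) even though it executes twice as many instructions, because its cycles per instruction are so much lower.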

CISC vs. RISC: Who won?

CISC: Intel architecture in PCs, Macs (now), servers
RISC:
- ARM architecture in phones, tablets, almost everything else
- MIPS, SPARC, PowerPC in some Unix systems and old Macs
Hybrid systems:
- Modern Intel CPUs have a RISC core with a translation layer
- High performance RISC chips have adopted some CISC characteristics, like more instructions and variable length instructions

Superpipelines

We can make pipeline stages very simple by adding more pipeline stages
- If every pipeline stage is simple (short gate delay) then we can increase the clock speed
- Double the number of pipeline stages, double the clock speed: more instructions complete per second
- But longer pipelines increase the likelihood of hazards, and the cost of mistakes in branch prediction

Superscalar

Why use only one pipeline? Let's have two: then we can execute two instructions at once! This is called instruction-level parallelism.
Five limitations to instruction-level parallelism (the first three are illustrated in the sketch below):
- True data dependency (read after write, RAW)
- Output dependency (write after write, WAW)
- Antidependency (write after read, WAR)
- Procedural dependency
  - Conditional branches require a pipeline reload
- Resource conflicts
  - Both pipelines require access to memory at the same time
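A minimal C sketch of the three data dependencies, with statements standing in for machine instructions:

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 2, c;
        c = a + b;   /* (1) writes c */
        a = c * 2;   /* (2) reads the c written by (1): true dependency (RAW) */
        c = b - 1;   /* (3) must not write c before (2) reads it: antidependency (WAR);
                        must not write c before (1) does: output dependency (WAW) */
        printf("a=%d c=%d\n", a, c);
        return 0;
    }

A superscalar CPU must respect (1)-before-(2) exactly, but the WAR and WAW hazards around (3) are only naming conflicts, which register renaming can remove.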

Superscalar

However, if there are no dependencies then the instructions need not be executed in program order. This is known as out-of-order execution.
- In-order issue with in-order completion
  - Instructions must start and finish in the correct order
- In-order issue with out-of-order completion
  - The CPU starts the instructions in order, but the second one finishes before the first!
- Out-of-order issue with out-of-order completion
  - The CPU does the next instruction before the current one!
  - E.g. (TSX then TYA): does the order matter?

Superscalar

A program is a linear sequence of instructions
- Instruction fetch with branch prediction produces an instruction stream
- The stream is examined for dependencies
- Instructions are re-ordered by their dependencies
- Instructions are executed based on their dependencies on each other and on the hardware resources
- Results are recorded or discarded
  - Discarded when a speculative branch prediction turns out to be wrong

Superscalar

[Diagram: the flow of a superscalar processor. A static program passes through instruction fetch and branch prediction (producing an instruction stream), then instruction dispatch into the window of execution, then instruction issue, instruction execution, and finally instruction re-order and commit.]

Hyperthreading (SMT)

We can do more!
- The CPU slows down when we access memory
- The pipeline slows down when we have dependencies
Can we write programs that do more than one thing at a time, but whose parts don't interact (much)? Yes! We can use threading (see the sketch below).
Actually, the OS switches between programs too. Perhaps we can build that into the CPU.
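A minimal sketch of "more than one thing at a time" using POSIX threads (assuming a Unix-like system; compile with -lpthread). Each thread does independent work, which is exactly the situation SMT hardware exploits:

    #include <pthread.h>
    #include <stdio.h>

    static void *count(void *arg) {
        long total = 0;
        for (long i = 0; i < 100000000L; i++)
            total += i;                 /* independent work, no shared data */
        printf("thread %s done (total %ld)\n", (const char *)arg, total);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, count, "A");  /* two threads whose parts */
        pthread_create(&t2, NULL, count, "B");  /* don't interact at all   */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }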

Hyperthreading (SMT)

Imagine a superscalar architecture with 2 pipelines
- Each pipeline reads from a different part of memory
- Each pipeline has a separate set of registers
- If one pipeline becomes stalled, the other keeps going
Two programs are executed at the same time! This is called Simultaneous Multithreading (SMT).
This is the approach of the Intel Hyperthreading CPUs, such as the Pentium 4.

Heat!

The heat dissipation in a transistor is linear in the switching rate: the faster you switch, the more heat you get.
The total amount of heat generated is linear in the number of transistors on the silicon die.
Both have been following Moore's law! But transistors have been getting smaller too, so each dissipates less heat.
Overall, a huge increase in heat:
- ~1 watt for early single chip CPUs
- >150 watts for current top end CPUs
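The relationship the slide describes is usually written as the dynamic power equation P = alpha * C * V^2 * f: power is linear in frequency (switching rate) and in switched capacitance (roughly, transistor count). A tiny sketch with hypothetical values:

    #include <stdio.h>

    int main(void) {
        double alpha = 0.2;    /* activity factor: fraction of transistors switching (hypothetical) */
        double cap   = 1e-7;   /* total switched capacitance in farads (hypothetical) */
        double volts = 1.2;    /* supply voltage */
        double freq  = 3e9;    /* clock frequency: 3 GHz */
        /* Linear in freq and cap, as the slide states; quadratic in voltage. */
        printf("P = %.1f W\n", alpha * cap * volts * volts * freq);
        return 0;
    }

With these invented numbers P comes out at 86.4 W, in the ballpark of a modern desktop CPU; halving the frequency halves the dynamic power, which is the lever multi-core designs pull.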

Multi-Core

How can we reduce the heat? The obvious solution is to go slower.
How can we go slower and faster at the same time?
Instead of having one CPU on the silicon die, we put two. They share certain resources, including:
- The buses
- The level 2 cache

Vectors

What if you want to do the same operation over and over again?
- One way is to tell the CPU how many times to repeat the operation (i.e. make a loop)
- The other way is to have a special CPU that performs the same instruction on many chunks of data at once, so the instruction decoding only occurs once
We call this Single Instruction Multiple Data (SIMD). A sketch contrasting the two follows.
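A sketch of the loop-versus-SIMD contrast, assuming an x86 machine with SSE and the immintrin.h intrinsics header:

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];

        /* Scalar loop: the add instruction is fetched and decoded 4 times. */
        for (int i = 0; i < 4; i++)
            out[i] = a[i] + b[i];

        /* SIMD: one instruction adds all 4 floats at once. */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));

        printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }

The _mm_add_ps call compiles to a single addps instruction operating on all four floats, so the decode cost is paid once per vector rather than once per element.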

Classification of Architectures

- Single instruction, single data (SISD): a normal computer
- Single instruction, multiple data (SIMD): Intel SSE instructions, Cray, etc.; graphics processors
- Multiple instruction, multiple data (MIMD): multi-core

That's All Folks

If you need help:
- I'm on email
- I'm in room 248
- The Tutors / Teaching Fellows can also help
Good luck with the exam!