Giving credit where credit is due

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Giving credit where credit is due"

Transcription

1 CSCE 230J Computer Organization Processor Architecture VI: Wrap-Up Dr. Steve Goddard Giving credit where credit is due ost of slides for this lecture are based on slides created by Dr. Bryant, Carnegie ellon University. I have modified them and added new slides. 2 Page 1

2 Overview Wrap-Up of PIPE Design Performance analysis Fetch stage design Exceptional conditions odern High-Performance Processors Out-of-order execution 3 Performance etrics Clock rate easured in egahertz or Gigahertz Function of stage partitioning and circuit design Keep amount of work per stage small Rate at which instructions executed CPI: cycles per instruction On average, how many clock cycles does each instruction require? Function of pipeline design and benchmark programs E.g., how frequently are branches mispredicted? 4 Page 2

3 CPI for PIPE CPI 1.0 Fetch instruction each clock cycle Effectively process new instruction almost every cycle Although each individual instruction has latency of 5 cycles CPI > 1.0 Sometimes must stall or cancel branches Computing CPI C clock cycles I instructions executed to completion B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = B/I Factor B/I represents average penalty due to bubbles 5 CPI for PIPE (Cont.) B/I = LP + P + RP LP: Penalty due to load/use hazard stalling Typical Values Fraction of instructions that are loads 0.25 Fraction of load instructions requiring stall 0.20 Number of bubbles injected each time 1 LP = 0.25 * 0.20 * 1 = 0.05 P: Penalty due to mispredicted branches Fraction of instructions that are cond. jumps 0.20 Fraction of cond. jumps mispredicted 0.40 Number of bubbles injected each time 2 P = 0.20 * 0.40 * 2 = 0.16 RP: Penalty due to ret instructions Fraction of instructions that are returns 0.02 Number of bubbles injected each time 3 RP = 0.02 * 3 = 0.06 Net effect of penalties = 0.27 CPI = 1.27 (Not bad!) 6 Page 3

4 Fetch Logic Revisited During Fetch Cycle 1. Select PC 2. Read bytes from instruction memory 3. Examine icode to determine instruction length 4. Increment PC D Instr valid icode ifun ra Split Byte 0 rb Align Instruction memory valc Bytes 1-5 Need valc Need regids valp PC increment Predict PC _icode _Bch _vala W_icode W_val Timing Steps 2 & 4 require significant amount of time F Select PC predpc 7 Standard Fetch Timing Select PC need_regids, need_valc em. Read Increment 1 clock cycle ust Perform Everything in Sequence Can t compute incremented PC until know how much to increment it by 8 Page 4

5 A Fast PC Increment Circuit incrpc High-order 29 bits UX 0 1 carry Low-order 3 bits Slow 29-bit incrementer 3-bit adder Fast High-order 29 bits need_regids 0 need_valc Low-order 3 bits PC 9 odified Fetch Timing Select PC em. Read need_regids, need_valc 3-bit add UX Incrementer Standard cycle 29-Bit Incrementer 1 clock cycle Acts as soon as PC selected Output not needed until final UX Works in parallel with memory read 10 Page 5

6 ore Realistic Fetch Logic Other PC Controls Fetch Control Instr. Length Byte 0 Bytes 1-5 Current Instruction Fetch Box Instruction Cache Integrated into instruction cache Fetches entire cache block (16 or 32 bytes) Selects current instruction from current block Works ahead to fetch next block As reaches end of current block At branch target Current Block Next Block 11 Exceptions Conditions under which pipeline cannot continue normal operation Causes Halt instruction Bad address for instruction or data Invalid instruction Pipeline control error Desired Action (Current) (Previous) (Previous) (Previous) Complete some instructions Either current or previous (depends on exception type) Discard others Call exception handler Like an unexpected procedure call 12 Page 6

7 Exception Examples Detect in Fetch Stage jmp $-1 # Invalid jump target.byte 0xFF # Invalid instruction code halt # Halt instruction Detect in emory Stage irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address 13 Exceptions in Pipeline Processor #1 # demo-exc1.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # Invalid address nop.byte 0xFF # Invalid instruction code 0x000: irmovl $100,%eax F D E 0x006: rmmovl %eax,0x1000(%eax) F D E 0x00c: nop F D 0x00d:.byte 0xFF F Exception detected Desired Behavior rmmovl should cause exception 5 W E D Exception detected 14 Page 7

8 Exceptions in Pipeline Processor #2 # demo-exc2.ys 0x000: xorl %eax,%eax # Set condition codes 0x002: jne t # Not taken 0x007: irmovl $1,%eax 0x00d: irmovl $2,%edx 0x013: halt 0x014: t:.byte 0xFF # Target 0x000: xorl %eax,%eax 0x002: jne t 0x014: t:.byte 0xFF 0x???: (I m lost!) 0x007: irmovl $1,%eax Desired Behavior No exception should occur F D E F D F Exception detected 4 E D F 5 W E D F 6 E D 7 W E 8 W 9 W 15 aintaining Exception Ordering W exc icode vale val dste dst exc icode Bch vale vala dste dst E exc icode ifun valc vala valb dste dst srca srcb D exc icode ifun ra rb valc valp F predpc Add exception status field to pipeline registers Fetch stage sets to either AOK, ADR (when bad fetch address), or INS (illegal instruction) Decode & execute pass values through emory either passes through or sets to ADR Exception triggered only when instruction hits write back 16 Page 8

9 Side Effects in Pipeline Processor # demo-exc3.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address addl %eax,%eax # Sets condition codes x000: irmovl $100,%eax F D E 0x006: rmmovl %eax,0x1000(%eax) F D E 0x00c: addl %eax,%eax F D 5 W E Exception detected Desired Behavior rmmovl should cause exception Condition code set No following instruction should have any effect 17 Avoiding Side Effects Presence of Exception Should Disable State Update When detect exception in memory stage Disable condition code setting in execute ust happen in same clock cycle When exception passes to write-back stage Disable memory write in memory stage Disable condition code setting in execute stage Implementation Hardwired into the design of the PIPE simulator You have no control over this 18 Page 9

10 Rest of Exception Handling Calling Exception Handler Push PC onto stack Either PC of faulting instruction or of next instruction Usually pass through pipeline along with exception status Jump to handler address Usually fixed address Defined as part of ISA Implementation Haven t tried it yet! 19 odern CPU Design Register Updates Retirement Unit Register File Prediction OK? Instruction Control Fetch Control Instruction Decode Address Instruction Instructions Cache Operations Integer/ Branch General Integer FP Add FP ult/div Load Store Functional Units Operation Results Addr. Addr. Data Data Execution Data Cache 20 Page 10

11 Instruction Control Retirement Unit Register File Instruction Control Address Fetch Control Instruction Instructions Cache Instruction Decode Operations Grabs Instruction Bytes From emory Based on Current PC + Predicted Targets for Predicted Branches Hardware dynamically guesses whether branches taken/not taken and (possibly) branch target Translates Instructions Into Operations Primitive steps required to perform instruction Typical instruction requires 1 3 operations Converts Register References Into Tags Abstract identifier linking destination of one operation with sources of later operations 21 Execution Unit Register Updates Prediction OK? Operations Integer/ Branch General Integer FP Add FP ult/div Load Store Functional Units Operation Results Addr. Addr. Data Data Execution Execution Data Cache ultiple functional units Each can operate in independently Operations performed as soon as operands available Not necessarily in program order Within limits of functional units Control logic Ensures behavior equivalent to sequential program execution 22 Page 11

12 CPU Capabilities of Pentium III ultiple Instructions Can Execute in Parallel 1 load 1 store 2 integer (one may be branch) 1 FP Addition 1 FP ultiplication or Division Some Instructions Take > 1 Cycle, but Can be Pipelined Instruction Latency Cycles/Issue Load / Store 3 1 Integer ultiply 4 1 Integer Divide Double/Single FP ultiply 5 2 Double/Single FP Add 3 1 Double/Single FP Divide PentiumPro Block Diagram P6 icroarchitecture PentiumPro Pentium II Pentium III icroprocessor Report 2/16/95 Page 12

13 PentiumPro Operation Translates instructions dynamically into Uops 118 bits wide Holds operation, two sources, and destination Executes Uops with Out of Order engine Uop executed when Operands available Functional unit available Execution controlled by Reservation Stations Keeps track of data dependencies between uops Allocates resources 25 PentiumPro Branch Prediction Critical to Performance cycle penalty for misprediction Branch Target Buffer 512 entries 4 bits of history Adaptive algorithm Can recognize repeated patterns, e.g., alternating taken not taken Handling BTB misses Detect in cycle 6 Predict taken for negative offset, not taken for positive Loops vs. conditionals 26 Page 13

14 Example Branch Prediction Branch History Encode information about prior history of branch instructions Predict whether or not branch will be taken NT NT NT T Yes! Yes? No? No! NT State achine T T T Each time branch taken, transition to right When not taken, transition to left Predict branch taken when in state Yes! or Yes? 27 Pentium 4 Block Diagram Intel Tech. Journal Q1, 2001 Next generation microarchitecture 28 Page 14

15 Pentium 4 Features Trace Cache L2 Cache IA32 Instrs. Instruct. Decoder uops Trace Cache Operations Replaces traditional instruction cache Caches instructions in decoded form Reduces required rate for instruction decoder Double-Pumped ALUs Simple instructions (add) run at 2X clock rate Very Deep Pipeline 20+ cycle branch penalty Enables very high clock rates Slower than Pentium III for a given clock rate 29 Processor Summary Design Technique Create uniform framework for all instructions Want to share hardware among instructions Connect standard logic blocks with bits of control logic Operation State held in memories and clocked registers Computation done by combinational logic Clocking of registers/memories sufficient to control overall behavior Enhancing Performance Pipelining increases throughput and improves resource utilization ust make sure maintains ISA behavior 30 Page 15

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis

More information

Giving credit where credit is due

Giving credit where credit is due JP 284H oundations of Computer Systems Processor Architecture VI: rap-up Giving credit where credit is due ost of slides for this lecture are based on slides created by r. Bryant, Carnegie ellon University.

More information

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language 198:211 Computer Architecture Topics: Processor Design Where are we now? C-Programming A real computer language Data Representation Everything goes down to bits and bytes Machine representation Language

More information

Y86 Instruction Set. Operations. Moves. Branches. %eax %ecx %edx %ebx %esi %edi %esp %ebp R E G I S T E R S. jne. jge. jmp 7 0. cmovne 2 4.

Y86 Instruction Set. Operations. Moves. Branches. %eax %ecx %edx %ebx %esi %edi %esp %ebp R E G I S T E R S. jne. jge. jmp 7 0. cmovne 2 4. Y86 Instruction Set Byte 0 1 2 3 4 5 halt 0 0 nop 1 0 rrmovl ra, rb 2 0 ra rb irmovl V, rb 3 0 8F rb V rmmovl ra, D(rB) 4 0 ra rb D mrmovl D(rB), ra 5 0 ra rb D R E G I S T E R S %eax %ecx %edx %ebx %esi

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

Instruction Set Architecture

Instruction Set Architecture CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects

More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

Computer Organization and Components

Computer Organization and Components Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 14 Instruction Level Parallelism and Superscalar Processors What does Superscalar mean? Common instructions (arithmetic, load/store, conditional branch) can

More information

Advanced Pipelining Techniques

Advanced Pipelining Techniques Advanced Pipelining Techniques 1. Dynamic Scheduling 2. Loop Unrolling 3. Software Pipelining 4. Dynamic Branch Prediction Units 5. Register Renaming 6. Superscalar Processors 7. VLIW (Very Large Instruction

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/39 CS4617 Computer Architecture Lecture 20: Pipelining Reference: Appendix C, Hennessy & Patterson Reference: Hamacher et al. Dr J Vaughan November 2013 2/39 5-stage pipeline Clock cycle Instr No 1 2

More information

CISC 360. Computer Architecture. Seth Morecraft Course Web Site:

CISC 360. Computer Architecture. Seth Morecraft Course Web Site: CISC 360 Computer Architecture Seth Morecraft (morecraf@udel.edu) Course Web Site: http://www.eecis.udel.edu/~morecraf/cisc360 Static & Dynamic Scheduling Scheduling: act of finding independent instructions

More information

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e CS:APP Chapter 4 Computer Architecture Instruction Set Architecture CS:APP2e Instruction Set Architecture Assembly Language View Processor state Registers, memory, Instructions addl, pushl, ret, How instructions

More information

Reduced Instruction Set Computer vs Complex Instruction Set Computers

Reduced Instruction Set Computer vs Complex Instruction Set Computers RISC vs CISC Reduced Instruction Set Computer vs Complex Instruction Set Computers for a given benchmark the performance of a particular computer: P = 1 I C 1 S where P = time to execute I = number of

More information

CS352H: Computer Systems Architecture

CS352H: Computer Systems Architecture CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Assembly Language: Overview! Jennifer Rexford!

Assembly Language: Overview! Jennifer Rexford! Assembly Language: Overview! Jennifer Rexford! 1 Goals of this Lecture! Help you learn:! The basics of computer architecture! The relationship between C and assembly language! IA-32 assembly language,

More information

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy

More information

VLIW Processors. VLIW Processors

VLIW Processors. VLIW Processors 1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

More information

Chapter 4 Processor Architecture

Chapter 4 Processor Architecture Chapter 4 Processor Architecture Modern microprocessors are among the most complex systems ever created by humans. A single silicon chip, roughly the size of a fingernail, can contain a complete high-performance

More information

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers CS 4 Introduction to Compilers ndrew Myers Cornell University dministration Prelim tomorrow evening No class Wednesday P due in days Optional reading: Muchnick 7 Lecture : Instruction scheduling pr 0 Modern

More information

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

Basic Computer Organization

Basic Computer Organization SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan govind@serc Basic Computer Organization Main parts of a computer system: Processor: Executes programs Main memory:

More information

A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure

A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure A Brief Review of Processor Architecture Why are Modern Processors so Complicated? Basic Structure CPU PC IR Regs ALU Memory Fetch PC -> Mem addr [addr] > IR PC ++ Decode Select regs Execute Perform op

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at

More information

Recap & Perspective. Sequential Logic Circuits & Architecture Alternatives. Register Operation. Building a Register. Timing Diagrams.

Recap & Perspective. Sequential Logic Circuits & Architecture Alternatives. Register Operation. Building a Register. Timing Diagrams. Sequential Logic Circuits & Architecture Alternatives Recap & Perspective COMP 251 Computer Organization and Architecture Fall 2009 So Far: ALU Implementation Next: Register Implementation Register Operation

More information

MACHINE ARCHITECTURE & LANGUAGE

MACHINE ARCHITECTURE & LANGUAGE in the name of God the compassionate, the merciful notes on MACHINE ARCHITECTURE & LANGUAGE compiled by Jumong Chap. 9 Microprocessor Fundamentals A system designer should consider a microprocessor-based

More information

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1

Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1 Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of

More information

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu. Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive

More information

#5. Show and the AND function can be constructed from two NAND gates.

#5. Show and the AND function can be constructed from two NAND gates. Study Questions for Digital Logic, Processors and Caches G22.0201 Fall 2009 ANSWERS Digital Logic Read: Sections 3.1 3.3 in the textbook. Handwritten digital lecture notes on the course web page. Study

More information

CSC 2405: Computer Systems II

CSC 2405: Computer Systems II CSC 2405: Computer Systems II Spring 2013 (TR 8:30-9:45 in G86) Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Introductions Mirela Damian Room 167A in the Mendel Science Building mirela.damian@villanova.edu

More information

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern: Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

More information

Intel Pentium 4 Processor on 90nm Technology

Intel Pentium 4 Processor on 90nm Technology Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended

More information

PROBLEMS #20,R0,R1 #$3A,R2,R4

PROBLEMS #20,R0,R1 #$3A,R2,R4 506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #$3A,R2,R4 R0,R2,R5 In all instructions,

More information

response (or execution) time -- the time between the start and the finish of a task throughput -- total amount of work done in a given time

response (or execution) time -- the time between the start and the finish of a task throughput -- total amount of work done in a given time Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. response (or execution) time -- the time between the start and the finish of a task 2. Define throughput. throughput

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Pipelining Review and Its Limitations

Pipelining Review and Its Limitations Pipelining Review and Its Limitations Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic

More information

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007 Application Note 195 ARM11 performance monitor unit Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007 Copyright 2007 ARM Limited. All rights reserved. Application Note

More information

How to improve (decrease) CPI

How to improve (decrease) CPI How to improve (decrease) CPI Recall: CPI = Ideal CPI + CPI contributed by stalls Ideal CPI =1 for single issue machine even with multiple execution units Ideal CPI will be less than 1 if we have several

More information

Let s put together a Manual Processor

Let s put together a Manual Processor Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce

More information

The Operating System Level

The Operating System Level The Operating System Level Virtual Memory File systems Parallel processing Case studies Due 6/3: 2, 3, 18, 23 Like other levels we have studied, the OS level is built on top of the next lower layer. Like

More information

COMP303 Computer Architecture Lecture 16. Virtual Memory

COMP303 Computer Architecture Lecture 16. Virtual Memory COMP303 Computer Architecture Lecture 6 Virtual Memory What is virtual memory? Virtual Address Space Physical Address Space Disk storage Virtual memory => treat main memory as a cache for the disk Terminology:

More information

COMP303 Computer Architecture Lecture 15 Calculating and Improving Cache Performance

COMP303 Computer Architecture Lecture 15 Calculating and Improving Cache Performance COMP303 Computer Architecture Lecture 15 Calculating and Improving Cache Performance Cache Performance Measures Hit rate : fraction found in the cache So high that we usually talk about Miss rate = 1 -

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

Computer organization

Computer organization Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs

More information

Measuring Cache Performance

Measuring Cache Performance Measuring Cache Performance Components of CPU time Program execution cycles Includes cache hit time Memory stall cycles Mainly from cache misses With simplifying assumptions: Memory stall cycles = = Memory

More information

Introduction to Microprocessors

Introduction to Microprocessors Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?

More information

Branch Prediction Techniques

Branch Prediction Techniques Course on: Advanced Computer Architectures Branch Prediction Techniques Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it Outline Dealing with Branches in the Processor Pipeline

More information

CSE 141L Computer Architecture Lab Fall 2003. Lecture 2

CSE 141L Computer Architecture Lab Fall 2003. Lecture 2 CSE 141L Computer Architecture Lab Fall 2003 Lecture 2 Pramod V. Argade CSE141L: Computer Architecture Lab Instructor: TA: Readers: Pramod V. Argade (p2argade@cs.ucsd.edu) Office Hour: Tue./Thu. 9:30-10:30

More information

The LC-3. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell

The LC-3. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell The LC-3 University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell Instruction Set Architecture ISA = All of the programmer-visible components and operations of the computer

More information

2

2 1 2 3 4 5 For Description of these Features see http://download.intel.com/products/processor/corei7/prod_brief.pdf The following Features Greatly affect Performance Monitoring The New Performance Monitoring

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Out-of-Order Memory Access Dynamic Scheduling Summary Out-of-order execution: a performance technique Feature I: Dynamic scheduling (io OoO) Performance piece: re-arrange

More information

Fetch. Decode. Execute. Memory. PC update

Fetch. Decode. Execute. Memory. PC update nwpc PC Nw PC valm Mmory Mm. control rad writ Data mmory data out rmmovl ra, D(rB) Excut Bch CC ALU A vale ALU Addr ALU B Data vala ALU fun. valb dste dstm srca srcb dste dstm srca srcb Ftch Dcod Excut

More information

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Assembly Language: Function Calls" Jennifer Rexford!

Assembly Language: Function Calls Jennifer Rexford! Assembly Language: Function Calls" Jennifer Rexford! 1 Goals of this Lecture" Function call problems:! Calling and returning! Passing parameters! Storing local variables! Handling registers without interference!

More information

CS107 Handout 12 Spring 2008 April 18, 2008 Computer Architecture: Take I

CS107 Handout 12 Spring 2008 April 18, 2008 Computer Architecture: Take I CS107 Handout 12 Spring 2008 April 18, 2008 Computer Architecture: Take I Computer architecture Handout written by Julie Zelenski and Nick Parlante A simplified picture with the major features of a computer

More information

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

A s we saw in Chapter 4, a CPU contains three main sections: the register section, 6 CPU Design A s we saw in Chapter 4, a CPU contains three main sections: the register section, the arithmetic/logic unit (ALU), and the control unit. These sections work together to perform the sequences

More information

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997

Design of Pipelined MIPS Processor. Sept. 24 & 26, 1997 Design of Pipelined MIPS Processor Sept. 24 & 26, 1997 Topics Instruction processing Principles of pipelining Inserting pipe registers Data Hazards Control Hazards Exceptions MIPS architecture subset R-type

More information

The Von Neumann Model. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell

The Von Neumann Model. University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell The Von Neumann Model University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell The Stored Program Computer 1943: ENIAC Presper Eckert and John Mauchly -- first general electronic

More information

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995 UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering EEC180B Lab 7: MISP Processor Design Spring 1995 Objective: In this lab, you will complete the design of the MISP processor,

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

Computer Architecture

Computer Architecture Computer Architecture Having studied numbers, combinational and sequential logic, and assembly language programming, we begin the study of the overall computer system. The term computer architecture is

More information

CHAPTER 7: The CPU and Memory

CHAPTER 7: The CPU and Memory CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

CPU- Internal Structure

CPU- Internal Structure ESD-1 Elettronica dei Sistemi Digitali 1 CPU- Internal Structure Lesson 12 CPU Structure&Function Instruction Sets Addressing Modes Read Stallings s chapters: 11, 9, 10 esd-1-9:10:11-2002 1 esd-1-9:10:11-2002

More information

CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009

CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009 CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009 Test Guidelines: 1. Place your name on EACH page of the test in the space provided. 2. every question in the space provided. If

More information

Lecture 11: Memory Hierarchy Design. CPU-Memory Performance Gap

Lecture 11: Memory Hierarchy Design. CPU-Memory Performance Gap Lecture 11: Memory Hierarchy Design Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 CPU-Memory Performance Gap 2 The Memory Bottleneck Typical CPU clock

More information

ARM Cortex-A8 Processor

ARM Cortex-A8 Processor ARM Cortex-A8 Processor High Performances And Low Power for Portable Applications Architectures for Multimedia Systems Prof. Cristina Silvano Gianfranco Longi Matr. 712351 ARM Partners 1 ARM Powered Products

More information

The University of Nottingham

The University of Nottingham The University of Nottingham School of Computer Science A Level 1 Module, Autumn Semester 2007-2008 Computer Systems Architecture (G51CSA) Time Allowed: TWO Hours Candidates must NOT start writing their

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Performance monitoring with Intel Architecture

Performance monitoring with Intel Architecture Performance monitoring with Intel Architecture CSCE 351: Operating System Kernels Lecture 5.2 Why performance monitoring? Fine-tune software Book-keeping Locating bottlenecks Explore potential problems

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

CPU Performance Equation

CPU Performance Equation CPU Performance Equation C T I T ime for task = C T I =Average # Cycles per instruction =Time per cycle =Instructions per task Pipelining e.g. 3-5 pipeline steps (ARM, SA, R3000) Attempt to get C down

More information

CSE 141 Midterm Exam

CSE 141 Midterm Exam CSE 141 Midterm Exam 2011 Winter Professor Steven Swanson 1. Please write your name at the top of each page 2. This is a close book, closed notes exam. No outside material may be used. 3. You may use a

More information

Architecture and Programming of x86 Processors

Architecture and Programming of x86 Processors Brno University of Technology Architecture and Programming of x86 Processors Microprocessor Techniques and Embedded Systems Lecture 12 Dr. Tomas Fryza December 2012 Contents A little bit of one-core Intel

More information

In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days

In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days RISC vs CISC 66 In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days Initially, the focus was on usability by humans. Lots of user-friendly instructions (remember

More information

UNIT III CONTROL UNIT DESIGN

UNIT III CONTROL UNIT DESIGN UNIT III CONTROL UNIT DESIGN INTRODUCTION CONTROL TRANSFER FETCH CYCLE INSTRUCTION INTERPRETATION AND EXECUTION HARDWIRED CONTROL MICROPROGRAMMED CONTROL Slides Courtesy of Carl Hamacher, Computer Organization,

More information

WAR: Write After Read

WAR: Write After Read WAR: Write After Read write-after-read (WAR) = artificial (name) dependence add R1, R2, R3 sub R2, R4, R1 or R1, R6, R3 problem: add could use wrong value for R2 can t happen in vanilla pipeline (reads

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations

Lecture: Pipelining Extensions. Topics: control hazards, multi-cycle instructions, pipelining equations Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations 1 Problem 6 Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2

More information

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats Execution Cycle Pipelining CSE 410, Spring 2005 Computer Systems http://www.cs.washington.edu/410 1. Instruction Fetch 2. Instruction Decode 3. Execute 4. Memory 5. Write Back IF and ID Stages 1. Instruction

More information

This Unit: Code Scheduling. CIS 371 Computer Organization and Design. Pipelining Review. Readings. ! Pipelining and superscalar review

This Unit: Code Scheduling. CIS 371 Computer Organization and Design. Pipelining Review. Readings. ! Pipelining and superscalar review This Unit: Code Scheduling CIS 371 Computer Organization and Design App App App System software Mem CPU I/O! Pipelining and superscalar review! Code scheduling! To reduce pipeline stalls! To increase ILP

More information

Virtual Memory & Memory Management

Virtual Memory & Memory Management CS 571 Operating Systems Virtual Memory & Memory Management Angelos Stavrou, George Mason University Memory Management 2 Logical and Physical Address Spaces Contiguous Allocation Paging Segmentation Virtual

More information

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy The Quest for Speed - Memory Cache Memory CSE 4, Spring 25 Computer Systems http://www.cs.washington.edu/4 If all memory accesses (IF/lw/sw) accessed main memory, programs would run 20 times slower And

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550): Review From 550 The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 550): Motivation for The Memory Hierarchy: CPU/Memory Performance Gap The Principle Of Locality Cache Basics:

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal

More information

Feature of 8086 Microprocessor

Feature of 8086 Microprocessor 8086 Microprocessor Introduction 8086 is the first 16 bit microprocessor which has 40 pin IC and operate on 5volt power supply. which has twenty address limes and works on two modes minimum mode and maximum.

More information

Systems I: Computer Organization and Architecture

Systems I: Computer Organization and Architecture Systems I: Computer Organization and Architecture Lecture : Microprogrammed Control Microprogramming The control unit is responsible for initiating the sequence of microoperations that comprise instructions.

More information

Instruction Set Architectures

Instruction Set Architectures Instruction Set Architectures Early trend was to add more and more instructions to new CPUs to do elaborate operations (CISC) VAX architecture had an instruction to multiply polynomials! RISC philosophy

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

Chapter 6 The Memory System. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 6 The Memory System. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 6 The Memory System Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Basic Concepts Semiconductor Random Access Memories Read Only Memories Speed,

More information

Computer Architecture TDTS10

Computer Architecture TDTS10 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

More information

Instruction Set Design

Instruction Set Design Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360

Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360 1 Influence of Technology and Software on Instruction Sets: Up to the dawn of IBM 360 Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by and Krste Asanovic

More information