Chapter 12 CPU Structure and Function


Contents
- Processor organization
- Register organization
- Instruction cycle
- Instruction pipelining
- Pentium processor
- PowerPC processor

12.1 Processor Organization
- Requirements on CPU
  - Fetch instructions
  - Interpret instructions
  - Fetch data
  - Process data
  - Write data
- CPU consists of
  - ALU
  - Control unit
  - Registers
  - Internal bus

CPU With Systems Bus

CPU Internal Structure

12.2 Register Organization
- Design issues
  - Completely general-purpose registers (GPRs) vs specialized registers
    + Specialized registers serve particular operands: only BX, SI, and DI are used for storing the offset address in 80x86
    + Specialization saves the bits needed to name a register
    + Specialization limits the programmer's flexibility
  - Number of registers
    + For CISC, between 8 and 32 is regarded as optimum
    + Fewer registers result in more memory references
    + More registers do not noticeably reduce memory references
    + Some RISC designs use hundreds of registers
  - Register length
    + An address register must be long enough to hold the target address
    + A data register must be long enough to hold values of most data types
    + Some machines allow consecutive registers to hold double-length values

User Visible Registers
- GPR
- Data register
- Address register
  - Segment pointers
  - Index registers
  - Stack pointer
- Condition codes (flags)
  - Set according to the result of operations
  - Used for checking certain conditions
  - Can be read (implicitly) by programs, e.g. jump if zero
  - Cannot (usually) be set by programs

Control & Status Registers
- Program Counter
  - Updated after each instruction fetch
  - Updated when a branch instruction is met
- Instruction Register
- Memory Address Register
  - Connected directly to the address bus
- Memory Buffer Register
  - Connected directly to the data bus
- Program Status Word
  - Sign, zero, carry, equal, overflow, interrupt enable/disable, supervisor mode
- Others
  - Pointer to PCB (Process Control Block), interrupt vector register
  - Stack-related registers, page table pointer

Supervisor Mode
- Intel microprocessors have 4 privilege levels (rings)
  - Ring zero: kernel functions
  - Ring one: operating system functions
  - Ring two: may be used for DBMS
  - Ring three: user programs

Example Register Organizations: Motorola MC68000 (not including purely internal registers)
- 8 data registers
  - Used primarily for data manipulation
    + 8-, 16-, and 32-bit operations are possible
  - Also used as index registers
- 9 address registers
  - 32 bits wide
  - Include two stack pointers
    + One for user and one for system
- PC and status register

Example Register Organizations: Intel 8086 (every register is special purpose)
- 4 16-bit data registers (can be used as general registers in some instructions)
  - AX, BX, CX, DX
- 4 pointer and index registers
  - SP, BP, SI, DI
- 4 segment registers
  - CS, DS, SS, ES
- Instruction pointer and flags

Example Register Organizations

12.3 Instruction Cycle
- Subcycles of the instruction cycle
  - Fetch
  - Execute
  - Interrupt
  - Indirect (newly added)
- Indirect cycle
  - Indirect addressing requires an additional memory access
  - Can be thought of as an additional instruction subcycle

Instruction Cycle State Diagram

Data Flow: Fetch cycle
- PC contains the address of the next instruction
- Address is moved to MAR
- Address is placed on the address bus
- Control unit requests a memory read
- Result is placed on the data bus, copied to MBR, then to IR
- Meanwhile, PC is incremented by 1
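The fetch-cycle data flow can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy machine: the dictionary-based register file, the word-addressed `memory` list, and the sample instruction words are assumptions made for the example, not part of the slides.

```python
def fetch_cycle(cpu, memory):
    """One fetch cycle on a toy word-addressed machine (illustrative only)."""
    cpu["MAR"] = cpu["PC"]            # address of next instruction moved to MAR
    cpu["MBR"] = memory[cpu["MAR"]]   # memory read: word arrives in MBR
    cpu["IR"] = cpu["MBR"]            # MBR copied to IR
    cpu["PC"] += 1                    # PC incremented (done in parallel in hardware)
    return cpu


cpu = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
memory = [0x1005, 0x2006, 0x0000]     # made-up instruction words
fetch_cycle(cpu, memory)
```

After one call, `IR` holds the first word of `memory` and `PC` points at the second.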

Data Flow, Fetch Cycle

Data Flow: Indirect cycle
- IR is examined
- If indirect addressing is used, an indirect cycle is performed
  - Rightmost N bits of MBR are transferred to MAR
  - Control unit requests a memory read
  - Result (address of operand) is moved to MBR

Data Flow, Indirect Cycle

Data Flow: Execute cycle
- May take many forms, depending on the instruction
- May include
  - Memory read/write
  - Input/output
  - Register transfers
  - ALU operations

Data Flow: Interrupt cycle
- Current PC is saved to allow resumption after the interrupt
  - Contents of PC are copied to MBR
  - Special memory location (e.g. stack pointer) is loaded into MAR
  - MBR is written to memory
- PC is loaded with the address of the interrupt handling routine
- Next instruction (first of the interrupt handler) can be fetched
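The interrupt-cycle steps can be mimicked the same way. In this sketch the downward-growing stack, the `SP` register name, and the handler address are assumptions chosen for illustration; real machines differ in where and how the return address is saved.

```python
def interrupt_cycle(cpu, memory, handler_address):
    """Save PC and redirect to a handler on a toy machine (illustrative only)."""
    cpu["MBR"] = cpu["PC"]            # contents of PC copied to MBR
    cpu["SP"] -= 1                    # assumption: stack grows toward lower addresses
    cpu["MAR"] = cpu["SP"]            # save location loaded into MAR
    memory[cpu["MAR"]] = cpu["MBR"]   # MBR written to memory
    cpu["PC"] = handler_address       # next fetch starts the handler
    return cpu


cpu = {"PC": 7, "SP": 16, "MAR": 0, "MBR": 0}
memory = [0] * 16
interrupt_cycle(cpu, memory, handler_address=2)
```

The saved value at `memory[15]` is the address at which the interrupted program would resume.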

Data Flow, Interrupt Cycle

12.4 Instruction Pipelining
- Pipelining strategy
  - Similar to an assembly line in an automobile factory
  - An instruction has a number of stages
  - Stages can be executed simultaneously
- Simple two-stage pipelining
  - Fetch and execute stages
  - If the two stages were of equal duration, instruction cycle time would be halved
- But things are not that easy
  - Execution time is longer than fetch time
    + Fetch stage may have to wait
  - A conditional branch makes the next instruction unknown
    + Fetch stage must wait or guess the branch outcome

Two Stage Instruction Pipeline

Instruction Pipelining
- More stages mean further speedup
  - Fetch Instruction (FI)
  - Decode Instruction (DI)
  - Calculate Operands (CO)
  - Fetch Operands (FO)
  - Execute Instruction (EI)
  - Write Operand (WO)
- Characteristics (equal duration assumed)
  - Execution time for 9 instructions is reduced from 54 to 14 time units
  - Some instructions may not go through all 6 stages
    + LOAD does not need the WO stage
  - Some stages may not be performed in parallel
    + FI, FO, and WO stages involve a memory access
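The 54-versus-14 figure follows directly from the stage layout. The sketch below rebuilds the ideal timing table (equal stage durations, no branches, no memory conflicts); the function name `timing_table` is an illustrative choice, not something from the slides.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_table(n, stages=STAGES):
    """Return one row per time unit; each row lists the instruction in each stage."""
    k = len(stages)
    total_time = k + (n - 1)          # first instruction takes k units, the rest 1 each
    table = []
    for t in range(1, total_time + 1):
        row = []
        for s in range(k):
            i = t - s                 # instruction number occupying stage s at time t
            row.append(f"I{i}" if 1 <= i <= n else "")
        table.append(row)
    return table


table = timing_table(9)
for t, row in enumerate(table, start=1):
    print(f"{t:2d}", " ".join(f"{cell:>3}" for cell in row))
```

With 9 instructions the table has 14 rows, against 9 × 6 = 54 time units without overlap.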

Timing Diagram for Instruction Pipeline Operation

Instruction Pipelining
- Factors that limit performance enhancement
  - Stages may not be of equal duration
  - Conditional branch instruction
    + Invalidates several instruction fetches
  - Interrupt
  - Data dependency
    + The CO stage may depend on the contents of a register that could be altered by a previous instruction still in the pipeline
    + The system needs logic to resolve this conflict

Effect of a Conditional Branch

Figure 12.12 Six-Stage Instruction Pipeline (flow chart: FI, DI, CO, FO, EI, WO, with tests for an unconditional branch and for a branch or interrupt that empties the pipe)

Figure 12.13 An Alternative Pipeline Depiction: (a) No branches (b) With conditional branch

Pipeline Performance
- Measures of performance
  - Cycle time can be determined as

    \[ \tau = \max_{1 \le i \le k}[\tau_i] + d = \tau_m + d \]

    where
    + $\tau_m$ = maximum stage delay
    + $k$ = number of stages in the instruction pipeline
    + $d$ = time delay of a latch, needed to advance signals and data from one stage to the next
  - We can ignore $d$ since $\tau_m \gg d$
  - Total time $T_k$ to execute $n$ instructions is

    \[ T_k = [k + (n - 1)]\tau \]

  - Thus the speedup factor is defined as

    \[ S_k = \frac{T_1}{T_k} = \frac{nk\tau}{[k + (n - 1)]\tau} = \frac{nk}{k + (n - 1)} \]
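A quick numerical check of the speedup formula may help. The helper below evaluates S_k = nk / [k + (n - 1)] directly; the function name and the sample values of n are illustrative assumptions.

```python
def speedup(n, k):
    """Ideal pipeline speedup for n instructions on a k-stage pipeline."""
    return (n * k) / (k + (n - 1))


for n in (1, 9, 100, 10_000):
    print(f"n = {n:>6}, k = 6: S_k = {speedup(n, 6):.3f}")
```

For a single instruction there is no gain (S_k = 1), and as n grows the speedup approaches k, the number of stages.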

Speedup Factors with Pipelining

Dealing with Branches
- Approaches for dealing with branches
  - Multiple streams
  - Prefetch branch target
  - Loop buffer
  - Branch prediction
  - Delayed branch
- Multiple streams
  - Have two pipelines
  - Prefetch each branch into a separate pipeline
  - Use the appropriate pipeline
  - Problems
    + There may be contention delays for accessing data
    + Each additional branch instruction entering the pipeline needs an additional stream

Dealing with Branches
- Prefetch branch target
  - Target of the branch is prefetched in addition to the instruction following the branch
  - Keep the target until the branch is executed
  - Used in IBM 360/91
- Loop buffer
  - Contains the n most recently fetched instructions, in sequence
  - Whenever a branch is to be taken, the buffer is checked first
  - Well suited to dealing with loops
    + If the loop buffer is large enough to contain all the instructions in a loop, they need to be fetched only once
  - Used in CDC and CRAY-1

Loop Buffer Diagram

Dealing with Branches
- Branch prediction
  - Predict never taken
  - Predict always taken
  - Predict by opcode
  - Taken/not taken switch
  - Branch history table
- Predict never taken
  - Assume that the jump will not happen
  - Always fetch the next instruction
  - Used in MC68020 & VAX 11/780
    + VAX will not prefetch the instruction after the branch if a page fault would result

Dealing with Branches
- Predict always taken
  - Assume that the jump will happen
  - Always fetch the target instruction
  - Studies show that conditional branches are taken more than 50% of the time
  - But prefetching the branch target is more likely to cause a page fault
- Predict by opcode
  - Some instructions are more likely to result in a jump than others
    + e.g. JNZ of 80x86
  - Success rates of >75% are reported

Dealing with Branches
- Taken/not taken switch
  - Based on previous history
    + History bits are associated with each conditional branch instruction
  - Single-bit switch
    + Records whether the last execution resulted in a branch or not
    + Not good for the nested loop case
  - Two-bit switch
    + Two consecutive wrong predictions change the prediction
  - Drawback
    + If the decision is to take the branch, the target instruction cannot be fetched until the target address is decoded
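The difference between the single-bit and two-bit switches shows up on a loop branch that is taken several times and then falls through. The sketch below uses a saturating 2-bit counter, which is one common way to realize "two consecutive wrong predictions change the prediction"; it is an assumption for illustration and may not match the exact state diagram in the figure. Class and function names are hypothetical.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time."""

    def __init__(self, last_taken=True):
        self.last_taken = last_taken

    def predict(self):
        return self.last_taken

    def update(self, taken):
        self.last_taken = taken


class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(predictor, outcomes):
    """Run the predictor over a branch outcome trace; return correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# Inner-loop branch: taken three times, then not taken; the loop is entered 3 times.
trace = [True, True, True, False] * 3
print("one-bit:", count_correct(OneBitPredictor(), trace), "of", len(trace))
print("two-bit:", count_correct(TwoBitPredictor(), trace), "of", len(trace))
```

On this trace the single-bit switch mispredicts twice per loop entry after the first (once at exit, once on re-entry), while the two-bit counter mispredicts only at each loop exit.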

Figure 12.16 Branch Prediction Flow Chart

Branch Prediction State Diagram

Dealing with Branches
- Branch history table
  - Each entry consists of the address of a branch instruction, history bits, and information about the target instruction
  - Used in the AMD29000 microprocessor

Dealing with Branches

Dealing with Branches
- Delayed branch
  - Do not take the jump until you have to
  - Rearrange instructions
    + To improve pipeline performance, rearrange instructions so that branch instructions occur later than actually desired

Intel 80486 Pipelining
- Fetch
  - From cache or external memory
  - Put in one of two 16-byte prefetch buffers
  - Fill buffer with new data as soon as old data is consumed
  - Average of 5 instructions fetched per 16-byte load
  - Independent of other stages, to keep the buffers full
- Decode stage 1 (D1)
  - Opcode and address-mode information is decoded
  - Required information is in at most the first 3 bytes of the instruction
  - Can direct the D2 stage to get the rest of the instruction
- Decode stage 2 (D2)
  - Expands opcode into control signals
  - Computation of complex address modes
- Execute
  - ALU operations, cache access, register update
- Write back
  - Update registers and flags
  - Results sent to cache and bus interface write buffers

Pentium Instruction Formats

12.5 Pentium Processor
- Register organization
  - General: EAX ~ EDX, ESP, EBP, ESI, and EDI
  - Segment: CS, SS, DS, ES, FS, and GS
  - Flags (EFLAGS): condition codes and mode bits
  - Instruction pointer
- Registers for the floating-point unit
  - Numeric
    + Registers for holding 80-bit floating-point numbers
  - Control
  - Status
    + 16-bit register reflecting the state of the floating-point unit
  - Tag word
    + 2 bits associated with each numeric register
    + Represents valid, zero, special (NaN, infinity, ...), and empty

Pentium 4 Registers

EFLAGS Register
- Condition codes and control bits
  - Carry, parity, auxiliary, zero, sign, and overflow
  - Trap flag (TF)
    + Causes an interrupt after each instruction execution
  - Interrupt enable flag (IF)
  - Direction flag (DF)
  - I/O privilege flag (IOPL)
    + Causes the CPU to generate an exception on all accesses to I/O devices
  - Resume flag (RF)
    + Used for debugging
  - Alignment check (AC)
  - Identification flag (ID)
    + If this bit can be set and cleared, the processor supports the CPUID instruction, which provides information about vendor, family, and model

EFLAGS Register

Control Registers
- 4 32-bit control registers: CR0, CR2, CR3, and CR4 (CR1 is reserved)
- CR0
  - Protection enable (PE)
  - Monitor coprocessor (MP)
  - Emulation (EM)
    + Set when the CPU does not have a floating-point unit
  - Task switched (TS)
  - Extension type (ET)
  - Numeric error (NE)
  - Write protect (WP)
  - Alignment mask (AM)
  - Not write through (NW)
    + Selects the mode of operation of the data cache
  - Cache disable (CD)
  - Paging (PG)

Control Registers
- CR2
  - Holds the 32-bit linear address of the last page accessed before a page fault
- CR3
  - Leftmost 20 bits hold the 20 most significant bits of the base address of the page directory
- CR4
  - Additional control bits
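Because the page directory is aligned on a 4 KB boundary, the leftmost 20 bits of CR3 are enough to locate it; the low 12 bits of the base address are implicitly zero. A minimal sketch of the extraction, assuming ordinary 32-bit paging (the function name and the sample CR3 value are made up for illustration):

```python
PAGE_DIRECTORY_BASE_MASK = 0xFFFFF000   # leftmost 20 bits of a 32-bit register


def page_directory_base(cr3):
    """Physical base address of the page directory encoded in a 32-bit CR3 value."""
    return cr3 & PAGE_DIRECTORY_BASE_MASK


cr3 = 0x12345018                         # hypothetical register contents
print(hex(page_directory_base(cr3)))
```

The remaining low-order bits of CR3 carry control information rather than address bits, so they are masked off here.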

Control Registers

MMX Register Mapping
- MMX uses 64-bit data types
- Each instruction uses 3-bit register address fields
  - 8 MMX registers
- No MMX-specific registers
  - Aliased to the lower 64 bits of the existing 8 floating-point registers

Mapping of MMX Registers to FP Registers

Pentium Interrupt Processing
- Interrupts
  - Maskable: received on the INTR pin
  - Nonmaskable: received on the NMI pin
- Exceptions
  - Processor detected
  - Programmed: instructions that generate an exception
- Interrupt vector table
  - Each interrupt type is assigned a number
  - The number is an index into the interrupt vector table
  - 256 32-bit interrupt vectors

Pentium Interrupts and Exceptions

Pentium Interrupt Processing
- 5 priority classes (vector numbers in parentheses)
  - Class 1 (1): traps on the previous instruction
  - Class 2 (2, 32-255): external interrupts
  - Class 3 (3, 14): faults from fetching the next instruction
  - Class 4 (6, 7): faults from decoding the next instruction
  - Class 5 (0, 4, 5, 8, 10-14, 16, 17): faults on executing an instruction

Pentium Interrupt Processing
- Interrupt handling
  - If the transfer involves a change of privilege level, SS and ESP are pushed onto the stack
  - EFLAGS is pushed
  - IF and TF flags are cleared
  - CS and EIP are pushed
  - If the interrupt is accompanied by an error code, it is pushed
  - CS and EIP of the interrupt service routine are fetched, and the routine is executed
- To return from the interrupt, the IRET instruction is executed

12.6 PowerPC Processor
- Register organization
  - Fixed-point unit
    + General: 32 64-bit GPRs
    + Exception register (XER): used to report exceptions
  - Floating-point unit
    + General: 32 64-bit floating-point registers
    + Floating-point status and control register (FPSCR): used to control the operation and to record status
  - Branch processing unit
    + Condition register: 8 4-bit condition code fields
    + Link register: used for indirect addressing of the target address
    + Count register: used to control an iteration loop

PowerPC User Visible Registers

PowerPC Interrupt Processing
- Types of interrupts
- Machine state registers
- Interrupt handling
  - Place the address of the next instruction in SRR0
  - MSR is copied into SRR1
  - MSR is set according to the interrupt type
  - Control is transferred to the interrupt handler
- To return from the interrupt, the rfi instruction is executed

PowerPC Interrupt Table

PowerPC Interrupt Table (Cont'd)

PowerPC MSR