Engineering 9859 CoE Fundamentals Computer Architecture

Engineering 9859 CoE Fundamentals Computer Architecture
Instruction Set Principles
Dennis Peters, Fall 2007
Based on notes from Dr. R. Venkatesan

RISC vs. CISC
Complex Instruction Set Computers (CISC) have 1000s of instructions.
Reduced Instruction Set Computers (RISC) are designed to execute about 100 instructions efficiently.
Most modern processors have features of both RISC and CISC.
Architects and designers should remember Amdahl's law: implement the most frequently executed instructions efficiently in hardware. Some rarely executed instructions can even be microcoded.
Simplicity and uniformity (consistency and orthogonality) in architectural choices lead to hardware with a short critical path (worst-case propagation delay), resulting in high performance.
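The appeal to Amdahl's law can be made concrete with a small calculation. The following is a minimal Python sketch, assuming the usual statement of the law (overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time affected and s is the speedup of that fraction). The function name and the sample numbers are illustrative and do not come from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the execution time
    is made `speedup_enhanced` times faster (Amdahl's law)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # Hypothetical comparison: a modest gain on frequent instructions
    # versus a large gain on rarely executed ones.
    frequent = amdahl_speedup(fraction_enhanced=0.90, speedup_enhanced=2.0)
    rare = amdahl_speedup(fraction_enhanced=0.05, speedup_enhanced=10.0)
    print(f"2x faster on 90% of the time: overall {frequent:.3f}x")
    print(f"10x faster on 5% of the time: overall {rare:.3f}x")
```

With these assumed numbers, doubling the speed of the frequent case helps far more than a tenfold gain on the rare case, which is the argument for putting frequent instructions in hardware and leaving rare ones to microcode.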

Classifying IS Architectures
Classify based on the type of internal storage in the processor:
Stack: operands are implicitly on top of the stack.
Accumulator: one operand is implicitly in the accumulator.
Register-memory: only explicit operands; one may be in memory.
Register-register (load-store): only explicit register operands.
Load-store has been used for almost all new architectures since 1980.
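One way to compare the four classes is to see how each would compute C = A + B. The Python sketch below uses tiny toy interpreters. The mnemonics (PUSH, POP, LOAD, ADD, STORE), the register names and the memory layout are hypothetical and chosen only for illustration; they do not correspond to any real instruction set.

```python
def stack_machine(mem):
    """Stack: operands are implicit, taken from the top of the stack."""
    stack = []
    program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
    for op, *args in program:
        if op == "PUSH":
            stack.append(mem[args[0]])
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "POP":
            mem[args[0]] = stack.pop()
    return len(program)


def accumulator_machine(mem):
    """Accumulator: one operand is implicitly the accumulator."""
    acc = 0
    program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
    for op, addr in program:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc += mem[addr]
        elif op == "STORE":
            mem[addr] = acc
    return len(program)


def register_memory_machine(mem):
    """Register-memory: explicit operands, one of which may be in memory."""
    regs = {}
    program = [("LOAD", "R1", "A"), ("ADD", "R3", "R1", "B"), ("STORE", "R3", "C")]
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif op == "ADD":  # destination, register source, memory source
            regs[args[0]] = regs[args[1]] + mem[args[2]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
    return len(program)


def load_store_machine(mem):
    """Load-store: ALU operands are registers only."""
    regs = {}
    program = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
               ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif op == "ADD":  # destination, two register sources
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
    return len(program)


if __name__ == "__main__":
    for machine in (stack_machine, accumulator_machine,
                    register_memory_machine, load_store_machine):
        memory = {"A": 2, "B": 3, "C": 0}
        count = machine(memory)
        print(f"{machine.__name__}: {count} instructions, C = {memory['C']}")
```

All four toy machines compute the same result. They differ in how many operands each instruction names explicitly and in which instructions touch memory.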

Instruction Types
Arithmetic and logic (ALU) instructions: register operands only, or register operands plus one immediate operand (held in the instruction).
Memory access instructions: load and store.
Transfer-of-control instructions: conditional or unconditional branches and jumps, procedure calls, returns, interrupts, exceptions, traps, etc.
Special instructions: flag setting, etc.
Floating-point instructions.
Graphics and digital signal processing (DSP) instructions.
Input/output instructions: memory-mapped I/O?
Other instructions.

Instruction Occurrence
ALU (incl. ALU immediates): 25-40%
Loads: 20-30%
Stores: 10-15%
Conditional branches: 15-30%
Others: > 2%
Data processing applications use loads and stores more frequently than ALU instructions; numeric (scientific and engineering) applications use more ALU instructions.
Graphics processors, network processors, DSPs and vector coprocessors will have a different mix.

Word and Instruction Size
General-purpose CPUs are most commonly based on 32- or 64-bit words. Microcontrollers with 8-bit and 16-bit words are still common.
Instruction size is either variable or fixed. Fixed is common in high-performance processors. It may be smaller than, equal to or larger than the word size.
Most general-purpose CPUs use 64-bit words and 32-bit instructions.
Very long instruction word (VLIW) processors are emerging.
Memory access, especially writing, needs to be possible one byte (8 bits) at a time because of the character data type, so memory is almost always byte-addressable even with a larger word size.

Memory Organization
Which bit is bit 0, the lsb or the msb? This is only a convention.
Every byte has a unique address, so a word spans several consecutive addresses. Example: on a 64-bit computer, a word spans 8 addresses.
Which byte is the least significant byte (LSB)? This is also only a convention:
Little endian: the LSB is at the lowest address (little end).
Big endian: the LSB is at the highest address (big end).
Aligned or non-aligned memory access:
Aligned: gaps in memory, but fast and simple hardware.
Non-aligned: efficient storage, but complex and slow hardware.
Most (but not all) modern processors require that all operands of ALU instructions be the size of a word, again for fast and simple hardware.
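The byte-order convention can be observed directly. The sketch below uses Python's standard `struct` module to lay out one 64-bit word in both orders; the sample value and the helper `is_aligned` are illustrative assumptions, not part of the slides.

```python
import struct

# A 64-bit word whose bytes are easy to tell apart; 0x08 is the least significant byte.
value = 0x0102030405060708

little = struct.pack("<Q", value)  # little endian: LSB at the lowest address
big = struct.pack(">Q", value)     # big endian: LSB at the highest address


def is_aligned(address: int, size_bytes: int) -> bool:
    """An access is aligned when the address is a multiple of the operand size."""
    return address % size_bytes == 0


if __name__ == "__main__":
    print("little endian bytes, lowest address first:", little.hex())
    print("big endian bytes, lowest address first:   ", big.hex())
    print("64-bit access at 30000 aligned?", is_aligned(30000, 8))
    print("64-bit access at 30004 aligned?", is_aligned(30004, 8))
```

Index 0 of each byte string plays the role of the lowest memory address, so the two layouts are exact mirror images of each other.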

MIPS-64 Instruction Set
General-purpose register (GPR), load-store architecture.
64-bit word, 32-bit instructions.
32 64-bit GPRs: R0 is read-only and contains 0; R31 is used with procedure calls.
32 32-bit floating-point registers (FPRs), accessed in pairs as 64-bit entities.
Instruction formats:
R-type: 6-bit opcode, 3 x 5-bit register ids, 6-bit function. Used by ALU reg-reg instructions.
I-type: 6-bit opcode, 2 x 5-bit register ids, 16-bit displacement/immediate. Used by ALU reg-imm, load, store and conditional branch instructions.
J-type: used only for J and JAL.
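The R-type and I-type layouts can be illustrated by packing the fields into a 32-bit word. This is a minimal Python sketch under stated assumptions: the R-type fields listed above total 27 bits, and the remaining 5 bits are taken to be the shift-amount field of the standard MIPS layout. The function names are hypothetical, and the numeric opcode and function values in the usage section are recalled from the MIPS64 encoding and should be checked against the architecture manual before being relied on.

```python
def _check(name: str, value: int, bits: int) -> None:
    """Reject field values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_r_type(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    for name, value, bits in (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
                              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6)):
        _check(name, value, bits)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16, two's complement)."""
    _check("opcode", opcode, 6)
    _check("rs", rs, 5)
    _check("rt", rt, 5)
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError(f"immediate {imm} does not fit in 16 signed bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


if __name__ == "__main__":
    # Assumed encodings: DADD = opcode 0, funct 0x2C; DADDI = opcode 0x18.
    dadd = encode_r_type(opcode=0, rs=2, rt=2, rd=2, shamt=0, funct=0x2C)
    daddi = encode_i_type(opcode=0x18, rs=0, rt=1, imm=8000)
    print(f"DADD  R2, R2, R2    -> 0x{dadd:08X}")
    print(f"DADDI R1, R0, #8000 -> 0x{daddi:08X}")
```

Because every field sits at a fixed bit position, the hardware can read the register numbers before the opcode has been fully decoded, which is one reason fixed formats suit fast and simple implementations.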

MIPS-64 Instruction Set (cont'd)
Load, store: byte (8 bits), half word (16), word (32), double word (64).
ALU: add, subtract, AND, OR, XOR, shifts (all register-register).
FP (single or double precision): move, add, subtract, multiply, divide; conversion to/from integer; compare; test with branch (true/false).
Paired single: operate on two single-precision operands in a single double-precision register.
Branch: e.g. BEQ, BNE, BNEZ (always conditional).
Jump: to a name or using a register, with or without link (return address) in R31.

Sample Code
        DADDI R1, R0, #8000    ; 1000 words in array
here:   SUBDI R1, R1, #8       ; word has 8 bytes
        LD    R2, 30000(R1)    ; array starts at 30000
        DADD  R2, R2, R2       ; double contents
        SD    30000(R1), R2    ; store back in array
        BNEZ  R1, here         ; if not done, repeat
A 1000-word array, starting at memory location 30000, is accessed word by word (64-bit words), and the value of each word is doubled.

Memory Access (for sample code)
Approximately 40% of all instructions executed access memory.
Without a cache, memory is accessed 2 × 1000 times during execution of the code (one load and one store per array element).
If the L1 data cache is at least 64 kB, there would be only compulsory cache misses during execution of the code.
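These counts can be reproduced by stepping through the sample loop. The Python sketch below mirrors the six instructions one for one, using a dictionary as a stand-in for data memory. The function name, the initial array contents and the counters are assumptions made for illustration.

```python
def run_sample_loop(base: int = 30000, n_words: int = 1000, word_bytes: int = 8):
    """Simulate the sample loop; return (memory, instructions executed, data accesses)."""
    # Hypothetical initial contents: element i holds the value i.
    memory = {base + i * word_bytes: i for i in range(n_words)}
    instructions = 0
    data_accesses = 0

    r1 = n_words * word_bytes          # DADDI R1, R0, #8000
    instructions += 1
    while True:
        r1 -= word_bytes               # SUBDI R1, R1, #8
        r2 = memory[base + r1]         # LD    R2, 30000(R1)
        r2 = r2 + r2                   # DADD  R2, R2, R2
        memory[base + r1] = r2         # SD    30000(R1), R2
        instructions += 5              # the four above plus BNEZ
        data_accesses += 2             # one load and one store
        if r1 == 0:                    # BNEZ  R1, here (falls through when R1 is 0)
            break
    return memory, instructions, data_accesses


if __name__ == "__main__":
    memory, instructions, data_accesses = run_sample_loop()
    print("instructions executed:", instructions)
    print("data memory accesses: ", data_accesses)
    print(f"fraction accessing memory: {data_accesses / instructions:.1%}")
```

The loop walks the array from its highest address down to 30000, so the final iteration is the one in which R1 reaches zero.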

Compilers and Architecture
Architectures must be developed so that compilers for them can be developed easily. Architects should be aware of how compilers work.
Compiler designers can design good products if they are aware of the details of the architecture.
Simple compilers perform table look-up.
Optimizing compilers can produce machine code that executes up to 50 times faster.

Optimization
Optimization is done by compilers at various levels:
Front end: transform to a common intermediate form.
High-level: loop transformations and procedure inlining.
Global optimizer: global and local optimizations; register allocation.
Code generator: machine-dependent optimizations.

Example: compiler & architecture
Executing a program entails running 10 million instructions, 45% of which are ALU instructions. An optimizing compiler removes a third of the ALU instructions. The processor runs at a 1 GHz clock speed. Instruction execution times are 1 clock cycle (cc) for ALU instructions, 2 cc for loads and stores (which together make up 25% of all instructions), and 3 cc for conditional branches. Compute the MIPS ratings and CPU times before and after optimization.

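Example solution
(The numeric values were lost from the transcription; those below are recomputed from the problem statement, assuming the remaining 30% of instructions are conditional branches.)
Before optimization:
CPI avg = 0.45 × 1 + 0.25 × 2 + 0.30 × 3 = 1.85
MIPS = 1000 MHz / 1.85 ≈ 540.5
CPU time = 10 × 10^6 × 1.85 / 10^9 s = 18.5 ms
After optimization (1.5 million ALU instructions removed, 8.5 million instructions remain):
CPI avg = (3.0 × 1 + 2.5 × 2 + 3.0 × 3) / 8.5 = 17 / 8.5 = 2.0
MIPS = 1000 MHz / 2.0 = 500
CPU time = 8.5 × 10^6 × 2.0 / 10^9 s = 17 ms
The optimized program is faster even though its MIPS rating is lower.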
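The example can be checked numerically. The Python sketch below assumes that the 30% of instructions not accounted for in the problem statement are conditional branches, and that the compiler removes one third of the ALU instructions only. The function and variable names are illustrative.

```python
CLOCK_HZ = 1_000_000_000  # 1 GHz

# Clock cycles per instruction class, from the problem statement.
CYCLES = {"alu": 1, "load_store": 2, "branch": 3}


def rate(counts: dict, cycles: dict = CYCLES, clock_hz: int = CLOCK_HZ):
    """Return (average CPI, MIPS rating, CPU time in seconds) for an instruction mix."""
    total_instructions = sum(counts.values())
    total_cycles = sum(counts[kind] * cycles[kind] for kind in counts)
    cpi = total_cycles / total_instructions
    mips = clock_hz / (cpi * 1_000_000)
    cpu_time = total_cycles / clock_hz
    return cpi, mips, cpu_time


# 10 million instructions: 45% ALU, 25% load/store, 30% branches (assumed remainder).
before = {"alu": 4_500_000, "load_store": 2_500_000, "branch": 3_000_000}

# The optimizer removes one third of the ALU instructions; the rest are unchanged.
after = dict(before, alu=before["alu"] - before["alu"] // 3)

if __name__ == "__main__":
    for label, mix in (("before", before), ("after", after)):
        cpi, mips, cpu_time = rate(mix)
        print(f"{label}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, CPU time = {cpu_time * 1e3:.1f} ms")
```

Under these assumptions the CPU time drops while the MIPS rating also drops, because the instructions removed were the cheapest ones. This is the usual caution against using MIPS alone as a performance measure.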