Instruction Set Architecture (ISA) Design. Classification Categories
|
|
|
- Annis Thompson
- 9 years ago
- Views:
Transcription
1 Instruction Set Architecture (ISA) Design Overview» Classify Instruction set architectures» Look at how applications use ISAs» Examine a modern RISC ISA (DLX)» Measurement of ISA usage in real computers 1 Classification Categories How are operands stored in CPU?» accumulator-based» stack-based» register-set based Memory addressing Branch architecture Instruction encoding 2 1
2 Accumulator Architectures Single register A One explicit operand per instruction» two working instruction types: op and store A A op M A A op *M *M A» two addressing modes direct addressing, *M immediate, M Attributes: 0» short instructions possible» minimal internal state; simple internal design» high memory traffic; many loads and stores The most popular early architecture: IBM 7090, DEC PDP-8, MOS 6502 Memory Op Machine State PC Accumulator Address, M Instruction Format 3 Stack Architectures No registers No explicit operands in ALU instructions; one in push/pop A = B + C * D push b push c push d mul add pop a Advantages:» Short instructions possible: implicit stack address references» Compiler is easy to write PC SP Memory Cur Inst Code TOS Stack Op Address, M Op Instruction Formats 4 2
3 Stack Architectures (2) Disadvantages» Code is inefficient fix by: random access to stacked values» Stack is a bottleneck fix by: with a stack-cache Promoted in the 60 s; limited popularity: Burroughs B5500/6500, HP 3000/70, Java VM PC SP TOS Stack $ Memory Cur Inst Code TOS Stack 5 Register-Set based Architectures General Purpose Registers = GPRs Registers are like an explicitly-managed cache for holding recently used values The dominant architecture: CDC 6600, IBM 360/370 (RX), PDP-11, 68000, all RISC machines, etc. FFFFF Advantages:» Allows fast access to temporary values» Permits clever compiler optimization» Reduced traffic to memory Disadvantages:» Longer instructions (than accumulator 0 designs) GPR classification» Number operands in ALU instruction» Number of operands in memory Op i j M Instruction Format IP Rn-1 Memory R1 R0 Machine State 6 3
4 Register-to-Register No memory addresses (load/store architecture)» typically 3-operand ALU ops Bigger encoding, but simplifies register allocation» Simple fixed-length instructions» easily pipelined» higher instruction count» Examples: CDC6600, CRAY-1, most RISCs C = A + B LOAD LOAD ADD STORE R1 <- A R2 <- B R3 <- R1 + R2 C <- R3 7 Register-to-Memory One memory address» typically 2-operand ALU ops» instruction length and/or cycle time varies» result destroys an operand» harder to pipeline» Examples: IBM 360/370 C = A + B LOAD ADD STORE R1 <- A R1 <- R1 + B C <- R1 8 4
5 Memory-to-Memory Three memory addresses,» No register wastage» Large variation in work/instruction C = A + B ADD C <- A + B 9 Measurements to support ISA design Given alternatives, how can we decide which is best? Measure real applications/compilers Ten SPEC92 Benchmarks for load/store machine measurements» Integer : espresso, li, eqntott, compress, gcc» Floating point: doduc, mdljdp2, ear, su2cor, hydro2d Three SPEC89 Benchmarks for VAX measurements» Tex, Spice2g6, gcc Measurements may vary» Dependent on specific benchmark, ISA and compiler» Design of ISAs needs a wide class of benchmarks including OS» Special purpose applications may have different requirements (eg. embeded control, DSP, graphics)» We can still draw conclusions about general purpose apps 10 5
6 Program Usage of Adressing Modes double x[100] ; void foo(int a) { int j ; for(j=0;j<10;j++) x[j] = 3 + a*x[j-1] ; bar(a); } Memory j a Stack procedure array reference x constant argument bar foo 11 Addressing Modes Addressing modes ultimately specify a constant, a register, or a location in memory» Register R1» Immediate #123» Direct (12345)» Register indirect (R1)» Displacement 100(R1)» Indexed (R1+R2) or 12345(R1)» Scaled 100(R2+4*R3)» Memory mode)» Auto-increment (R2)+» Auto-decrement -(R2) PC-relative forms (or PC may be a general register, like VAX R15) Complicated modes reduce instruction count at the cost of complex implementations 12 6
7 What Memory addressing modes are most common? From J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, Copyright 1996, Morgan KaufmannPublishers, San Francisco, CA. All rights reserved» Measured on the VAX» Register operands account for 51% of all references» Displacement-style mode is most common 13 How big are address displacements 30% 25% 20% Frequency 15% 10% Int. Flt. pt. 5% 0% Value» 12 bits captures 75%» 16 bits captures 99% 14 7
8 How common are immediates? 100% Percentage of operations that use immediates 80% 60% 40% 20% TeX Spice GCC 0% Loads Compares ALU ops 15 How big are immediate fields? 60% 50% 40% 30% 20% TeX Spice GCC 10% 0% Bits needed for the immediate value 16 8
9 What size data is accessed? Byte 7% Halfword Word 19% 31% 74% Flt. pt. Integer Doubleword 69% 0% 20% 40% 60% 80% Original DEC Alpha AXP hadno byte or halfword data access instructions; saves having an alignment network in a critical data path. A good choice? 17 Control flow operations Four types of control flow instructions:» Conditional branches: 84%» Unconditional jumps: 4%» Procedure calls/returns: 12% Ways to specify the destination address:» PC relative uses fewer instruction bits position independent» Any of the addressing modes for data instructions Procedure return requires some kind of register-indirect 18 9
10 Conditional Branch Instructions (1) Z N C O Condition code» record ALU status from each op Z, N, C, O» single copy, modified by most instructions» special small register is set by all operations» sometimes set free, as a side effect of necessary instruction» sometimes set when unwanted!» compare and branch must be kept together» single flag register can be a serial bottleneck» Examples: 360/370, 80x86, VAX, SPARC, PowerPC 19 Conditional Branch Instructions (2) Condition in general registers» record ALU status from compare in a register R1 COMP(R2, R3)» alternatively use condition registers» test the register later for branch BGE R1, LOOP» allows separation of test and branch» enables parallel execution R1 COMP(R2, R3) R4 COMP(R5, R3) BGE R1, DST1 BGE R4, DST2 Cn» may require additional instruction; uses up a register» Examples: MIPS, DEC Alpha Preceding integer compares» Most are EQ or NE (86%) vs. GT/LE(7%) or LT/GE (7%)» Most are immediate compares (87%) of which most are compares to zero (75%) R31 R1 R0 C2 C1 ZNCO C
11 Conditional Branch Instructions (3) Compare-and-branch» Combination compare/branch: no explicit condition encoding BGE R1, R2, LOOP» saves an instruction but requires a lot of work» Examples: VAX, MIPS 21 How far do branches go? 40% 35% 30% 25% 20% 15% 10% 5% 0% number of bits needed to specify distance between target and branch Integer Flt. pt
12 The role of compiler technology Most programming is now done in a high-level language Increased role of the compiler in determining performance Correct and efficient compilers are hard to write, so design ISAs that make it as easy as possible to write an optimizing compiler Optimizations:» High level: source code manipulation (eg: inline procedures)» Local: within straight-line code (eg: common subexpression elimination, constant propagation, strength reduction)» Global: across branches (eg: copy propagation, loop code motion, register allocation)» Machine-dependent (eg: pipeline scheduling) 23 A simple loop int A[100], B[100], C; main () { int i; } c=10; for (i=0; i<100; i++) A[i] = B[i] + C; 24 12
13 Unoptimized code C=10; li r14, 10 sw r14, C for (i=0; i<100; i++) sw r0, 4(sp) $33: A[i] = B[i] + C lw r14, 4(sp) mul r15, r14, 4 lw r24, B(r15) lw r25, C addu r8, r24, r25 lw r16, 4(sp) mul r17, r16, 4 sw r8, A(r17) lw r9, 4(sp) addu r10, r9, 1 sw r10, 4(sp) blt r10, 100, $33 12 instructions per iteration j $31 25 Optimized code C=10; li r14, 10 sw r14, C for (i=0; i<100; i++) la r3, A la r4, B la r6, B+400 $33: A[i] = B[i] + C lw r14, 0(r4) addu r15, r14, 10 sw r15, 0(r3) addu r3, r3, 4 addu r4, r4, 4 bltu r4, r6, $33 6 instructions per iteration 4 fewer loads due to code motion, register allocation, constant propagation 2 fewer multipies due to induction variable elimination, strength reduction Can you do better by hand? j $
14 What is the effect of optimization on the instruction mix? Decreases:» Total instruction count» number of memory references» number of ALU operations Does NOT generally decrease:» number of branch operations, which are thus increased as a percentage of all instructions 27 Good ISA features for an optimizing compiler Regularity» Make operations, data types, and addressing modes all orthogonal Provide primitives, not complex solutions» Do not match high-level language features directly -- it s been tried many times before and generally doesn t work! Simplify tradeoffs» Provide one efficient mechanism Allow constants to be used everywhere» Allows constant propagation Provide enough registers» Enables sophisticated register allocation schemes Provide a way to communicate hints» about memory hierarchy» about branch guesses 28 14
Instruction Set Architecture (ISA)
Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine
Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected].
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected] Review Computers in mid 50 s Hardware was expensive
İSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
Instruction Set Architecture
Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)
Computer Organization and Architecture
Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal
18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013
18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right
Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX
Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy
Reduced Instruction Set Computer (RISC)
Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying
Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings: 9.1-9.
Code Generation I Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings: 9.1-9.7 Stack Machines A simple evaluation model No variables
Week 1 out-of-class notes, discussions and sample problems
Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types
LSN 2 Computer Processors
LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2
Intel 8086 architecture
Intel 8086 architecture Today we ll take a look at Intel s 8086, which is one of the oldest and yet most prevalent processor architectures around. We ll make many comparisons between the MIPS and 8086
CHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
Property of ISA vs. Uarch?
More ISA Property of ISA vs. Uarch? ADD instruction s opcode Number of general purpose registers Number of cycles to execute the MUL instruction Whether or not the machine employs pipelined instruction
Review: MIPS Addressing Modes/Instruction Formats
Review: Addressing Modes Addressing mode Example Meaning Register Add R4,R3 R4 R4+R3 Immediate Add R4,#3 R4 R4+3 Displacement Add R4,1(R1) R4 R4+Mem[1+R1] Register indirect Add R4,(R1) R4 R4+Mem[R1] Indexed
Computer Architectures
Computer Architectures 2. Instruction Set Architectures 2015. február 12. Budapest Gábor Horváth associate professor BUTE Dept. of Networked Systems and Services [email protected] 2 Instruction set architectures
An Introduction to Assembly Programming with the ARM 32-bit Processor Family
An Introduction to Assembly Programming with the ARM 32-bit Processor Family G. Agosta Politecnico di Milano December 3, 2011 Contents 1 Introduction 1 1.1 Prerequisites............................. 2
The AVR Microcontroller and C Compiler Co-Design Dr. Gaute Myklebust ATMEL Corporation ATMEL Development Center, Trondheim, Norway
The AVR Microcontroller and C Compiler Co-Design Dr. Gaute Myklebust ATMEL Corporation ATMEL Development Center, Trondheim, Norway Abstract High Level Languages (HLLs) are rapidly becoming the standard
picojava TM : A Hardware Implementation of the Java Virtual Machine
picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer
Chapter 2 Topics. 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC
Chapter 2 Topics 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC See Appendix C for Assembly language information. 2.4
Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)
Addressing The problem Objectives:- When & Where do we encounter Data? The concept of addressing data' in computations The implications for our machine design(s) Introducing the stack-machine concept Slide
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
Instruction Set Architecture
CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters
Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.
VLIW Processors. VLIW Processors
1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
CPU Organization and Assembly Language
COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:
EEM 486: Computer Architecture. Lecture 4. Performance
EEM 486: Computer Architecture Lecture 4 Performance EEM 486 Performance Purchasing perspective Given a collection of machines, which has the» Best performance?» Least cost?» Best performance / cost? Design
Chapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
Types of Workloads. Raj Jain. Washington University in St. Louis
Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 [email protected] These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/ 4-1 Overview!
Computer organization
Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs
a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.
CS143 Handout 18 Summer 2008 30 July, 2008 Processor Architectures Handout written by Maggie Johnson and revised by Julie Zelenski. Architecture Vocabulary Let s review a few relevant hardware definitions:
ARM Microprocessor and ARM-Based Microcontrollers
ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement
Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)
Lecture Outline Code Generation (I) Stack machines The MIPS assembl language Adapted from Lectures b Profs. Ale Aiken and George Necula (UCB) A simple source language Stack- machine implementation of the
Typy danych. Data types: Literals:
Lab 10 MIPS32 Typy danych Data types: Instructions are all 32 bits byte(8 bits), halfword (2 bytes), word (4 bytes) a character requires 1 byte of storage an integer requires 1 word (4 bytes) of storage
Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1
Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of
CSE 141 Introduction to Computer Architecture Summer Session I, 2005. Lecture 1 Introduction. Pramod V. Argade June 27, 2005
CSE 141 Introduction to Computer Architecture Summer Session I, 2005 Lecture 1 Introduction Pramod V. Argade June 27, 2005 CSE141: Introduction to Computer Architecture Instructor: Pramod V. Argade ([email protected])
Introducción. Diseño de sistemas digitales.1
Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1
In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days
RISC vs CISC 66 In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days Initially, the focus was on usability by humans. Lots of user-friendly instructions (remember
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
EC 362 Problem Set #2
EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?
MICROPROCESSOR AND MICROCOMPUTER BASICS
Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit
A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc
Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc
RISC AND CISC. Computer Architecture. Farhat Masood BE Electrical (NUST) COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING
COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLOGY (NUST) RISC AND CISC Computer Architecture By Farhat Masood BE Electrical (NUST) II TABLE OF CONTENTS GENERAL...
Computer Systems Architecture
Computer Systems Architecture http://cs.nott.ac.uk/ txa/g51csa/ Thorsten Altenkirch and Liyang Hu School of Computer Science University of Nottingham Lecture 10: MIPS Procedure Calling Convention and Recursion
Quiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
WAR: Write After Read
WAR: Write After Read write-after-read (WAR) = artificial (name) dependence add R1, R2, R3 sub R2, R4, R1 or R1, R6, R3 problem: add could use wrong value for R2 can t happen in vanilla pipeline (reads
The Design of the Inferno Virtual Machine. Introduction
The Design of the Inferno Virtual Machine Phil Winterbottom Rob Pike Bell Labs, Lucent Technologies {philw, rob}@plan9.bell-labs.com http://www.lucent.com/inferno Introduction Virtual Machine are topical
What to do when I have a load/store instruction?
76 What to do when I have a load/store instruction? Is there a label involved or a virtual address to compute? 1 A label (such as ldiq $T1, a; ldq $T0, ($T1);): a Find the address the label is pointing
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001. (Total Time = 120 minutes, Total Points = 100)
EE282 Computer Architecture and Organization Midterm Exam February 13, 2001 (Total Time = 120 minutes, Total Points = 100) Name: (please print) Wolfe - Solution In recognition of and in the spirit of the
CS521 CSE IITG 11/23/2012
CS521 CSE TG 11/23/2012 A Sahu 1 Degree of overlap Serial, Overlapped, d, Super pipelined/superscalar Depth Shallow, Deep Structure Linear, Non linear Scheduling of operations Static, Dynamic A Sahu slide
OAMulator. Online One Address Machine emulator and OAMPL compiler. http://myspiders.biz.uiowa.edu/~fil/oam/
OAMulator Online One Address Machine emulator and OAMPL compiler http://myspiders.biz.uiowa.edu/~fil/oam/ OAMulator educational goals OAM emulator concepts Von Neumann architecture Registers, ALU, controller
PROBLEMS (Cap. 4 - Istruzioni macchina)
98 CHAPTER 2 MACHINE INSTRUCTIONS AND PROGRAMS PROBLEMS (Cap. 4 - Istruzioni macchina) 2.1 Represent the decimal values 5, 2, 14, 10, 26, 19, 51, and 43, as signed, 7-bit numbers in the following binary
Design Cycle for Microprocessors
Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types
GPU Hardware Performance. Fall 2015
Fall 2015 Atomic operations performs read-modify-write operations on shared or global memory no interference with other threads for 32-bit and 64-bit integers (c. c. 1.2), float addition (c. c. 2.0) using
Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:
Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove
CS352H: Computer Systems Architecture
CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions
Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x
Software Pipelining for (i=1, i
Stack Allocation. Run-Time Data Structures. Static Structures
Run-Time Data Structures Stack Allocation Static Structures For static structures, a fixed address is used throughout execution. This is the oldest and simplest memory organization. In current compilers,
PROBLEMS #20,R0,R1 #$3A,R2,R4
506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #$3A,R2,R4 R0,R2,R5 In all instructions,
1 Classical Universal Computer 3
Chapter 6: Machine Language and Assembler Christian Jacob 1 Classical Universal Computer 3 1.1 Von Neumann Architecture 3 1.2 CPU and RAM 5 1.3 Arithmetic Logical Unit (ALU) 6 1.4 Arithmetic Logical Unit
Chapter 5 Instructor's Manual
The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction
Habanero Extreme Scale Software Research Project
Habanero Extreme Scale Software Research Project Comp215: Java Method Dispatch Zoran Budimlić (Rice University) Always remember that you are absolutely unique. Just like everyone else. - Margaret Mead
Chapter 7D The Java Virtual Machine
This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly
Winter 2002 MID-SESSION TEST Friday, March 1 6:30 to 8:00pm
University of Calgary Department of Electrical and Computer Engineering ENCM 369: Computer Organization Instructors: Dr. S. A. Norman (L01) and Dr. S. Yanushkevich (L02) Winter 2002 MID-SESSION TEST Friday,
612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS
612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS 11.1 How is conditional execution of ARM instructions (see Part I of Chapter 3) related to predicated execution
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
on an system with an infinite number of processors. Calculate the speedup of
1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements
Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-UA.0201-003 Computer Systems Organization Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Some slides adapted (and slightly modified)
INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER
Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano [email protected] Prof. Silvano, Politecnico di Milano
Processor Architectures
ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture
Microprocessor and Microcontroller Architecture
Microprocessor and Microcontroller Architecture 1 Von Neumann Architecture Stored-Program Digital Computer Digital computation in ALU Programmable via set of standard instructions input memory output Internal
FLIX: Fast Relief for Performance-Hungry Embedded Applications
FLIX: Fast Relief for Performance-Hungry Embedded Applications Tensilica Inc. February 25 25 Tensilica, Inc. 25 Tensilica, Inc. ii Contents FLIX: Fast Relief for Performance-Hungry Embedded Applications...
MACHINE ARCHITECTURE & LANGUAGE
in the name of God the compassionate, the merciful notes on MACHINE ARCHITECTURE & LANGUAGE compiled by Jumong Chap. 9 Microprocessor Fundamentals A system designer should consider a microprocessor-based
1 The Java Virtual Machine
1 The Java Virtual Machine About the Spec Format This document describes the Java virtual machine and the instruction set. In this introduction, each component of the machine is briefly described. This
Computer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
Introduction to MIPS Assembly Programming
1 / 26 Introduction to MIPS Assembly Programming January 23 25, 2013 2 / 26 Outline Overview of assembly programming MARS tutorial MIPS assembly syntax Role of pseudocode Some simple instructions Integer
EECS 427 RISC PROCESSOR
RISC PROCESSOR ISA FOR EECS 427 PROCESSOR ImmHi/ ImmLo/ OP Code Rdest OP Code Ext Rsrc Mnemonic Operands 15-12 11-8 7-4 3-0 Notes (* is Baseline) ADD Rsrc, Rdest 0000 Rdest 0101 Rsrc * ADDI Imm, Rdest
THUMB Instruction Set
5 THUMB Instruction Set This chapter describes the THUMB instruction set. Format Summary 5-2 Opcode Summary 5-3 5. Format : move shifted register 5-5 5.2 Format 2: add/subtract 5-7 5.3 Format 3: move/compare/add/subtract
An Introduction to the ARM 7 Architecture
An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new
Central Processing Unit Simulation Version v2.5 (July 2005) Charles André University Nice-Sophia Antipolis
Central Processing Unit Simulation Version v2.5 (July 2005) Charles André University Nice-Sophia Antipolis 1 1 Table of Contents 1 Table of Contents... 3 2 Overview... 5 3 Installation... 7 4 The CPU
language 1 (source) compiler language 2 (target) Figure 1: Compiling a program
CS 2112 Lecture 27 Interpreters, compilers, and the Java Virtual Machine 1 May 2012 Lecturer: Andrew Myers 1 Interpreters vs. compilers There are two strategies for obtaining runnable code from a program
ARM Architecture. ARM history. Why ARM? ARM Ltd. 1983 developed by Acorn computers. Computer Organization and Assembly Languages Yung-Yu Chuang
ARM history ARM Architecture Computer Organization and Assembly Languages g Yung-Yu Chuang 1983 developed by Acorn computers To replace 6502 in BBC computers 4-man VLSI design team Its simplicity it comes
Virtual Machines as an Aid in Teaching Computer Concepts
Virtual Machines as an Aid in Teaching Computer Concepts Ola Ågren Department of Computing Science Umeå University SE-901 87 Umeå, SWEDEN E-mail: [email protected] Abstract A debugger containing a set
The Microarchitecture of Superscalar Processors
The Microarchitecture of Superscalar Processors James E. Smith Department of Electrical and Computer Engineering 1415 Johnson Drive Madison, WI 53706 ph: (608)-265-5737 fax:(608)-262-1267 email: [email protected]
System i Architecture Part 1. Module 2
Module 2 Copyright IBM Corporation 2008 1 System i Architecture Part 1 Module 2 Module 2 Copyright IBM Corporation 2008 2 2.1 Impacts of Computer Design Module 2 Copyright IBM Corporation 2008 3 If an
More MIPS: Recursion. Computer Science 104 Lecture 9
More MIPS: Recursion Computer Science 104 Lecture 9 Admin Homework Homework 1: graded. 50% As, 27% Bs Homework 2: Due Wed Midterm 1 This Wed 1 page of notes 2 Last time What did we do last time? 3 Last
Pipelining Review and Its Limitations
Pipelining Review and Its Limitations Yuri Baida [email protected] [email protected] October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic
Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level
System: User s View System Components: High Level View Input Output 1 System: Motherboard Level 2 Components: Interconnection I/O MEMORY 3 4 Organization Registers ALU CU 5 6 1 Input/Output I/O MEMORY
This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo
Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1
Divide: Paper & Pencil Computer Architecture ALU Design : Division and Floating Point 1001 Quotient Divisor 1000 1001010 Dividend 1000 10 101 1010 1000 10 (or Modulo result) See how big a number can be
EE361: Digital Computer Organization Course Syllabus
EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)
Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers
CS 4 Introduction to Compilers ndrew Myers Cornell University dministration Prelim tomorrow evening No class Wednesday P due in days Optional reading: Muchnick 7 Lecture : Instruction scheduling pr 0 Modern
