# Computer Architecture TDTS10

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers Erik Larsson Department of Computer Science 2 Pipelining FI: Fetch Instruction DI: Decode Instruction CO: Calculate operand FO: Fetch Operand EI: Execute Instruction WO: Write Operand Superpipelining FI: Fetch Instruction DI: Decode Instruction CO: Calculate operand FO: Fetch Operand EI: Execute Instruction WO: Write Operand 3 4

2 Superscalar Architecture FI: Fetch Instruction DI: Decode Instruction CO: Calculate operand FO: Fetch Operand EI: Execute Instruction WO: Write Operand Superscalar Architectures Superscalar architectures allow several instructions to be issued and completed per clock cycle. A superscalar architecture consists of a number of pipelines that are working in parallel. Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel. 5 6 Limitations Resource conflicts Control dependency Data conflicts True data dependency Output dependency Anti-dependency Limitations worse - Resource conflict - similar to structural hazards Control - branches Data-conflicts True Data Dependency ADD R2,R4,R5 R2 R4 + R5 MUL R4,R3,R1 ADD R2,R4,R5 Assume: R1=2, R2=0, R3=2, R4=0, R5=2 Correct: MUL makes: R4=4 ADD makes: R2=6 R4 becomes 4 FO EI WO Value of R4 is read. R4=0 at this point 7 8

3 9 Output Dependency Antidependency Two instructions are writing into the same location; if the second instruction writes before the first one, an error occurs: ADD R4,R2,R5 R4 R2 + R5 An antidependency exists if an instruction uses a location as an operand while a following one is writing into that location; if the first one is still using the location when the second one writes into it, an error occurs: MUL R4,R3,R1 ADD R4,R2,R5 ADD R3,R2,R5 R3 R2 + R5 10 Output Dependency and Antidependency Correct operation Policies for Parallel Instruction Execution Without dependencies by using additional registers: Case 1: ADD R7,R2,R5 R7 R2 + R5 R4 Case 2: ADD R6,R2,R5 R6 R2 + R5 R4 Storage conflicts The ability of a superscalar processor to execute instructions in parallel is determined by: the number and nature of parallel pipelines the mechanism used to find independent instructions. The policies are characterized by: the order in which instructions are issued for execution. the order in which instructions are completed. Execution policies: In-order issue with in-order completion. In-order issue with out-of-order completion. Out-of-order issue with out-of-order completion

4 13 Policies for Parallel Instruction Execution We consider the following instruction sequence: I1: ADDF R12,R13,R14 R12 R13 + R14 (float. pnt.) I2: ADD R1,R8,R9 R1 R8 + R9 I3: MUL R4,R2,R3 R4 R2 * R3 I4: MUL R5,R6,R7 R5 R6 * R7 I5: ADD R10,R5,R7 R10 R5 + R7 I6: ADD R11,R2,R3 R11 R2 + R3 Two instructions can be fetched and decoded at a time; Three functional units can work in parallel: floating point unit, integer adder, integer multiplier; Two instructions can be written back (completed) at a time; I1 requires two cycles to execute; I3 and I4 are in conflict for the same functional unit; I5 depends on the value produced by I4 (we have a true data dependency between I4 and I5); I2, I5 and I6 are in conflict for the same functional unit; 14 In-Order Issue with In-Order Completion In-Order Issue with In-Order Completion Decode/ Issue Execute Write back/ complete Cycle I1 I2 1 I3 I4 I1 I2 2 I5 I6 I1 STALL 3 I3 I1 I2 4 I4 I3 5 I5 I4 6 I6 I5 7 I1: ADDF R12,R13,R14 I2: ADD R1,R8,R9 I3: MUL R4,R2,R3 I4: MUL R5,R6,R7 I5: ADD R10,R5,R7 I6: ADD R11,R2,R3 Instructions are issued in the exact order that would correspond to sequential execution; results are written (completion) in the same order. An instruction cannot be issued before the previous one has been issued; An instruction completes only after the previous one has completed. To guarantee in-order completion, instruction issuing stalls when there is a conflict and when the unit requires more than one cycle to execute; The processor detects and handles (by stalling) true data dependencies and resource conflicts. I

6 21 Some Architectures Summary Superscalars Pentium three independent execution units: 2 Integer units Floating point unit in-order issue two instructions issued per clock cycle. Pentium II to 4 provides in addition to the Pentium out-of-order issue five instructions can be issued in one cycle Good The hardware solves everything: Hardware detects potential parallelism between instructions; Hardware tries to issue as many instructions as possible in parallel. Hardware solves register renaming. Binary compatibility If functional units are added in a new version of the architecture or some other improvements have been made to the architecture (without changing the instruction sets), old programs can benefit from the additional potential of parallelism. Why? Because the new hardware will issue the old instruction sequence in a more efficient way. 22 Summary Superscalars Outline Bad Very complex Much hardware is needed for run-time detection. There is a limit in how far we can go with this technique. Power consumption can be very large! The instruction window is limited this limits the capacity to detect potentially parallel instructions Superscalar Processors Very Long Instruction Word Processors Parallel computers 23 24

7 25 The Alternative: VLIW Processors VLIW Processors VLIW architectures rely on compile-time detection of parallelism the compiler analysis the program and detects operations to be executed in parallel; such operations are packed into one large instruction. After one instruction has been fetched all the corresponding operations are issued in parallel. No hardware is needed for run-time detection of parallelism. Detection of parallelism and packaging of operations into instructions is done, by the compiler, off-line. The instruction window problem is solved: the compiler can potentially analyse the whole program in order to detect parallel operations. 26 VLIW Processors Summary VLIW Processors Advantages the number of FUs can be increased without needing additional sophisticated hardware to detect parallelism. power consumption can be reduced. Compilers can detect parallelism based on global analysis program. Problems: large number of registers needed in order to keep all FUs active. Large data transport capacity is needed between FUs and the register file register files and memory. instruction cache and fetch unit. Large code size, partially because unused operations -> wasted bits. Incompatibility of binary code 27 28

8 Outline Parallel Computers Superscalar Processors Very Long Instruction Word Processors Parallel computers One solution to the need for high performance: architectures in which several CPUs are running in order to solve a certain application. Such computers have been organized in very different ways. Some key features: number and complexity of individual CPUs availability of common (shared memory) interconnection topology performance of interconnection network I/O devices The need for high performance! Two main factors contribute to high performance of modern processors: Fast circuit technology Architectural features: large caches multiple fast buses pipelining superscalar architectures (multiple funct. units) Classification of Computer Architectures Processor organisation Flynn s classification is based on the nature of the instruction flow executed by the computer and that of the data flow on which the instructions operate. The multiplicity of instruction stream and data stream gives us four different classes: Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD Multiple instruction, multiple data stream- MIMD References Flynn, M., Some Computer Organizations and Their Effectiveness, IEEE Trans. Comput., Vol. C-21, pp. 948, Duncan, Ralph, "A Survey of Parallel Computer Architectures", IEEE Computer. February 1990, pp Single instruction, single data stream (SISD) Uniprocessor Single instruction, multiple data stream (SIMD) Vector processor Processor organizations Array processor Multiple instruction, single data stream (MISD) Shared memory Symmetric multiprocessor (SMP) Multiple instruction, multiple data stream (MIMD) Distributed memory Nonuniform memory access (NUMA) 31 32

9 33 Single Instruction stream, Single Data stream (SISD) Single Instruction stream, Multiple Data stream (SIMD) with shared memory 34 SIMD with no shared memory Multiple Instruction stream, Multiple Data stream (MIMD) with shared memory 35 36

10 37 MIMD with no shared memory Multiprocessors Shared memory MIMD computers are called multiprocessors: 38 Amdahl's law N - parallel units Multiprocessors S = (s + p)/(s + p/n)= 1 / (s + p/n) where S is speed, s is seriel and p is parallel, and N is the number of processors. (s+p) is the total execution time p is the time that can be executed in parallel s is the time that cannot be executed in parallel Example, assume that 20% must be executed serial while 80% can be executed in parallel, and that there are 4 processors. Optimal speed-up is 4 but this example gives: S= (0,2 + 0,8)/ (0,2+0,8/4) = 1/0,4 = 2,5 IBM System/370 (1970s): two IBM CPUs connected to shared memory. IBM System370/XA (1981): multiple CPUs can be connected to shared memory. IBM System/390 (1990s): similar features like S370/XA, with improved performance. Possibility to connect several multiprocessor systems together through fast fiber-optic connection. CRAY X-MP (mid 1980s): from one to four vector processors connected to shared memory (cycle time: 8.5 ns). CRAY Y-MP (1988): from one to 8 vector processors connected to shared memory; 3 times more powerful than CRAY X-MP (cycle time: 4 ns). C90 (early 1990s): further development of CRAY Y-MP; 16 vector processors. CRAY 3 (1993): maximum 16 vector processors (cycle time 2ns). Butterfly multiprocessor system, by BBN Advanced Computers (1985/87): maximum 256 Motorola processors, interconnected by a sophisticated dynamic switching network. BBN TC2000 (1990): improved version of the Butterfly using Motorola RISC processor

11 Questions What is a superscalar architecture? How does Amdahl's law work? Give examples why many parallel units will not improve performance? What is the difference between: In-order issue with in-order completion, In-order issue with out-of-order completion, and Out-of-order issue with out-of-order completion? Give examples of True data dependency, Output dependency, and Anti-dependency Give advantages and disadvantages of VLIW How did Flynn classify computers? 41

### UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

### LSN 2 Computer Processors

LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

Advanced Pipelining Techniques 1. Dynamic Scheduling 2. Loop Unrolling 3. Software Pipelining 4. Dynamic Branch Prediction Units 5. Register Renaming 6. Superscalar Processors 7. VLIW (Very Large Instruction

### Computer Organization and Architecture

Computer Organization and Architecture Chapter 14 Instruction Level Parallelism and Superscalar Processors What does Superscalar mean? Common instructions (arithmetic, load/store, conditional branch) can

### Introduction to Cloud Computing

Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

### INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

### CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture Parallel Computation Most slides adapted from David Patterson. Some from Mohomed Younis Parallel Computers Definition: A parallel computer is a collection of processing

### VLIW Processors. VLIW Processors

1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

### Lecture 23: Multiprocessors

Lecture 23: Multiprocessors Today s topics: RAID Multiprocessor taxonomy Snooping-based cache coherence protocol 1 RAID 0 and RAID 1 RAID 0 has no additional redundancy (misnomer) it uses an array of disks

### IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

### Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

### Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:

Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.

### Pipelining Review and Its Limitations

Pipelining Review and Its Limitations Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic

### Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

### A Lab Course on Computer Architecture

A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

### Scalability and Classifications

Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

### Hiroaki Kobayashi 8/3/2011

Hiroaki Kobayashi 8/3/2011 l l l l l l l l l Introduction The Difficulty of Creating Parallel Processing Programs Shared Memory Multiprocessors Clusters and Other Message-Passing Multiprocessors Cache

### Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

### Chapter 2 Parallel Computer Architecture

Chapter 2 Parallel Computer Architecture The possibility for a parallel execution of computations strongly depends on the architecture of the execution platform. This chapter gives an overview of the general

### Middleware and Distributed Systems. Introduction. Dr. Martin v. Löwis

Middleware and Distributed Systems Introduction Dr. Martin v. Löwis 14 3. Software Engineering What is Middleware? Bauer et al. Software Engineering, Report on a conference sponsored by the NATO SCIENCE

### A Brief Review of Processor Architecture. Why are Modern Processors so Complicated? Basic Structure

A Brief Review of Processor Architecture Why are Modern Processors so Complicated? Basic Structure CPU PC IR Regs ALU Memory Fetch PC -> Mem addr [addr] > IR PC ++ Decode Select regs Execute Perform op

### Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

CS 4 Introduction to Compilers ndrew Myers Cornell University dministration Prelim tomorrow evening No class Wednesday P due in days Optional reading: Muchnick 7 Lecture : Instruction scheduling pr 0 Modern

### OC By Arsene Fansi T. POLIMI 2008 1

IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5

### Some Computer Organizations and Their Effectiveness. Michael J Flynn. IEEE Transactions on Computers. Vol. c-21, No.

Some Computer Organizations and Their Effectiveness Michael J Flynn IEEE Transactions on Computers. Vol. c-21, No.9, September 1972 Introduction Attempts to codify a computer have been from three points

### Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

### ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

### SOC architecture and design

SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

### Introduction to GPU Programming Languages

CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

### ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture

ELE 356 Computer Engineering II Section 1 Foundations Class 6 Architecture History ENIAC Video 2 tj History Mechanical Devices Abacus 3 tj History Mechanical Devices The Antikythera Mechanism Oldest known

### Annotation to the assignments and the solution sheet. Note the following points

Computer rchitecture 2 / dvanced Computer rchitecture Seite: 1 nnotation to the assignments and the solution sheet This is a multiple choice examination, that means: Solution approaches are not assessed

### Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor

### Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

### Chapter 2 Parallel Architecture, Software And Performance

Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program

### INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)

### White Paper COMPUTE CORES

White Paper COMPUTE CORES TABLE OF CONTENTS A NEW ERA OF COMPUTING 3 3 HISTORY OF PROCESSORS 3 3 THE COMPUTE CORE NOMENCLATURE 5 3 AMD S HETEROGENEOUS PLATFORM 5 3 SUMMARY 6 4 WHITE PAPER: COMPUTE CORES

### This Unit: Code Scheduling. CIS 371 Computer Organization and Design. Pipelining Review. Readings. ! Pipelining and superscalar review

This Unit: Code Scheduling CIS 371 Computer Organization and Design App App App System software Mem CPU I/O! Pipelining and superscalar review! Code scheduling! To reduce pipeline stalls! To increase ILP

### How to improve (decrease) CPI

How to improve (decrease) CPI Recall: CPI = Ideal CPI + CPI contributed by stalls Ideal CPI =1 for single issue machine even with multiple execution units Ideal CPI will be less than 1 if we have several

### Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive

### Design Cycle for Microprocessors

Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

### Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy

### What are embedded systems? Challenges in embedded computing system design. Design methodologies.

Embedded Systems Sandip Kundu 1 ECE 354 Lecture 1 The Big Picture What are embedded systems? Challenges in embedded computing system design. Design methodologies. Sophisticated functionality. Real-time

### Introducción. Diseño de sistemas digitales.1

Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

### Instruction Set Architectures

Instruction Set Architectures Early trend was to add more and more instructions to new CPUs to do elaborate operations (CISC) VAX architecture had an instruction to multiply polynomials! RISC philosophy

### High Performance Computing

High Performance Computing Trey Breckenridge Computing Systems Manager Engineering Research Center Mississippi State University What is High Performance Computing? HPC is ill defined and context dependent.

### An Introduction to Parallel Computing/ Programming

An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European

### Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

### Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components

### PROBLEMS #20,R0,R1 #\$3A,R2,R4

506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #\$3A,R2,R4 R0,R2,R5 In all instructions,

### 18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right

### Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

Logistics Week 1: Wednesday, Jan 27 Because of overcrowding, we will be changing to a new room on Monday (Snee 1120). Accounts on the class cluster (crocus.csuglab.cornell.edu) will be available next week.

### Chapter 7. Multicores, Multiprocessors, and Clusters

Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)

### on an system with an infinite number of processors. Calculate the speedup of

1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements

### Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,

### ARM Cortex-A8 Processor

ARM Cortex-A8 Processor High Performances And Low Power for Portable Applications Architectures for Multimedia Systems Prof. Cristina Silvano Gianfranco Longi Matr. 712351 ARM Partners 1 ARM Powered Products

### Computer Organization

Computer Organization and Architecture Designing for Performance Ninth Edition William Stallings International Edition contributions by R. Mohan National Institute of Technology, Tiruchirappalli PEARSON

### Multicore and Parallel Processing

Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 Administrivia FlameWar Games Night Next Friday, April 27 th 5pm

### IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

### CS 352H: Computer Systems Architecture

CS 352H: Computer Systems Architecture Topic 14: Multicores, Multiprocessors, and Clusters University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Introduction Goal:

### Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC

### Introduction to Microprocessors

Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?

### 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

### SUPERCOMPUTERS SECOND EDITION SEPTEMBER 1991

SUPERCOMPUTERS SECOND EDITION SEPTEMBER 1991 ARCHITECTURE TECHNOLOGY CORPORATION SPECIALISTS IN COMPUTER ARCHITECTURE P.O. BOX 24344 MINNEAPOLIS, MINNESOTA 55424 (612) 935-2035 ELSEVIER ADVANCED TECHNOLOGY

### EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

### Parallel Programming

Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

### Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor Travis Lanier Senior Product Manager 1 Cortex-A15: Next Generation Leadership Cortex-A class multi-processor

### Introduction to Computer Architecture Concepts

to Computer Architecture Concepts 1. We will start at the very beginning, first with the fundamental concepts behind the modern digital computer, and then some details of their implementation. Many people,

### İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

### Battle against Intel CPU Monopoly Moves from Lower Costs to Faster IA-32 Processors.

Battle against Intel CPU Monopoly Moves from Lower Costs to Faster IA-32 Processors. Giovana Mendes Dep. Informática, Universidade do Minho 4710-057 Braga, Portugal gimendes@uol.com.br Abstract.The battle

### Computer Organization and Components

Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides

### Multithreading Lin Gao cs9244 report, 2006

Multithreading Lin Gao cs9244 report, 2006 2 Contents 1 Introduction 5 2 Multithreading Technology 7 2.1 Fine-grained multithreading (FGMT)............. 8 2.2 Coarse-grained multithreading (CGMT)............

### Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

### Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

### EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution

AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced

### Technical Report. Complexity-effective superscalar embedded processors using instruction-level distributed processing. Ian Caulfield.

Technical Report UCAM-CL-TR-707 ISSN 1476-2986 Number 707 Computer Laboratory Complexity-effective superscalar embedded processors using instruction-level distributed processing Ian Caulfield December

### Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

### ANALYSIS OF SUPERCOMPUTER DESIGN

ANALYSIS OF SUPERCOMPUTER DESIGN CS/ECE 566 Parallel Processing Fall 2011 1 Anh Huy Bui Nilesh Malpekar Vishnu Gajendran AGENDA Brief introduction of supercomputer Supercomputer design concerns and analysis

### Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

### Analysis (III) Low Power Design. Kai Huang

Analysis (III) Low Power Design Kai Huang Chinese new year: 1.3 billion urban exodus 1/28/2014 Kai.Huang@tum 2 The interactive map, which is updated hourly The thicker, brighter lines are the busiest routes.

### Software implementation of Post-Quantum Cryptography

Software implementation of Post-Quantum Cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands October 20, 2013 ASCrypto 2013, Florianópolis, Brazil Part I Optimizing cryptographic software

### Static Scheduling. option #1: dynamic scheduling (by the hardware) option #2: static scheduling (by the compiler) ECE 252 / CPS 220 Lecture Notes

basic pipeline: single, in-order issue first extension: multiple issue (superscalar) second extension: scheduling instructions for more ILP option #1: dynamic scheduling (by the hardware) option #2: static

### High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team

High Performance Processor Architecture André Seznec IRISA/INRIA ALF project-team 1 2 Moore s «Law» Nb of transistors on a micro processor chip doubles every 18 months 1972: 2000 transistors (Intel 4004)

### Generations of the computer. processors.

. Piotr Gwizdała 1 Contents 1 st Generation 2 nd Generation 3 rd Generation 4 th Generation 5 th Generation 6 th Generation 7 th Generation 8 th Generation Dual Core generation Improves and actualizations

### Embedded System Hardware - Processing (Part II)

12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 2010 2014 年 11 月 11 日 These slides use Microsoft clip arts.

### Systolic Computing. Fundamentals

Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW

### Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim MIPS Architecture MIPS (Microprocessor without interlocked pipeline stages) MIPS Computer Systems Inc. Developed from Stanford MIPS architecture usages 1990 s R2000, R3000,

### CISC, RISC, and DSP Microprocessors

CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:

WAR: Write After Read write-after-read (WAR) = artificial (name) dependence add R1, R2, R3 sub R2, R4, R1 or R1, R6, R3 problem: add could use wrong value for R2 can t happen in vanilla pipeline (reads

### Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

### Current Trend of Supercomputer Architecture

Current Trend of Supercomputer Architecture Haibei Zhang Department of Computer Science and Engineering haibei.zhang@huskymail.uconn.edu Abstract As computer technology evolves at an amazingly fast pace,

### Sotirios G. Ziavras, Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, New Jersey 07102, U.S.A.

COMPUTER SYSTEMS Sotirios G. Ziavras, Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, New Jersey 07102, U.S.A. Keywords Computer organization, processor,

### SPARC64 VIIIfx: CPU for the K computer

SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS

### High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

### Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

### PROBLEMS. which was discussed in Section 1.6.3.

22 CHAPTER 1 BASIC STRUCTURE OF COMPUTERS (Corrisponde al cap. 1 - Introduzione al calcolatore) PROBLEMS 1.1 List the steps needed to execute the machine instruction LOCA,R0 in terms of transfers between

### Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

### Interconnection Networks

Advanced Computer Architecture (0630561) Lecture 15 Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Interconnection Networks: Multiprocessors INs can be classified based on: 1. Mode