COSC 243 (Computer Architecture), Lecture 13: Computer Architecture 2


Overview

This Lecture: Architectural topics
- CISC
- RISC
- Multi-core processors
Source: lecture notes

Next Lecture: Operating systems

Moore's Law

CISC

What is the best thing to do with all those transistors?
- Add extra instructions?
- Make the CPU do more (integrated cache, etc.)?
- Pipelines?
We call these Complex Instruction Set Computers.

High Level Languages

- As the cost of a computer dropped, the relative cost of software went up
- As computers became more ubiquitous, the need to port software from one machine to another increased
- As the complexity of software went up, the need to use high level languages increased
- Programs today are almost always written in high level languages
- As time went on, languages became higher level: you could do more in the same number of lines of code

The Semantic Gap

A semantic gap appeared: programming languages became disconnected from CPU architecture
- This is part of the purpose of high level languages
New instructions were added to the CPU, but:
- They were not being used by programmers, who wrote in high level languages
- They were not being used by the compilers: it wasn't worthwhile re-writing the compiler for each release of a CPU
The new instructions were being ignored. We need a CPU optimized for high level language use.

Some research

Compiled high level programs:
- Do a lot of branching and procedure calls
- Mostly operate on a small number of local variables
In fact:
- Almost a third of the CPU's time is spent making procedure calls and branches
- Most functions have fewer than 6 local variables
- Most memory access is due to procedure calls
Can we use these observations to speed up the CPU?

RISC

Reduced Instruction Set Computers. Three design principles:
- Large number of registers
  - This reduces the number of memory accesses
- Careful design of the pipeline for conditional branches
  - Better handling of if statements and procedure calls
- Simplified (reduced) instruction set
  - Each instruction does less
  - Fewer addressing modes
  - Often just as many instructions as in a CISC CPU: reduced complexity does not mean a reduced number of instructions

RISC

Characteristics:
- One instruction per cycle (all instructions take the same time)
  - This keeps the pipeline simple
- Register to register operations
  - All memory access is via dedicated load and store instructions
- Simple addressing modes
- Simple instruction formats
  - Fixed instruction length
  - Aligned on machine word boundaries (for fast CPU load)

RISC Register Windows

Most memory access is because of procedure calls:
- Local variables on the stack
- Function parameters on the stack
Some RISC processors move the stack into the CPU as a special bank of registers.
This means that we don't need to spend time on memory access when:
- Writing parameters onto the stack
- Accessing local variables on the stack

CISC vs. RISC

RISC requires more program instructions than CISC
- RISC instructions are simplified
- Have fewer addressing modes
- Each takes less memory space to store
CISC does more per instruction
- But the control unit is more complex (and so slower)
- The microcode is more complex (and so slower)
- The microcode is often a RISC program!

CISC vs. RISC

CISC: minimise instructions per program, at the cost of more cycles per instruction
RISC: minimise cycles per instruction, at the cost of more instructions per program
Both attack the same equation:

  time/program = time/cycle × cycles/instruction × instructions/program
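A quick worked example of this trade-off in C, with entirely made-up numbers for a hypothetical CISC and a hypothetical RISC machine running the same program:

    #include <stdio.h>

    int main(void) {
        /* time/program = time/cycle * cycles/instruction * instructions/program.
           All numbers below are hypothetical, for illustration only. */
        double cycle_time = 1.0 / 2e9;            /* 2 GHz clock on both machines */
        double cisc = cycle_time * 4.0 * 1e9;     /* CPI 4, 1e9 instructions */
        double risc = cycle_time * 1.2 * 2e9;     /* CPI 1.2, but 2e9 instructions */
        printf("CISC: %.2f s  RISC: %.2f s\n", cisc, risc);
        return 0;
    }

With these invented numbers the RISC machine wins (1.20 s vs. 2.00 s) even though it executes twice as many instructions, because its cycles per instruction are so much lower.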

CISC vs. RISC: Who won?

CISC: Intel architecture in PCs, Macs (now), servers
RISC:
- ARM architecture in phones, tablets, almost everything else
- MIPS, SPARC, PowerPC in some Unix systems and old Macs
Hybrid systems:
- Modern Intel CPUs have a RISC core with a translation layer
- High performance RISC chips have adopted some CISC characteristics, like more instructions and variable length instructions

Superpipelines

We can make pipeline stages very simple by adding more pipeline stages
- If every pipeline stage is simple (short gate delay) then we can increase the clock speed
- Double the number of pipeline stages, double the clock speed: more instructions complete per second
- But longer pipelines increase the likelihood of hazards, and the cost of mistakes in branch prediction

Superscalar

Why use only one pipeline? Let's have two: then we can execute two instructions at once! This is called instruction-level parallelism.
Five limitations to instruction-level parallelism (the first three are illustrated in the sketch below):
- True data dependency (read after write, RAW)
- Output dependency (write after write, WAW)
- Antidependency (write after read, WAR)
- Procedural dependency
  - Conditional branches require a pipeline reload
- Resource conflicts
  - Both pipelines require access to memory at the same time
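A minimal C sketch of the three data dependencies, with statements standing in for machine instructions:

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 2, c;
        c = a + b;   /* (1) writes c */
        a = c * 2;   /* (2) reads the c written by (1): true dependency (RAW) */
        c = b - 1;   /* (3) must not write c before (2) reads it: antidependency (WAR);
                        must not write c before (1) does: output dependency (WAW) */
        printf("a=%d c=%d\n", a, c);
        return 0;
    }

A superscalar CPU must respect (1)-before-(2) exactly, but the WAR and WAW hazards around (3) are only naming conflicts, which register renaming can remove.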

Superscalar

However, if there are no dependencies then the instructions need not be executed in program order. This is known as out-of-order execution.
- In-order issue with in-order completion
  - Instructions must start and finish in the correct order
- In-order issue with out-of-order completion
  - The CPU starts the instructions in order, but the second one finishes before the first!
- Out-of-order issue with out-of-order completion
  - The CPU does the next instruction before the current one!
  - E.g. (TSX then TYA): does the order matter?

Superscalar

A program is a linear sequence of instructions
- Instruction fetch with branch prediction produces an instruction stream
- The stream is examined for dependencies
- Instructions are re-ordered by their dependencies
- Instructions are executed based on their dependencies on each other and on the hardware resources
- Results are recorded or discarded
  - Discarded when a speculative branch prediction turns out to be wrong

Superscalar

[Diagram: the flow of a superscalar processor. A static program passes through instruction fetch and branch prediction (producing an instruction stream), then instruction dispatch into the window of execution, then instruction issue, instruction execution, and finally instruction re-order and commit.]

Hyperthreading (SMT)

We can do more!
- The CPU slows down when we access memory
- The pipeline slows down when we have dependencies
Can we write programs that do more than one thing at a time, but whose parts don't interact (much)? Yes! We can use threading (see the sketch below).
Actually, the OS switches between programs too. Perhaps we can build that into the CPU.
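A minimal sketch of "more than one thing at a time" using POSIX threads (assuming a Unix-like system; compile with -lpthread). Each thread does independent work, which is exactly the situation SMT hardware exploits:

    #include <pthread.h>
    #include <stdio.h>

    static void *count(void *arg) {
        long total = 0;
        for (long i = 0; i < 100000000L; i++)
            total += i;                 /* independent work, no shared data */
        printf("thread %s done (total %ld)\n", (const char *)arg, total);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, count, "A");  /* two threads whose parts */
        pthread_create(&t2, NULL, count, "B");  /* don't interact at all   */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }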

Hyperthreading (SMT)

Imagine a superscalar architecture with 2 pipelines
- Each pipeline reads from a different part of memory
- Each pipeline has a separate set of registers
- If one pipeline becomes stalled, the other keeps going
Two programs are executed at the same time! This is called Simultaneous Multithreading (SMT).
This is the approach of the Intel Hyperthreading CPUs, such as the Pentium 4.

Heat!

The heat dissipation in a transistor is linear in the switching rate: the faster you switch, the more heat you get.
The total amount of heat generated is linear in the number of transistors on the silicon die.
Both have been following Moore's law! But transistors have been getting smaller too, so each dissipates less heat.
Overall, a huge increase in heat:
- ~1 watt for early single chip CPUs
- >150 watts for current top end CPUs
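The relationship the slide describes is usually written as the dynamic power equation P = alpha * C * V^2 * f: power is linear in frequency (switching rate) and in switched capacitance (roughly, transistor count). A tiny sketch with hypothetical values:

    #include <stdio.h>

    int main(void) {
        double alpha = 0.2;    /* activity factor: fraction of transistors switching (hypothetical) */
        double cap   = 1e-7;   /* total switched capacitance in farads (hypothetical) */
        double volts = 1.2;    /* supply voltage */
        double freq  = 3e9;    /* clock frequency: 3 GHz */
        /* Linear in freq and cap, as the slide states; quadratic in voltage. */
        printf("P = %.1f W\n", alpha * cap * volts * volts * freq);
        return 0;
    }

With these invented numbers P comes out at 86.4 W, in the ballpark of a modern desktop CPU; halving the frequency halves the dynamic power, which is the lever multi-core designs pull.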

Multi-Core

How can we reduce the heat? The obvious solution is to go slower.
How can we go slower and faster at the same time?
Instead of having one CPU on the silicon die, we put two. They share certain resources, including:
- The buses
- The level 2 cache

Vectors

What if you want to do the same operation over and over again?
- One way is to tell the CPU how many times to repeat the operation (i.e. make a loop)
- The other way is to have a special CPU that performs the same instruction on many chunks of data at once, so the instruction decoding only occurs once
We call this Single Instruction Multiple Data (SIMD). A sketch contrasting the two follows.
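A sketch of the loop-versus-SIMD contrast, assuming an x86 machine with SSE and the immintrin.h intrinsics header:

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];

        /* Scalar loop: the add instruction is fetched and decoded 4 times. */
        for (int i = 0; i < 4; i++)
            out[i] = a[i] + b[i];

        /* SIMD: one instruction adds all 4 floats at once. */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));

        printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }

The _mm_add_ps call compiles to a single addps instruction operating on all four floats, so the decode cost is paid once per vector rather than once per element.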

Classification of Architectures

- Single instruction, single data (SISD): a normal computer
- Single instruction, multiple data (SIMD): Intel SSE instructions, Cray, etc.; graphics processors
- Multiple instruction, multiple data (MIMD): multi-core

That's All Folks

If you need help:
- I'm on email
- I'm in room 248
- The Tutors / Teaching Fellows can also help
Good luck with the exam!