A Brief Review of Processor Architecture. Why are Modern Processors so Complicated?



A Brief Review of Processor Architecture
Why are Modern Processors so Complicated?

Basic Structure
[Diagram: CPU containing PC, IR, Regs and ALU, connected to Memory]
Fetch:   PC -> Mem addr; [addr] -> IR; PC++
Decode:  select regs
Execute: perform op
Write result
Each instruction takes multiple (clock) cycles.
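
To make the cycle concrete, here is a minimal software sketch of fetch/decode/execute for an invented accumulator machine; the opcode set and instruction encoding are my own illustration, not something from the slides.

```c
/* Toy interpreter for the fetch/decode/execute cycle on an invented
 * accumulator machine. The encoding (opcode in the top byte, address
 * in the remaining bits) is purely illustrative. */
#include <stdint.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

uint32_t mem[1024];                     /* RAM holds instructions and data */

void run(void)
{
    uint32_t pc = 0, acc = 0;
    for (;;) {
        uint32_t ir = mem[pc++];        /* Fetch:  PC -> mem addr, [addr] -> IR, PC++ */
        uint32_t op = ir >> 24;         /* Decode: split opcode and operand address   */
        uint32_t ad = ir & 0xFFFFFF;
        switch (op) {                   /* Execute, then write the result             */
        case OP_LOAD:  acc = mem[ad];  break;
        case OP_ADD:   acc += mem[ad]; break;
        case OP_STORE: mem[ad] = acc;  break;
        case OP_HALT:  return;
        }
    }
}
```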

Simple Pipeline
Stages: Fetch, Decode, Execute, Mem, Writeback
[Diagram: PC, REGS, ALU and MEM along the five pipeline stages]

Clock cycle   1    2    3    4    5    6    7
Inst a        IF   ID   EX   MEM  WB
Inst b             IF   ID   EX   MEM  WB
Inst c                  IF   ID   EX   MEM  WB
Inst d                       IF   ID   EX   MEM
Inst e                            IF   ID   EX

Superscalar
Up to 4 instructions in parallel
[Diagram: PC and scheduler feeding the register file and four pipelines, Pipeline1-Pipeline4]
4x speedup? Maybe 2.5x in practice.
8 pipelines: not worth it?
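
A back-of-envelope sketch (with made-up numbers) of where the pipeline speedup comes from, assuming one instruction can enter the pipeline per cycle and ignoring the hazards discussed on the next slide:

```c
/* Cycles to run n instructions on a 5-stage machine:
 *   unpipelined: every instruction occupies all 5 stages in sequence
 *   pipelined:   5 cycles to fill, then one instruction completes per cycle
 * Hazards and stalls eat into the ideal 5x. */
#include <stdio.h>

int main(void)
{
    const long n = 1000, stages = 5;
    long unpipelined = n * stages;        /* 5000 cycles */
    long pipelined   = stages + (n - 1);  /* 1004 cycles */
    printf("speedup = %.2fx\n", (double)unpipelined / pipelined);  /* ~4.98x */
    return 0;
}
```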

Superscalar Complexity
Data dependencies
Enough parallelism in the program?
Cache misses
Cache contention
Out-of-order execution? Register renaming?
Precise exceptions
Hardware complexity
Clock distribution

Further Reading
Computer Architecture: A Quantitative Approach, 4th Edition
John L. Hennessy, David A. Patterson
Morgan Kaufmann, ISBN-13: 978-0123704900
Chapters 1 & 2
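
As a hypothetical illustration of the first problem on that list, the fragment below contains a read-after-write dependence that no amount of superscalar hardware can issue in parallel:

```c
/* A read-after-write (RAW) dependence: y needs x, so the two statements
 * cannot execute in the same cycle; z is independent and could issue
 * alongside either of them. The values are arbitrary. */
int example(int a, int b)
{
    int x = a + b;      /* produces x                          */
    int y = x * 2;      /* consumes x: must wait for it        */
    int z = a - b;      /* independent: free to issue in parallel */
    return y + z;
}
```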

Caches
What are they? How do they work? What are the problems?

Basic CPU Operation
[Diagram: CPU connected to Memory]
Random Access Memory (RAM) is indexed by address.
The CPU Program Counter (PC) fetches program instructions from successive addresses (plus branches).
Instructions may address memory to read or write data.
The CPU performs operations (ADD, TEST, BRANCH etc.).

Why Cache Memory?
Modern processor speed > 1 GHz: about 1 instruction per nanosecond (10^-9 s).
Every instruction needs to be fetched from memory.
Many instructions (1 in 3?) also access memory to read or write data.
But RAM access time is typically 50 ns (roughly 67x too slow!).

What is Cache Memory?
Dictionary: cache, "a secret hiding place".
A small amount of very fast memory used as a temporary store for frequently used memory locations (both instructions and data).
Relies on the fact (not always true) that, at any point in time, a program uses only a small subset (its working set) of its instructions and data.
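
A rough check on the "67x" figure (my own arithmetic, assuming one fetch per instruction plus a data access on one instruction in three, each costing 50 ns):

    memory accesses per instruction  ~ 1 + 1/3 = 4/3
    memory time per instruction      ~ 4/3 x 50 ns ~ 67 ns
    versus the ~1 ns a 1 GHz processor would like to spend.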

Facts About Memory Speeds
Circuit capacitance is what makes things slow (it needs charging).
Bigger things have bigger capacitance, so large memories are slow.
Dynamic memories (which store data on capacitance) are slower than static memories (bistable circuits).

Interconnection Speeds
External wires also have significant capacitance.
Driving signals between chips needs special high-power interface circuits.
Things within a VLSI chip are fast; anything off-chip is slow.
Put everything on a single chip? Maybe one day! (Manufacturing limitations.)

Basic Level 1 (L1) Cache Usage
[Diagram: CPU registers -> on-chip L1 cache -> RAM memory]
The compiler makes best use of registers; they are the fastest.
Anything not in registers must (logically) go to memory.
But is there a copy in the cache?

Fully Associative Cache (1)
[Diagram: address from the CPU looked up in an address/data memory; data to/from the CPU]
The cache is small (and therefore fast), perhaps 16 kbytes.
It can only hold a few memory values.
The cache memory stores both addresses and data.
Hardware compares the input address with all stored addresses (in parallel).
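
The address comparison can be sketched in software like this; the size, field names and helper are illustrative only, and a real cache does the comparison in hardware on all entries at once rather than in a loop:

```c
/* Minimal sketch of a fully associative cache lookup for a small,
 * made-up configuration (NLINES entries, one word per entry). */
#include <stdbool.h>
#include <stdint.h>

#define NLINES 8            /* illustrative size, not a real L1 */

struct line {
    bool     valid;
    uint32_t addr;          /* full address stored alongside the data */
    uint32_t data;
};

static struct line cache[NLINES];

/* Returns true on a hit and places the data in *out; false on a miss. */
bool cache_lookup(uint32_t addr, uint32_t *out)
{
    for (int i = 0; i < NLINES; i++) {        /* hardware does this in parallel */
        if (cache[i].valid && cache[i].addr == addr) {
            *out = cache[i].data;             /* cache hit */
            return true;
        }
    }
    return false;                             /* cache miss: go to RAM */
}
```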

Fully Associative Cache (2)
If the address is found, this is a cache hit: the data is read (or written).
If the address is not found, this is a cache miss: we must go to main RAM memory.
But how does data get into the cache?
If we have to go to main memory, should we just leave the cache as it was, or do something with the new data?

Locality
Caches rely on locality.
Temporal locality: things, when used, will be used again soon (e.g. instructions and data in loops).
Spatial locality: things close together (adjacent addresses) in store are often used together (e.g. instructions and arrays).
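
A small, hypothetical fragment showing the access pattern that locality describes (nothing here is from the slides; it just exhibits the behaviour a cache benefits from):

```c
/* Summing an array shows both kinds of locality. */
#include <stddef.h>

long sum(const int *a, size_t n)
{
    long total = 0;                 /* 'total', 'i' and the loop instructions
                                       are reused every iteration:
                                       temporal locality                     */
    for (size_t i = 0; i < n; i++)
        total += a[i];              /* a[0], a[1], a[2], ... are adjacent
                                       addresses: spatial locality           */
    return total;
}
```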

What To Do on a Cache Miss
Temporal locality says it is a good idea to keep recently used data in the cache.
Assume a read from store for the moment. As well as using the data:
Put the newly read value into the cache.
But the cache may already be full.
We need to choose a location to reject (the replacement policy).

Cache Replacement Policy
Least Recently Used (LRU): makes sense, but hard to implement (in hardware).
Round Robin (or cyclic): cycle round the locations least recently fetched from memory.
Random: easy to implement, and not as bad as it might seem.
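
A sketch of the round-robin policy on the same toy cache (NLINES, struct line and cache_install are my own names, carried over from the earlier sketch); round robin is shown here because LRU is the one the slide says is hard to build:

```c
/* Round-robin (cyclic) victim selection for a small fully associative cache. */
#include <stdbool.h>
#include <stdint.h>

#define NLINES 8                      /* illustrative size */

struct line { bool valid; uint32_t addr; uint32_t data; };
static struct line cache[NLINES];
static unsigned next_victim;          /* cycles 0..NLINES-1 */

/* On a miss: install the value fetched from RAM, rejecting whichever
 * entry the round-robin counter currently points at. */
void cache_install(uint32_t addr, uint32_t data)
{
    cache[next_victim] = (struct line){ .valid = true, .addr = addr, .data = data };
    next_victim = (next_victim + 1) % NLINES;   /* advance the cyclic pointer */
}
```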

Cache Write Strategy (1)
Writes are slightly more complex than reads.
We always try to write to the cache; the address may or may not exist there.
Cache hit: update the value in the cache.
But what about the value in RAM?
If we write to RAM every time it will be slow. Does RAM need updating every time?

Cache Write Strategy (2)
Write Through: every cache write is also done to memory (slow).
Write Through with buffers: the write through is buffered, i.e. the processor doesn't wait for it to finish (but multiple writes could back up).
Copy Back: the write is only done to the cache (marked as "dirty"); RAM is updated when the dirty cache entry is replaced (or the cache is flushed, e.g. on a process switch).
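
The two write-hit policies can be contrasted in a few lines; ram_write is an assumed stand-in for the slow path to main memory, not a real API:

```c
/* Handling a write hit on one cache line under either policy. */
#include <stdbool.h>
#include <stdint.h>

struct line { bool valid, dirty; uint32_t addr, data; };

void ram_write(uint32_t addr, uint32_t data);   /* slow path, not shown */

void cache_write_hit(struct line *l, uint32_t data, bool write_through)
{
    l->data = data;                 /* always update the cached copy    */
    if (write_through)
        ram_write(l->addr, data);   /* write through: keep RAM in step
                                       (possibly via a write buffer)    */
    else
        l->dirty = true;            /* copy back: defer the RAM update
                                       until the entry is replaced or
                                       the cache is flushed             */
}
```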

Cache Write Strategy (3)
Cache miss on a write:
Write Around: just write to RAM; a subsequent read will cache it if necessary.
Write Allocate: assign a cache location and write the value (which may mean rejecting an existing entry); then either write through back to RAM, or rely on copy back later.

Cache Write Strategy (4)
The fastest combination is Write Allocate / Copy Back, and it is the most often used.
But then the cache and memory are not coherent (i.e. they can hold different values).
Does this matter? It does if other things access the same memory:
Autonomous I/O devices
Multiprocessors
These need special handling.
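
And the two write-miss policies, using assumed helpers (cache_holds, cache_update, cache_install, ram_write) in the spirit of the earlier sketches rather than any real interface:

```c
/* Choosing between write allocate and write around on a store. */
#include <stdbool.h>
#include <stdint.h>

bool cache_holds(uint32_t addr);                  /* is addr cached?        */
void cache_update(uint32_t addr, uint32_t data);  /* write-hit path (above) */
void cache_install(uint32_t addr, uint32_t data); /* allocate a line        */
void ram_write(uint32_t addr, uint32_t data);     /* slow path              */

void cpu_store(uint32_t addr, uint32_t data, bool write_allocate)
{
    if (cache_holds(addr)) {
        cache_update(addr, data);    /* hit: write through or mark dirty,
                                        as in the previous sketch          */
    } else if (write_allocate) {
        cache_install(addr, data);   /* write allocate: claim a line (may
                                        reject an existing entry); RAM is
                                        updated now or later on copy back  */
    } else {
        ram_write(addr, data);       /* write around: bypass the cache;
                                        a later read may cache it          */
    }
}
```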

The Multi-Core Coherence Problem
So, with multiple cores, each with its own cache, we have a serious problem.
Each core may be writing to its own cache and not updating shared memory.
This is not a problem if the programs on the individual cores do not share variables.
But the whole purpose of a shared-memory programming model is to communicate via shared memory.
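
A hypothetical scenario (names and values invented) showing why this matters for shared-memory communication:

```c
/* Why copy-back caches without coherence hardware break shared-memory
 * communication. Illustrative only. */
int flag = 0;          /* shared variable, initially 0 in RAM */

/* Core 0 runs:                     Core 1 runs:
 *   flag = 1;                        while (flag == 0)
 *                                        ;   // wait for core 0
 *
 * With copy-back caches and no coherence protocol, core 0's write of 1
 * can sit marked "dirty" in core 0's own cache, while core 1 keeps
 * reading the stale 0 from its cache (or from RAM) and spins forever.
 * Hardware cache coherence is what keeps the copies consistent; in real
 * C code you would additionally need atomics or volatile to stop the
 * compiler caching 'flag' in a register. */
```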