How To Understand The Design Of A Microprocessor

Similar documents

Introducción. Diseño de sistemas digitales.1

İSTANBUL AYDIN UNIVERSITY

EE361: Digital Computer Organization Course Syllabus

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Processor Architectures

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Architectures and Platforms

SOC architecture and design

A Lab Course on Computer Architecture

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Computer Systems Structure Main Memory Organization

Performance evaluation

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Instruction Set Design

picojava TM : A Hardware Implementation of the Java Virtual Machine

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

CHAPTER 7: The CPU and Memory

Parallel Programming Survey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Enabling Technologies for Distributed Computing

Computer Organization and Components

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

ADVANCED COMPUTER ARCHITECTURE

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Enabling Technologies for Distributed and Cloud Computing

Parallel Algorithm Engineering

LSN 2 Computer Processors

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉志尉 National Chiao Tung University

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Computer Organization

Introduction to Microprocessors

Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?

on an system with an infinite number of processors. Calculate the speedup of

OC By Arsene Fansi T. POLIMI

The Central Processing Unit:

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

Design Cycle for Microprocessors

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.

CSEE W4824 Computer Architecture Fall 2012

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Chapter 1 Computer System Overview

Computer Architecture TDTS10

1. Memory technology & Hierarchy

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Operating Systems 4 th Class

Thread level parallelism

CISC, RISC, and DSP Microprocessors

VLIW Processors. VLIW Processors

Operating System Impact on SMT Architecture

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Chapter 2 Heterogeneous Multicore Architecture

Computer organization

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May ILP Execution

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?

7a. System-on-chip design and prototyping platforms

Week 1 out-of-class notes, discussions and sample problems

Introduction to Cloud Computing

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

COMPUTER SCIENCE AND ENGINEERING - Microprocessor Systems - Mitchell Aaron Thornton

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

Chapter 2 Logic Gates and Introduction to Computer Architecture

Program Optimization for Multi-core Architectures

Hardware/Software Co-Design of a Java Virtual Machine

CSE 141L Computer Architecture Lab Fall Lecture 2

Memory Systems. Static Random Access Memory (SRAM) Cell

CSC 2405: Computer Systems II

Parallelism and Cloud Computing

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

Price/performance Modern Memory Hierarchy

Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

ARM Microprocessor and ARM-Based Microcontrollers

Administrative Issues

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory

What is a System on a Chip?

SPARC64 VIIIfx: CPU for the K computer

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Intel 8086 architecture

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Computer Systems Design and Architecture by V. Heuring and H. Jordan

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Binary search tree with SIMD bandwidth optimization using SSE

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Parallel Computing. Benson Muite. benson.

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

Transcription:

Computer Architecture R. Poss 1

What is computer architecture? 2

Your ideas and expectations What is part of computer architecture, what is not? Who are computer architects, what is their job? What is the role of computer architecture in science? Why do you need to know about computer architecture? 3

Vocabulary List words/concepts that you want explained during this course: About the hardware/software interface About hardware components About choices in design About how to be a better programmer etc. - anything you think is relevant 4

An engineering domain layers of composition and complexity: from parts to whole refined matter electronics logic circuits components platforms systems petro-chemicals packaging Backplanes Processors metals metal oxides GaA/Si/SiGe/SiC crystals semiconductors magnetic substrates CMOS NMOS ELECTRONIC ENGINEERING You are here The hidden partner activity: compilers, operating systems Links Functional Units Storage Memories Caches Networks COMPUTER ARCHITECTURE Algorithms Frameworks Operating software SOFTWARE ENGINEERING Computing platforms Software programs Computational clusters Embedded systems Personal computers Game consoles SYSTEMS ARCHITECTURE 5

Current state of affairs Power vs chip area: before: power free, transistors expensive now: power expensive, transistors cheap Storage vs computation: before: computation slow, storage fast now: storage slow, computation fast Computation vs storage cost: before: small storage, ok to compute more to save space (eg compression) now: large storage, expensive to compute more 6

Trends - the free lunch is over This is the story of uniprocessor performance This is the power wall This is the memory wall + seq. performance wall http://www.gotw.ca/publications/concurrency-ddj.htm 7

Current state of affairs (cont) A dramatic change in processor chips: Memory wall: processors much faster than memories Power wall: can t power all transistors lest the chip will fry Sequential performance wall: more transistors don t help sequential performance any more This course will suggest how we got here, why these problems are happening and what we can do about it 8

Example questions You are in charge of selecting a processor chip for a new web server. You have a choice between two chips, a 1-core running at 2GHz and a 2-core running at 1 GHz. They use the same core microarchitecture, have nearly the same price. How do you choose? Why? The web server will support encrypted SSL connections. You can choose a 4-core processor, 4MB of cache on chip with a cryptographic accelerator with 4 channels. Or you can choose a 4-core processor with 8MB of cache on chip but no accelerator. How do you choose? 9

Course & organization 10

Aims of this course The aims of this course are: to introduce the notion of hardware/software interface and variations in ISA design to give a thorough understanding of modern microprocessor design and related issues to introduce parallelism in computer architecture to introduce simulators & architecture models 11

Bibliography The main course text is: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN 978-0-12-370490-0. Other useful texts are: Processor Architecture, Silc, Robic and Ungerer, Springer, ISBN 13-540-64798-8 D. Sima, T. Fountain and P. Kacsuk, Advanced Computer Architecture a Design space approach (Addison-Wesley) Your own search-fu -- use Google Scholar! 12

Assessment Assessment will be by coursework assignments and exam The following weighting will be used Homework 20% Exam 30% Lab assignments 50% Assignment labs start on Sep 3rd, supervised by M. van Drunen, R. Ooms and F. Treurniet http://www.liacs.nl/ca Assignment assessment will be based on demonstration to the lab supervisors (60%) and a brief report (40%) to me on the observations of your results, deadline in details 13

Assessment Lab1 Lab2 Lab3 Exam Homework Lab1 10% Lab2 20% Lab3 20% Homework 20% Exam 30% 14

Communication & contact Please use the mailing list https://groups.google.com/d/forum/liacs-ca-2014 Prefer group discussions to one-on-one interactions There are no stupid questions, don t be ashamed However, try to find if your question has already been answered first 15

Course overview The following topics will be covered in a bottom-up approach to the subject 1. Hardware/software interface, ISAs 2. Processors & implicit concurrency Pipelined processors. superscalar microprocessors 3. Memory, caches, interconnects & topology 4. Explicit concurrency & parallelism in systems Vector processors, VLIW, multi-cores, hardware multi-threading 5. Co-processors and accelerators 16

Getting started 17

Contribution of Comp. Arch. Quantitative principles of design Take advantage of parallelism Principle of locality Focus on the common case Amdahl s laws Careful, quantitative comparison: define, quantify, summarize Anticipating and exploiting advances in technology Well-defined interfaces, carefully implemented and thoroughly defined 18

Parallelism Three main strategies: Increase bandwidth and throughput by duplicating storage and data paths Use pipelining, ie assembly line Perform operations out of order, including simultaneously Fundamental limits: pipeline hazards time and data dependencies = mandatory order 19

Locality Principle: individual programs access a relatively small portion of memory in a small amount of time Two different types: Temporal locality: if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse) Spatial locality: if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access) Caches are a fundamental mechanism to take advantage of locality 20

Memory hierarchy and latency Registers - on-chip SRAM L1 cache - on-chip SRAM L2 cache - on-chip SRAM off-chip SRAM L3 cache - off-chip SRAM Main memory - DRAM Distributed memory 1/cycle time 100MHz clocks < 1 1-2 2-6 4-8 10 10 100 GHz clocks < 1 1-6 10 10 100 1000 10000 Size 21

Focus on the common case In making a design trade-off, favor the frequent case over the infrequent case E.g., Instruction fetch and decode unit used more frequently than multiplier, so optimize it 1st E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st Frequent case is often simpler and can be done faster than the infrequent case What is frequent case and how much performance improved by making case faster => Amdahl s Law 22

Amdahl s law on speedup Consider a computation P which contains two parts A and B in sequence A can be enhanced (eg more parallelism, more performance); B cannot T(P) = T(A) + T(B) ( T = time to complete ) Imagine we can accelerate A infinitely so that T(A) becomes 0 Intuitively: overall speedup is limited by T(B) If the complexity ratio between A and B is P[A/B] (proportion), and A can be accelerated by a factor SA (speedup), Amdalh s law says: Soverall = 1 / ( (1 - P[A/B]) + (P[A/B] / SA) ) 23

Amdahl s law example An algorithm contains a sequential section and a parallel section The parallel section contains 20% of the computation steps (P=0.2) The parallel section can be accelerated by a factor N by using N processors/cores Maximum speedup with N cores = 1 / ( ( 1-0.2) + (0.2 / N) ) With N = 100, speedup = 1.24X (100 cores, yet only 24% perf increase!) This is the fundamental limit to parallelism: to maximize performance gains, need to first increase the proportion of the parallel section. 24

Amdahl s law on design A balanced system design should provision 1 bit per second of external bandwidth for each potential instruction per second Too little external bandwidth: I/O bound Too little instructions/second: compute-bound Desktop computers are traditionally I/O bound Mainframes are the other way around Multi-cores require huge amount of bandwidth to stay balanced 25

Lab assignments 26

Lab assignments 1st series: getting to know the hardware/software interface, make your own MIPS code 2nd series: implement your own MIPS-like ISA in a simulator/emulator 3rd series: implement your own cache protocol 4th series: implement your own branch predictor Total: 50% of final grade; First series: 10% of final grade; 2nd: 15%, 3rd: 15%, 4th: 10% 27

How do programs run? General computer model: processor + memory + interconnect + I/O devices Software is just bits, so is data How does software translate into behavior? ie. communication, computation and control? Your take here 28

Computers and interpreters A computer processes inputs and produces output Both inputs and outputs are just bits of data A universal computer also reads what to do (program) as data - it interprets the program code step by step as instructions Software = data = program code + input NB: The behavior of software comes from the machine that interprets it 29

Multiple layers of interpreters Java bytecode and real hardware: Java VM is an interpreter for Java bytecode Hardware processor is an interpreter for Java VM program code Python bytecode and real hardware: CPython is an interpreter for Python bytecode Hardware processor is an interpreter for CPython System simulators/emulators and real hardware MGSim is an interpreter for Alpha/SPARC/MIPS program code Hardware processor is an interpreter for MGSim 30

System initialization What happens when you switch the computer on? Define/explain the relationships between: Reset signal Initial program counter Boot ROM Boot code Disks I/O interface CPU Operating system Start-up storage ROM RAM 31

What does this code do? sum: L3: L2:.ent sum mov $31,$0 ble $16,L2 mov $31,$1 ldl $2,0($17) addl $2,$0,$0 addl $1,1,$1 lda $17,4($17) cmpeq $1,$16,$2 beq $2,L3 ret $31,($26),1.end sum $31 is a special register zero Alpha assembly uses left-to-right operands - except for ldx and br/jsr/ret Function arguments passed in $16-$21 ble = branch if lower or equal lda = load address, a form of add $26 is also called ra for return address 32

What does this code do? sum: L3: L2:.ent sum mov $rz,$0 ble $a0,l2 mov $rz,$1 ldl $2,0($a1) addl $2,$0,$0 addl $1,1,$1 lda $a1,4($a1) cmpeq $1,$a0,$2 beq $2,L3 ret $rz,($ra),1.end sum $a0 is an alias for the concrete register name $16 The assembler translates the former to the latter automatically ret A, (B) means place the current PC in A, then jump to the address in B 33

How did we get this code? C source for sum.c: int sum(int n, int x[]) { int s = 0; for (int i = 0; i < n; ++i) s += x[i]; return s; } Compile: alpha-cc -S -o sum.s sum.c Assemble: alpha-as -o sum.o sum.c Link: alpha-ld -o demo sum.o... 34

Let s run this You probably don t have an Alpha processor at hand First, let s try to compile/assemble/link/run natively eg. using the x86 instruction set But this course is about RISC - the example uses the Alpha instruction set so let s use an emulator instead! Emulator = program running on processor A that interprets program code made for another processor B Simulator = program that mimics the behavior of another system 35

Why simulators? Why not native? This course will talk about the processor(s) in your desktops/ laptop machines But all x86 processors are really RISC under the hood More useful to study RISC to understand the main problems Also: for your lab assignments you will modify an architecture and study its parameters Easier with a simulator than real hardware! 36

Intro to MGSim Developed at the University of Amsterdam Purpose is to simulate multi-cores and do architecture research Simulates: cores, memory, interconnect, I/O devices i.e. it is a full-system simulator It emulates the Alpha and SPARC instruction sets you will add support for MIPS in your lab assignments 37

Intro to MGSim See the manual page mgsimdoc(7) System diagram for minisim.ini: bootrom cpu0 memory command to start: mgsim -c minisim.ini yourprogram.bin Add -i for an interactive prompt 38

Summary Seen today: What is computer architecture and why it is important Some general principles of architecture Intro to the hardware/software interface, machine instructions and system initialization How to compile/assemble/link/run a program in a simulated environment 39