Introduction to Microprocessors



Similar documents
Pipelining Review and Its Limitations

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Introducción. Diseño de sistemas digitales.1

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Design Cycle for Microprocessors

Parallel Programming Survey

CSCI 4717 Computer Architecture. Function. Data Storage. Data Processing. Data movement to a peripheral. Data Movement

CSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global

What is this course is about? Design of Digital Circuitsit. Digital Integrated Circuits. What is this course is about?

Generations of the computer. processors.

Why Latency Lags Bandwidth, and What it Means to Computing

Hardware: Input, Processing, and Output Devices. A PC in Every Home. Assembling a Computer System

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 1 Introduction to The Semiconductor Industry 2005 VLSI TECH. 1

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit


Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27

OC By Arsene Fansi T. POLIMI

Computer Systems Structure Main Memory Organization

Putting it all together: Intel Nehalem.

Chapter 4 System Unit Components. Discovering Computers Your Interactive Guide to the Digital World

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Lecture 5: Cost, Price, and Price for Performance Professor Randy H. Katz Computer Science 252 Spring 1996

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

How To Understand The Design Of A Microprocessor

Computer Architecture TDTS10

Parallel Algorithm Engineering

Microprocessor or Microcontroller?

Discovering Computers Living in a Digital World

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

01 Introduction. The timeline

Price/performance Modern Memory Hierarchy

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Computer Architectures

Thread level parallelism

Giving credit where credit is due

Assessing trends in the electrical efficiency of computation over time

1. Memory technology & Hierarchy

Multi-Threading Performance on Commodity Multi-Core Processors

Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. adiaz@cinvestav.mx. MemoryHierarchy- 1

CISC, RISC, and DSP Microprocessors

! Metrics! Latency and throughput. ! Reporting performance! Benchmarking and averaging. ! CPU performance equation & performance trends

Quiz for Chapter 1 Computer Abstractions and Technology 3.10

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

CS 159 Two Lecture Introduction. Parallel Processing: A Hardware Solution & A Software Challenge

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Introduction to CMOS VLSI Design

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Computer Architecture. Secure communication and encryption.

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Operating Systems: Basic Concepts and History

Communicating with devices

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner

The Central Processing Unit:

RUNAHEAD EXECUTION: AN EFFECTIVE ALTERNATIVE TO LARGE INSTRUCTION WINDOWS

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

SPARC64 VIIIfx: CPU for the K computer

White Paper Open-E NAS Enterprise and Microsoft Windows Storage Server 2003 competitive performance comparison

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

September 25, Maya Gokhale Georgia Institute of Technology

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Introduction to Semiconductor Manufacturing Technology. Chapter 1, Introduction. Hong Xiao, Ph. D.

Machine Architecture and Number Systems. Major Computer Components. Schematic Diagram of a Computer. The CPU. The Bus. Main Memory.

CHAPTER 2: HARDWARE BASICS: INSIDE THE BOX

Design of Digital Circuits (SS16)

Technical Product Specifications Dell Dimension 2400 Created by: Scott Puckett

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Computer Systems Design and Architecture by V. Heuring and H. Jordan

7a. System-on-chip design and prototyping platforms

Overview. CPU Manufacturers. Current Intel and AMD Offerings

Area 3: Analog and Digital Electronics. D.A. Johns

Fusionstor NAS Enterprise Server and Microsoft Windows Storage Server 2003 competitive performance comparison

CHAPTER 7: The CPU and Memory

CPU Organization and Assembly Language

Performance evaluation

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Lecture 2: Computer Hardware and Ports.

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory

Computer Architecture

The Big Picture. Cache Memory CSE Memory Hierarchy (1/3) Disk

U. Wisconsin CS/ECE 752 Advanced Computer Architecture I

Chapter 2 Parallel Computer Architecture

Multithreading Lin Gao cs9244 report, 2006

Low Power AMD Athlon 64 and AMD Opteron Processors

Virtualization and Cloud Computing. Sorav Bansal

FPGA Design From Scratch It all started more than 40 years ago

CSEE W4824 Computer Architecture Fall 2012

Fall Lecture 1. Operating Systems: Configuration & Use CIS345. Introduction to Operating Systems. Mostafa Z. Ali. mzali@just.edu.

EEM 486: Computer Architecture. Lecture 4. Performance

Introduction to Cloud Computing

Transcription:

Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology

Agenda Background and History What is a microprocessor? What is the history of the development of the microprocessor? How does transistor scaling affect processor design? PC Components What are the major PC components and their functions? What is memory hierarchy and how has it changed? Processor Architecture What are processor architecture and microarchitecture? How does microarchitecture affect performance? How is performance measured? Moscow Institute of Physics and Technology 2

Background and History Moscow Institute of Physics and Technology 3

What is a Microprocessor? Microprocessor is a computer Central Processing Unit (CPU) on a single chip. It contains millions of transistors connected by wires Core i7 die Picture: Intel Core i7 in package Picture: Ebbesen Moscow Institute of Physics and Technology 4

Electrical Numerical Integrator and Calculator Designed for the U.S. Army's Ballistic Research Laboratory Built out of 17,468 vacuum tubes 7,200 crystal diodes 1,500 relays 70,000 resistors 10,000 capacitors Consumed 150 kw of power Took up 72 m 2 Weighted 27 tons Suffered a failure on average every 6 hours Moscow Institute of Physics and Technology 5

Electrical Numerical Integrator and Calculator Glen Beck and Betty Snyder program the ENIAC in BRL building 328. (Picture: U.S. Army) Moscow Institute of Physics and Technology 6

The First Transistor was Created in 1947 Used germanium Created by a team lead by William Shockley at Bell Labs Shockley later shared the Noble prize in physics Shockley semiconductors was founded in Palo Alto in 1955 In 1957 Bob Noyce, Gordon Moore, and 6 others ( Traitorous Eight ) leave to found Fairchild semiconductor Moscow Institute of Physics and Technology 7

The First Integrated Circuit was Created in 1959 Proposed independently by Bob Noyce at Fairchild and Jack Kilby at Texas Instruments In 1968 Noyce and Moore leave Fairchild to found Intel Contained a single transistor and supporting components The first working integrated circuit (1.6 11.1 mm) Picture: TI Moscow Institute of Physics and Technology 8

Intel Created the First Commercial Microprocessor Introduced the 4004 in 1971, contained 2,300 transistors Had roughly the same processing power as ENIAC Intel 4004 Picture: Intel Intel founders (circa 1978) Picture: Intel Moscow Institute of Physics and Technology 9

IBM Introduced its Original PC in 1981 Used the Intel 8088 processor containing 29,000 transistors Used operating system (MS-DOS) designed by Microsoft IBM PC Picture: Intel Intel 8088 Picture: Intel Moscow Institute of Physics and Technology 10

Microprocessor Evolution 4004 transistors were 10 µm across Pentium 4 transistors are 0.13 µm across Human hair is about 100 µm across Smaller transistors allow More transistors per chip More processing per clock cycle Faster clock rates Smaller/cheaper chips Moscow Institute of Physics and Technology 11

Microprocessor Evolution Picture: Intel Moscow Institute of Physics and Technology 12

Moore s Law "The number of transistors incorporated in a chip will approximately double every 24 months. (1965) Picture: Intel Moscow Institute of Physics and Technology 13

Computer Components Moscow Institute of Physics and Technology 14

PC Components Microprocessor performs all computations Cache fast memory which holds current data and program Main memory larger DRAM memory contains more data Chipset controls communication between components Motherboard circuit board which holds all the above components Peripheral cards controls added computer accessories Moscow Institute of Physics and Technology 15

Memory Hierarchy ~1kB Register File ~1ns ~100kB ~1MB ~10MB ~10GB ~1TB Capacity Level 1 Cache Level 2 Cache Level 3 Cache Main Memory Hard Drive ~5ns ~10ns ~20ns ~100ns ~10ms Access Time Moscow Institute of Physics and Technology 16

Processor-Memory Performance Gap 1000 60% per year (Doubles every 1.5 year) CPU 100 Relative performance 10 9% per year (Doubles every 10 year) Performance gap (Grows 50% per year) DRAM 1 Source: David Patterson, UC Berkeley Years Moscow Institute of Physics and Technology 17

Memory Hierarchy Evolution CPU CPU CPU L1 I D Cache L2 L2 Chipset DRAM Chipset DRAM Chipset DRAM 386 No on-die cache. Level 1 cache on motherboard 486 Level 1 cache on-die. Level 2 cache on motherboard Pentium Separate Instruction and Data Caches Moscow Institute of Physics and Technology 18

Memory Hierarchy Evolution CPU CPU CPU I D L2 I L2 D I L2 D L3 Chipset DRAM Chipset DRAM Chipset DRAM Pentium II Separate bus to L2 cache in same package Pentium III L2 cache on-die Core i7 L3 cache on-die Moscow Institute of Physics and Technology 19

Microprocessor Architecture Moscow Institute of Physics and Technology 20

What is Architecture? Computer architecture is defined by the instructions a processor can execute Programs written for one processor can run on any other processor of the same architecture Current architectures include: IA32 (x86) IA64 ARM PowerPC SPARC Moscow Institute of Physics and Technology 21

What are Instructions? Instructions are the most basic actions the processor can take: ADD AX, BX Add value AX to BX and store in AX CMP AX, 5 Compare value in AX to 5 JE 16 Jump ahead 16 bytes if comparison was equal High level programming languages (C, C++, Java) allow many processor instructions to be written simply: if (A + B = 5) then Jump if sum of A and B is 5 Every program must be converted to the processor instructions of the computer it will be run on Moscow Institute of Physics and Technology 22

What is Microarchitecture? Microarchitecture is the steps a processor takes to execute a particular set of instructions Processors of the same architecture have the same instructions but may carry them out in different ways Microarchitecture Features: Cache memory Pipelining Out-of-Order Execution Superscalar Issue Moscow Institute of Physics and Technology 23

Simplified Microprocessor Fetch L2 Cache Decode Floating Multimedia Point Unit Unit Write Integer Unit Fetch Unit gets the next instruction from the cache. Decode Unit determines type of instruction. Instruction and data sent to Execution Unit. Write Unit stores result. Instruction Fetch Decode Execute Write Moscow Institute of Physics and Technology 24

Sequential Processing (386) Cycle 1 2 3 4 5 6 7 8 9 Instr 1 Fetch Decode Execute Write Instr 2 Fetch Decode Execute Write Instr 3 Fetch Sequential processing works on one instruction at a time Moscow Institute of Physics and Technology 25

Pipelined Processing (486) Cycle 1 2 3 4 5 6 7 8 9 Instr 1 Fetch Decode Execute Write Instr 2 Fetch Decode Execute Write Instr 3 Fetch Decode Execute Write Instr 4 Fetch Decode Execute Write Instr 5 Fetch Decode Execute Write Instr 6 Fetch Decode Execute Write Latency elapsed time from start to completion of a particular task Throughput how many tasks can be completed per unit of time Pipelining only improves throughput Each job still takes 4 cycles to complete Real life analogy: Henry Ford s automobile assembly line Moscow Institute of Physics and Technology 26

In-Order Pipeline (486) Cycle 1 2 3 4 5 6 7 8 9 Instr 1 Fetch Decode Execute Write Instr 2 Fetch Decode Wait Execute Write Instr 3 Fetch Decode Wait Execute Write Instr 4 Fetch Decode Wait Execute Write Instr 5 Fetch Decode Wait Execute Instr 6 Fetch Decode Wait In-Order execution requires instructions to be executed in the original program order Moscow Institute of Physics and Technology 27

Out-of-Order Execution (Pentium II) Cycle 1 2 3 4 5 6 7 8 9 Instr 1 Fetch Decode Execute Write Instr 2 Fetch Decode Wait Execute Write Instr 3 Fetch Decode Execute Write Instr 4 Fetch Decode Wait Execute Write Instr 5 Fetch Decode Execute Write Instr 6 Fetch Decode Execute Write Program Order vs Dataflow Order Dataflow: data-driven scheduling of events The start of an event should be enabled by the availability of its required input (data dependency) Real life analogy: taking tests work on the questions you know first Moscow Institute of Physics and Technology 28

Superscalar Issue (Pentium) Cycle 1 2 3 4 5 6 7 8 9 Instr 1 Fetch Decode Execute Write Instr 2 Fetch Decode Wait Execute Write Instr 3 Fetch Decode Execute Write Instr 4 Fetch Decode Wait Execute Write Instr 5 Fetch Decode Execute Write Instr 6 Fetch Decode Execute Write Instr 7 Fetch Decode Execute Write Instr 8 Fetch Decode Execute Write Superscalar issue allows multiple instructions to be issued at the same time Moscow Institute of Physics and Technology 29

Microarchitecture and Performance Performance is measured by how long a processor takes to run a program Time is reduced by increasing Instructions Per Cycle (IPC) and clock rate Microarchitecture affects IPC and clock rate: More pipe stages Less work in each cycle means better clock rate More dependencies means worse IPC Superscalar Issue and Out-of-Order Execution Parallel work means better IPC More complexity can mean worse clock rate Moscow Institute of Physics and Technology 30

Measuring Processor Performance Clock Rate Simplest but not especially accurate Instructions Per Cycle (IPC) Not meaningful without clock rate Varies from program to program SPEC Performance Standard Performance Evaluation Corporation Tests PCs using benchmark suite of programs SPECint: Integer intensive programs SPECfp: Floating point intensive programs TPC Performance Transaction Performance Council Tests servers and workstations Moscow Institute of Physics and Technology 31

Acknowledgements These slides contain material developed and copyright by: Grant McFarland (Intel) Mark Louis (Intel) Arvind (MIT) Joel Emer (Intel/MIT) Moscow Institute of Physics and Technology 32