Lecture 2: Advanced Processors, Part 1: The Road Map of Intel Microprocessors: Current and Future Trends

Kai Hwang, August 31, 2007

Lessons Learned So Far:
- Evolution of instruction set architectures: CISC to RISC and to hybrid
- Amdahl's Law: making the common case fast
- Moore's Law: scalable commodity computing
- Network-based computing is booming: cluster and grid computing

How to Improve Computer Performance with Architectural Innovations?
- Use a RISC instruction set architecture
- Use high-density chips with billions of transistors
- Use multi-GHz clocks with better cooling support
- Use a deeply pipelined processor datapath
- Use a superscalar processor with multiple datapaths
- Use out-of-order (dynamic) execution

How to Improve Computer Performance with Architectural Innovations? (Cont'd)
- Use a hybrid architecture like the Pentium series
- Use low-power design and better power management (Core Duo)
- Use multi-level caches and a shared-memory architecture
- Use branch prediction techniques
- Use a multi-core architecture with multiple threads of control, like the Niagara processor
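Amdahl's Law, listed among the lessons above, bounds what the multi-core and superscalar techniques on these slides can deliver. The following is a minimal sketch, assuming the standard form speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that benefits from an enhancement and s is the speedup of that fraction. The function name and the sample workload are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be in [0, 1]")
    if enhancement_speedup <= 0:
        raise ValueError("enhancement_speedup must be positive")
    # Time that is not accelerated stays as it was.
    unenhanced_part = 1.0 - enhanced_fraction
    # Accelerated time shrinks by the enhancement factor.
    enhanced_part = enhanced_fraction / enhancement_speedup
    return 1.0 / (unenhanced_part + enhanced_part)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelizes perfectly.
    for cores in (2, 4, 8, 1024):
        print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2))
```

Under this assumed 90% figure the speedup stays below 10 no matter how many cores are added, which is why the common case has to be made fast first.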

Top 500 List of Supercomputers
[Figure: Top 500 systems by architecture class; vertical axis 0 to 500, horizontal axis 1993 to 2000. Classes: single instruction, multiple data (SIMD); cluster (network of workstations); cluster (network of SMPs); massively parallel processors (MPPs); shared-memory multiprocessors (SMPs); uniprocessors.]

Supercomputer Industry? (Dinosaurs Vanished)
[Figure: timeline, 1985 to 2005, of supercomputer vendors and the year shown for each: ETA (1989), Multiflow (1990), ESCD (1990), Myrias (1991), Meiko Scientific (1992), Convex Computer (1994), Thinking Machines (1994), Pyramid (1995), Kendall Square Research (1996), MasPar (1996), Cray Research (1996), BBN (1997), DEC (1998), Sequent (1999), nCUBE (2005).]

Intel Microprocessors since the Pentium Series

Pentium 4 Architecture
- Processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for the integer datapath (400 control lines for floating point)
- If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)
[Figure: Pentium 4 chip layout with labeled regions: control, I/O interface, instruction cache, enhanced floating point and multimedia, data cache, integer datapath, secondary cache and memory interface, advanced pipelining and hyperthreading support.]
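The rule on the Pentium 4 slide above (hardwired control for short instructions, microcode ROM for longer ones) can be pictured as a simple threshold test. This is only a sketch of that one rule: the function name, the instruction labels, and their microinstruction counts are hypothetical and do not describe real IA-32 instructions.

```python
# Threshold stated on the slide: more than 4 microinstructions -> microcode ROM.
HARDWIRED_LIMIT = 4


def control_source(num_microinstructions: int) -> str:
    """Pick where control comes from for one instruction."""
    if num_microinstructions < 1:
        raise ValueError("an instruction needs at least one microinstruction")
    if num_microinstructions <= HARDWIRED_LIMIT:
        return "hardwired"
    return "microcode ROM"


if __name__ == "__main__":
    # Hypothetical instructions with made-up microinstruction counts.
    examples = {"simple_add": 1, "load_and_add": 2, "long_string_op": 12}
    for name, count in examples.items():
        print(f"{name}: {count} microinstruction(s) -> {control_source(count)}")
```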

The Pentium 4 Architecture

Interesting Features in Pentium 4
- A special trace cache is used to hold predecoded IA-32 instructions, which are translated into RISC-like micro-operations for out-of-order execution.
- Allows up to 126 micro-operations to be outstanding in deep-pipelined execution queues, including 48 loads, 24 stores, and integer and floating-point operations over 7 functional units, simultaneously.
- The FP units also handle the MMX (multimedia extension) and SSE2 instructions.
- The functional pipelines execute dynamically with multiple issues.
- Other interesting features include ILP, speculation, register renaming, dynamic scheduling for out-of-order (OOO) execution, and in-order retirement.

Innovative Processor Features in Modern Processors:
- Use data or instruction prefetching techniques
- Use speculative load and speculative branch ahead of store instructions
- Use predicated instruction execution
- Use EPIC (Explicitly Parallel Instruction Computing) architecture like the Itanium (IA-64 architecture)

Innovative Processor Features in Modern Processors:
- Compiler optimization from small basic blocks to bigger blocks
- Extending the compiler from local to global optimizations
- System In a Chip (SIC) architecture
- Chip MultiProcessor (CMP) architecture
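The combination of out-of-order execution and in-order retirement noted for the Pentium 4 above can be illustrated with a toy reorder buffer. This is a simplified sketch and not a model of the actual Pentium 4 hardware: instructions finish in any order, but each leaves the buffer only once every older instruction has also finished. The default window of 126 entries mirrors the figure on the slide; the function name and instruction labels are hypothetical.

```python
from collections import deque


def retire_in_order(program, completion_order, window_size=126):
    """Return, for each completion event, the instructions retired at that point."""
    if len(program) > window_size:
        raise ValueError("program does not fit in the instruction window")
    reorder_buffer = deque(program)  # entries are kept in program order
    finished = set()
    retired_log = []
    for instr in completion_order:
        finished.add(instr)  # execution may complete out of order
        retired_now = []
        # Retire from the head only: an entry leaves when it is the oldest
        # instruction in the buffer and has finished executing.
        while reorder_buffer and reorder_buffer[0] in finished:
            retired_now.append(reorder_buffer.popleft())
        retired_log.append(retired_now)
    return retired_log


if __name__ == "__main__":
    program = ["i0", "i1", "i2", "i3"]
    completion_order = ["i2", "i0", "i3", "i1"]  # hypothetical timing
    for done, retired in zip(completion_order, retire_in_order(program, completion_order)):
        print(f"{done} completes, retired: {retired}")
```

In this run i2 finishes first but cannot retire until i0 and i1 have, so the architectural state always changes in program order.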

Advances in IA-64 Architecture beyond the IA-32:
- Enlarged memory address space from 2^32 words to 2^64 double words
- Use 256 physical registers versus 64 registers
- Use speculative load beyond store and speculative branch
- Use predicated instruction execution

HP/Intel Itanium Series
- Use EPIC (Explicitly Parallel Instruction Computing) to merge superscalar and VLIW features
- Compiler optimization extending small basic blocks to much bigger blocks
- Extending the compiler from local to global optimizations
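Predicated instruction execution, listed above among the IA-64 advances, replaces a conditional branch with instructions that are each guarded by a predicate. The sketch below is a software analogy under simplifying assumptions, not IA-64 code: both arms of an if/else are issued, and only the result whose predicate is true is committed. All function names are illustrative.

```python
def abs_diff_branching(a: int, b: int) -> int:
    """Reference version: control flow depends on a conditional branch."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """Predicated version: no branch around either arm."""
    p_true = a > b       # a compare sets a predicate ...
    p_false = not p_true  # ... and its complement
    # Each entry is (guarding predicate, operation). Both are issued.
    guarded_ops = [
        (p_true, lambda: a - b),
        (p_false, lambda: b - a),
    ]
    result = None
    for predicate, operation in guarded_ops:
        value = operation()  # the operation executes regardless
        if predicate:        # models the commit stage: keep only true-predicate results
            result = value
    return result


if __name__ == "__main__":
    for a, b in [(7, 3), (3, 7), (5, 5)]:
        print(a, b, abs_diff_branching(a, b), abs_diff_predicated(a, b))
```

The trade-off is that the discarded arm still consumes issue slots, so predication suits short arms where a mispredicted branch would cost more.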

Intel Wide Dynamic Execution: Multi-Core

The Core Duo Processor Architecture

Intel Core Duo Processor Floor Plan:
- To provide the best performance under power and thermal constraints.
- To achieve a significant improvement in performance through precise control of the balance among the performance, power, and thermal features in the Core Duo system.

Power States of Intel Core Duo Processor

Optimize the power utilization for sustained performance:
- Improved power process technology
- Low-power CPU design
- Dynamic power coordination
- Low-Vcc cache design with voltage regulation
- Smart cache sizing and enhanced deeper sleep
- Using thermal monitor and digital sensor

Smart Cache in Intel Core Duo

A Multiprocessor Using Separate CPU Chips on the Same Circuit Board
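The power-optimization list above mentions low-Vcc design and voltage regulation without giving a power model. As a rough illustration, the sketch below assumes the standard CMOS dynamic-power relation, P proportional to C * V^2 * f, which is not stated on the slides, and ignores leakage power. The function name and the scaling factors are hypothetical.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to the baseline, assuming P ~ C * V^2 * f.

    Switched capacitance C is taken as unchanged; leakage is ignored.
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical operating points: (voltage scale, frequency scale).
    for v, f in [(1.0, 1.0), (0.9, 0.9), (0.8, 0.8)]:
        print(f"V x{v}, f x{f} -> power x{relative_dynamic_power(v, f):.3f}")
```

Because voltage enters squared, lowering voltage and frequency together cuts dynamic power much faster than it cuts clock rate, which is the motivation for running cores and caches at the lowest workable Vcc.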

A Chip Multiprocessor (CMP)
- Further integration on denser chips

Intel's Core 2 Duo Architecture

Intel's Core Fetch/Decode Hardware

Required Paper Readings (download from class web site), for Lectures 1-2:
- O. Wechsler, "Inside Intel Core Microarchitecture," Intel White Paper, 2006.
- H. Sharangpani, "Intel Itanium Processor Micro-architecture Overview," Intel Presentation Slides, 2001.
- S. Gochman, et al., "Introduction to Intel Core Duo Processor Architecture," Intel Technology Journal, Vol. 10, Issue 2, 2006.

Book Chapter Readings related to Lectures 1-8:
- I. Taylor, From P2P to Web Services and Grids, Springer-Verlag, London, 2005, ISBN 1-85233-869-5. Chapters 1, 2, and 4 on Introduction, P2P Systems, and Grid Computing.
- K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw-Hill, 1998, ISBN 0-07-031798-4. Chapters 8, 9, and 10 on Scalable Multiprocessors and Clusters.

More Advanced Processors plus Network Technology to be studied in next Lecture 2:
- Intel's Montecito: a dual-core successor of the Itanium series
- Sun's Niagara series
- IBM POWER5 series
- HP Piranha: a CMP architecture
- LAN, SAN, and NAS technologies