The i860 XP Second Generation of the i860 Supercomputing Microprocessor Family. Presentation Outline




Intel
The i860 XP Second Generation of the i860 Supercomputing Microprocessor Family
David Perlmutter, Michael Kagan
Intel Israel
August 1991

Presentation Outline
- i860 XP CPU Key Attributes
- Supercomputing/Visualization System Requirements
- The i860 XP Microprocessor
- Vector Operation Capabilities
- Multi-Processing Capabilities
- Internal Architecture
- Performance Benchmarks
- $/MFLOP Roadmap
- Summary and Conclusions

i860 XP CPU Key Attributes
Target Markets
- Massively Parallel Supercomputer and Multi-Processing Systems
- Super Workstations & Servers
- High End Workstation Graphics/Accelerator Subsystems
Technology
- 3 Layer Metal, 0.8 µm CHMOS-V Technology
- 2.55 Million Transistors
- Die Size: 612 x 404 mils
- 262 pin CGA Package
- Frequency: 40 & 50 MHz
- Power Dissipation (@ 50 MHz): 5 W

Supercomputing/Visualization System Requirements
High Throughput Computing Performance
- "Number Crunching" Floating-Point Capability
- Real Time 3D Graphics/Visualization
Multiprocessing/Parallel Processing
Vector Processing
High Bus Bandwidth
Scalable Performance
Cost Effectiveness

The i860 XP Supercomputing Microprocessor
Very High Performance
- 100 MFLOPS
- 400 MByte/sec Bus Bandwidth
- 40 & 50 MHz Operation
- 40+ SPECmark
- 3 operations/cycle
High Integration, Single Chip
Multi & Parallel Processing
- Hardware Cache Consistency
- Bus Snooping
- Detached Concurrency Control Unit (DCCU)
- Scalable: Shared Bus or Massively Parallel
Upward Software Compatible with i860 XR CPU
[Block diagram, i860 XP CPU: i860 XR compatible core; 64-bit FP add; 64-bit FP mult; 3D graphics; MP snoop logic; D-Cache and I-Cache (physical tag, 16 KB, 4-way each); pipelined burst bus & MMU.]
A SUPERCOMPUTING MICROPROCESSOR

Vector Operation Capabilities
[Diagram: 80860XP CPU connected by address/control and data lines to a paged DRAM subsystem organized in banks.]
Pipelined Load Instructions
- Loads 128 bits in 2 CLKs
- Helps to Hide Memory Latency
Specialized Instructions to Reduce Tight Loops
- BLA: Add & Branch with 0 latency
- Dual Instruction mode: FP and Integer parallelism
- Dual Operation Instructions
Large D-Cache to hold large Vectors
Optimized DRAM interface For Fast Bus Throughput
- Paged DRAM Support
- Three levels of pipeline
- Burst Bus
- Wide Memory Access
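The headline figures on the two slides above are consistent with each other, and a few lines of arithmetic make that visible. The sketch below is not part of the presentation. It assumes that the 100 MFLOPS peak counts one FP add plus one FP multiply per clock, and that the bus sustains the quoted 128 bits every 2 clocks; the function names are hypothetical.

```python
def peak_mflops(clock_mhz, fp_ops_per_clock=2):
    """Peak floating-point rate in MFLOPS.

    Assumption: one FP add and one FP multiply can complete every clock.
    """
    return clock_mhz * fp_ops_per_clock


def peak_bus_mbytes_per_sec(clock_mhz, bits_per_transfer=128, clocks_per_transfer=2):
    """Peak bus bandwidth in MByte/sec.

    Defaults follow the slide's "loads 128 bits in 2 CLKs";
    sustaining that rate on every clock is an assumption.
    """
    bytes_per_clock = bits_per_transfer / 8 / clocks_per_transfer
    return clock_mhz * bytes_per_clock


# The two clock rates listed on the slides.
for mhz in (40, 50):
    print(mhz, "MHz:", peak_mflops(mhz), "MFLOPS,",
          peak_bus_mbytes_per_sec(mhz), "MByte/sec")
```

Under these assumptions the 50 MHz part reaches the quoted 100 MFLOPS and 400 MByte/sec, while the 40 MHz part scales down proportionally.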

Multiprocessing Capabilities
[Diagram: 80860XP CPU with Concurrency Control Unit (CCU); address and data lines to the 82495XP/82490XP L2 cache; MPC; MBC; High Bandwidth Memory Bus.]
Reduced Bus Utilization (Scalability)
- Large On-chip Write-Back Cache
- 2nd level Write-Back Cache (82490XP/82495XP), Consistency By Inclusion
- LOCK by Address
Data Consistency / Integrity
- HW Based MESI Cache Consistency Protocol
- Bus Snooping Concurrently with Cache Look Up
- Weak/Strong Write Ordering Mode
- Data Parity Check
- Bus Retry Hooks
Parallel Processing
- Loop Level Parallelism (MPC, DCCU)

Internal Architecture
[Block diagram: Bus Interface Unit with data bus and address bus; DCCU; 16 Kbyte I-Cache; 16 Kbyte D-Cache; MMU; core; FP decode; FP Multiplier Unit; FP Adder Unit; Graphics Unit.]
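The Multiprocessing Capabilities slide names a hardware MESI protocol with bus snooping and write-back caches. As a rough illustration, the sketch below models the textbook MESI state transitions for a single cache line shared by several caches. It is a generic model, not the i860 XP's actual implementation; the class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopingBus:
    """Tracks the MESI state of one cache line in each of n caches."""

    def __init__(self, n_caches):
        self.states = [State.INVALID] * n_caches
        self.writebacks = 0  # dirty lines written back because of a snoop hit

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return  # read hit: no bus traffic, state unchanged
        others_have_copy = False
        for other, st in enumerate(self.states):
            if other == cpu or st is State.INVALID:
                continue
            others_have_copy = True
            if st is State.MODIFIED:
                self.writebacks += 1  # the owner supplies the dirty data
            self.states[other] = State.SHARED
        self.states[cpu] = State.SHARED if others_have_copy else State.EXCLUSIVE

    def write(self, cpu):
        if self.states[cpu] in (State.MODIFIED, State.EXCLUSIVE):
            self.states[cpu] = State.MODIFIED  # silent upgrade, no bus traffic
            return
        for other in range(len(self.states)):
            if other == cpu:
                continue
            if self.states[other] is State.MODIFIED:
                self.writebacks += 1
            self.states[other] = State.INVALID  # snoop invalidates other copies
        self.states[cpu] = State.MODIFIED


bus = SnoopingBus(2)
bus.read(0)   # only copy in the system: Exclusive
bus.read(1)   # second reader: both copies become Shared
bus.write(1)  # writer takes Modified, the other copy is invalidated
print([s.value for s in bus.states])
```

Reads and writes that hit a Modified or Exclusive line generate no bus traffic in this model, which is the mechanism behind the slide's "Reduced Bus Utilization" point for write-back caches.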

Performance Benchmarks
Total SPEC*: 41+
FP SPEC*: 50
Dhrystone: 103.9
Triangles/sec: 50K
Linpack (Double) MFLOPS: 20
* Based on preliminary results on prototype board

i860 Architecture $/MFLOP Roadmap
[Chart: $/MFLOP versus time; surviving axis values are 14, 7, and 4, and the only legible data label is i860 XR.]

Summary & Conclusions
Supports High End MP/PP Systems Via Coarse to Loop Level of Parallelism
Supports Large Variety of Memory Sub Systems
- From DRAM to Sophisticated Second Level Cache Based Systems
- Scalability From Uniprocessor to Massively Parallel systems
High Integration
- RISC core Surrounded with FP, Caches, MMU, and CCU
Bus Optimized for Vector Operations and Fast Throughput
Cost Effective MFLOPS
i860 XP CPU DELIVERS SUPERCOMPUTING PERFORMANCE TO BROAD CLASS OF AFFORDABLE SYSTEMS

Die Photo
[Die photo not reproduced in the transcription.]