Intel

The i860 XP: Second Generation of the i860 Supercomputing Microprocessor Family

David Perlmutter
Michael Kagan
Intel Israel
August 1991

Presentation Outline

i860 XP CPU Key Attributes
Supercomputing/Visualization System Requirements
The i860 XP Microprocessor
Vector Operation Capabilities
Multi-Processing Capabilities
Internal Architecture
Performance Benchmarks
$/MFLOP Roadmap
Summary and Conclusions

i860 XP CPU Key Attributes

Target Markets
- Massively Parallel Supercomputer and Multi-Processing Systems
- Super Workstations & Servers
- High End Workstation Graphics/Accelerator Subsystems

Technology
- 3 Layer Metal, 0.8 µm CHMOS-V Technology
- 2.55 Million Transistors
- Die Size: 612 x 404 mils
- 262 pin CGA Package
- Frequency: 40 & 50 MHz
- Power Dissipation (@50 MHz): 5 W

Supercomputing/Visualization System Requirements

High Throughput Computing Performance
- "Number Crunching" Floating-Point Capability
- Real Time 3D Graphics/Visualization
Multiprocessing/Parallel Processing
Vector Processing
High Bus Bandwidth
Scalable Performance
Cost Effectiveness

The i860 XP Supercomputing Microprocessor

Very High Performance
- 100 MFLOPS
- 400 MByte/sec Bus Bandwidth
- 40 & 50 MHz Operation
- 40+ SPECmark
- 3 operations/cycle
High Integration, Single Chip
Multi & Parallel Processing
- Hardware Cache Consistency
- Bus Snooping
- Detached Concurrency Control Unit (DCCU)
- Scalable: Shared Bus or Massively Parallel

[Figure: i860 XP CPU block diagram. Labeled blocks: i860 XR compatible core, 64-bit FP add, 64-bit FP multiply, 3D graphics, MP snoop logic, 16 KB 4-way D-cache and 16 KB 4-way I-cache with physical tags, pipelined burst bus & MMU.]

Upward Software Compatible with i860 XR CPU

A SUPERCOMPUTING MICROPROCESSOR

Vector Operation Capabilities

[Figure: 80860XP CPU connected by address/control and data lines to a paged DRAM subsystem organized in banks.]

Pipelined Load Instructions
- Loads 128 bits in 2 CLKs
- Helps to Hide Memory Latency
Specialized Instructions to Reduce Tight Loops
- BLA: Add & Branch with 0 latency
- Dual Instruction mode: FP and Integer parallelism
- Dual Operation Instructions
Large D-Cache to Hold Large Vectors
Optimized DRAM Interface for Fast Bus Throughput
- Paged DRAM Support
- Three levels of pipeline
- Burst Bus
- Wide Memory Access
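
As a rough cross-check of the peak figures quoted on these two slides, the arithmetic can be sketched as follows. The slides give only the totals; the breakdown is an assumption made here for illustration: two floating-point results per clock (one add plus one multiply) and one 64-bit bus transfer per clock, both at 50 MHz. All function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the peak numbers on the slides.
# ASSUMPTIONS (not stated in the source): the 100 MFLOPS peak is one FP add
# plus one FP multiply completing every clock, and the 400 MByte/sec peak is
# one 64-bit (8-byte) bus transfer per clock, both at 50 MHz.

CLOCK_HZ = 50_000_000        # 50 MHz part (from the slides)
FP_OPS_PER_CLOCK = 2         # assumed: 1 add + 1 multiply per clock
BUS_BYTES_PER_CLOCK = 8      # assumed: 64-bit transfer every clock


def peak_mflops(clock_hz, fp_ops_per_clock):
    """Peak floating-point rate in millions of operations per second."""
    return clock_hz * fp_ops_per_clock / 1e6


def peak_bus_mbytes_per_sec(clock_hz, bytes_per_clock):
    """Peak bus bandwidth in millions of bytes per second."""
    return clock_hz * bytes_per_clock / 1e6


def pipelined_load_mbytes_per_sec(clock_hz, bits=128, clocks=2):
    """Sustained load rate if `bits` are loaded every `clocks` clocks."""
    return clock_hz * (bits // 8) / clocks / 1e6


if __name__ == "__main__":
    print(peak_mflops(CLOCK_HZ, FP_OPS_PER_CLOCK))
    print(peak_bus_mbytes_per_sec(CLOCK_HZ, BUS_BYTES_PER_CLOCK))
    print(pipelined_load_mbytes_per_sec(CLOCK_HZ))
```

Under these assumptions, the "128 bits in 2 CLKs" pipelined load rate works out to the same 8 bytes per clock as the quoted bus bandwidth, which is consistent with the loads being able to keep the bus fully occupied.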

Multiprocessing Capabilities

[Figure: 80860XP CPU with concurrency control (CCU), address and data lines to an 82495XP/82490XP L2 cache, MPC and MBC blocks, and a high bandwidth memory bus.]

Reduced Bus Utilization (Scalability)
- Large On-chip Write-Back Cache
- 2nd level Write-Back Cache (82490XP/82495XP), Consistency by Inclusion
- LOCK by Address
Data Consistency / Integrity
- HW Based MESI Cache Consistency Protocol
- Bus Snooping Concurrently with Cache Look Up
- Weak/Strong Write Ordering Mode
- Data Parity Check
- Bus Retry Hooks
Parallel Processing
- Loop Level Parallelism (MPC, DCCU)

Internal Architecture

[Figure: internal architecture block diagram. Recoverable labels: Bus Interface Unit with data bus and address bus, DCCU, 16 Kbyte I-Cache and 16 Kbyte D-Cache with tags, PFLD FIFO, MMU, instruction and FP decode, core with control registers, FP Multiplier Unit, FP Adder Unit, Graphics Unit.]
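
The hardware MESI cache consistency protocol listed under Multiprocessing Capabilities can be illustrated with a minimal sketch of the generic textbook state machine for a single cache line. This is not taken from the slides and is not a description of the i860 XP's actual bus transactions; the function names and the `others_have_copy` flag are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"    # only copy, dirty with respect to memory
    EXCLUSIVE = "E"   # only copy, clean
    SHARED = "S"      # clean, other caches may hold it too
    INVALID = "I"     # line not present


def on_local_read(state, others_have_copy):
    """State after the local CPU reads the line."""
    if state is State.INVALID:
        # Miss: the line is fetched and becomes Shared if another cache
        # also holds it, Exclusive otherwise.
        return State.SHARED if others_have_copy else State.EXCLUSIVE
    return state  # hit in M, E or S leaves the state unchanged


def on_local_write(state):
    """State after the local CPU writes the line (write-back cache)."""
    # From S or I the other copies must first be invalidated over the bus;
    # from E or M no bus traffic is needed. Either way the line ends dirty.
    return State.MODIFIED


def on_snoop_read(state):
    """State after snooping another processor's read of the same line."""
    if state in (State.MODIFIED, State.EXCLUSIVE):
        # A Modified line is written back before it can be shared.
        return State.SHARED
    return state


def on_snoop_write(state):
    """State after snooping another processor's write to the same line."""
    return State.INVALID


if __name__ == "__main__":
    s = on_local_read(State.INVALID, others_have_copy=False)
    s = on_local_write(s)
    s = on_snoop_read(s)
    print(s)
```

The write transitions out of Exclusive and Modified need no bus cycle, which is one reason a write-back cache with this kind of protocol reduces bus utilization compared with write-through.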

Performance Benchmarks

Total SPEC*                41+
FP SPEC*                   50
Dhrystone                  103.9
Triangles/sec              50K
Linpack (Double) MFLOPS    20

* Based on preliminary results on prototype board

i860 Architecture $/MFLOP Roadmap

[Figure: $/MFLOP roadmap. Vertical axis: $/MFLOP, with marked values 14, 7 and 4; horizontal axis: time; one labeled point is the i860 XR.]

Summary & Conclusions

Supports High End MP/PP Systems via Coarse to Loop Level of Parallelism
Supports Large Variety of Memory Subsystems
- From DRAM to Sophisticated Second Level Cache Based Systems
- Scalability from Uniprocessor to Massively Parallel Systems
High Integration
- RISC core Surrounded with FP, Caches, MMU, and CCU
Bus Optimized for Vector Operations and Fast Throughput
Cost Effective MFLOPS

i860 XP CPU DELIVERS SUPERCOMPUTING PERFORMANCE TO BROAD CLASS OF AFFORDABLE SYSTEMS

[Figure: Die Photo]