New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC


Alan Gara, Intel Fellow, Exascale Chief Architect

Legal Disclaimer

Today's presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2011, Intel Corporation. All rights reserved.

The next step in perf/$
Typical DRAM memory die (2016): ~8 Gb will be about 100 mm^2 (as always)
Processor floating point unit: ~0.03 mm^2 (2 Flops/cycle) (see below)
Even if the core is 100x bigger than the FPU, at 1.0 GB/core we have >100x more silicon in memory than processing. This is not cost balanced.
For cost balance we need to either
1) Use much less memory per compute, or
2) Make physical size of capacity much smaller
Threading gives us a mechanism to change this balance if we have enough bandwidth to support much higher compute/memory.
New memory architectures allow us to get a significant step in perf/$
DARPA: Exascale computing study: Exascale_Final_report_100208.pdf
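The silicon-balance argument on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the slide's figures (8 Gb per 100 mm^2 DRAM die, 0.03 mm^2 per FPU); the function and parameter names are illustrative and not from the presentation. With the figures as transcribed, a core 100 times the FPU area gives a memory-to-core area ratio of roughly 33x, and the ratio passes 100x once the core is smaller than about 33 FPU areas. Either way, memory silicon dominates.

```python
# Figures are taken from the slide; all names here are illustrative.
DIE_GBIT = 8.0    # capacity of one DRAM die, in gigabits
DIE_MM2 = 100.0   # area of one DRAM die, in mm^2
FPU_MM2 = 0.03    # area of one floating point unit, in mm^2


def memory_to_core_area_ratio(gb_per_core, core_to_fpu=100.0):
    """Return DRAM silicon area per core divided by the core's own area.

    gb_per_core: memory capacity per core, in gigabytes
    core_to_fpu: how many times larger the core is than its FPU
    """
    dram_mm2 = gb_per_core * 8.0 / DIE_GBIT * DIE_MM2  # GB -> Gb -> dies -> mm^2
    core_mm2 = core_to_fpu * FPU_MM2
    return dram_mm2 / core_mm2


if __name__ == "__main__":
    for scale in (1.0, 10.0, 33.0, 100.0):
        ratio = memory_to_core_area_ratio(1.0, core_to_fpu=scale)
        print(f"core = {scale:5.0f} x FPU -> memory/core area ratio = {ratio:8.1f}")
```

Lowering `gb_per_core` corresponds to option 1 on the slide (less memory per compute); lowering `DIE_MM2` per bit corresponds to option 2 (denser capacity).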

Big Data meets HPC

                             Big Data          HPC
Large Memory Capacity        Large             Small to Large
Bandwidth to Large Memory    Small to Large    Large
System Fabric Bandwidth      Small to Large    Large
System compute capability    Small to Large    Large

Big Data: bandwidth requirements to data vary
Random access requires high bandwidth
Cacheable accesses can tolerate much lower bandwidth to memory
Where data can be cached matters
at processor: fabric requirements lower
at remote memory: high fabric requirements
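The point about where data is cached can be illustrated with a deliberately simple traffic model. This is a sketch under stated assumptions, not a model from the presentation: it assumes a fixed fraction of accesses target remote memory, that a processor-side cache filters those accesses before they reach the fabric, and that a cache located at the remote memory filters nothing on the fabric. The function name, parameters, and example numbers are all hypothetical.

```python
def fabric_bandwidth_needed(demand_gbs, remote_fraction, hit_rate, cached_at):
    """Estimate fabric bandwidth (GB/s) for a given access demand.

    demand_gbs:      total data access demand from the processor, in GB/s
    remote_fraction: share of accesses that target remote memory (0..1)
    hit_rate:        cache hit rate for those remote accesses (0..1)
    cached_at:       "processor" or "remote" (assumed cache location)
    """
    if not 0.0 <= remote_fraction <= 1.0 or not 0.0 <= hit_rate <= 1.0:
        raise ValueError("fractions must lie between 0 and 1")
    remote_traffic = demand_gbs * remote_fraction
    if cached_at == "processor":
        # Hits are served locally, so only misses cross the fabric.
        return remote_traffic * (1.0 - hit_rate)
    if cached_at == "remote":
        # Every remote access still crosses the fabric before it can hit.
        return remote_traffic
    raise ValueError("cached_at must be 'processor' or 'remote'")


if __name__ == "__main__":
    # Hypothetical workload: 100 GB/s demand, half of it remote, 90% cacheable.
    for place in ("processor", "remote"):
        print(place, fabric_bandwidth_needed(100.0, 0.5, 0.9, place))
```

A hit rate near zero models the random-access case on the slide, where the location of the cache makes no difference and the fabric must carry the full remote traffic.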

Will HPC and Big Data drive a different system balance point?
[Figure: two cost-balance charts, "HPC cost balance" and "Big data cost balance", each divided into Storage/NVM, Interconnect, Memory, and Processor. Synthetic data for illustration only.]
Variation in budgeting will remain bounded (1:10)

System commonality between big data and HPC
Need to understand future memory technology characteristics
Comes down to bandwidth (assuming we have better $/bit)
DRAM-like bandwidth?
Can be fundamental or market microarchitecture choice
Y: System architectures for Big data and HPC very similar. Both benefit from new technologies.
N: Big Data will benefit more. Architectures will not be identical, but configurability will allow for crossover.

This will also drive user effort

If new memory technologies replace/augment DRAM:
Memory capacity per compute 5x-10x better than DRAM.
Modest need for threading when new technologies are available.
Task scaling can be effectively applied to many applications.

If DRAM remains the dominant load-store memory technology:
Memory capacity per performance drops 10x to 20x from current levels.
Aggressive threading is commonplace/necessary.
Program model changes focus on thread scaling.
Aggressively strive for more performance for similar task numbers.

Microarchitecture choices will drive bandwidth/capacity tradeoff
[Figure: Density versus bandwidth trade-off, with curves for DRAM (approx) and NVM (optimal). Illustrative curves, not based on actual data.]

Courtesy of James Hutchby, SRC

Two Design Options For Supercomputing
A processor with >10B transistors on a die in 2020
OR
A processor with fewer transistors on a smaller cost effective die

Option 1: Large Die With >10B Transistors

More cache, fewer cores, everything integrated
Enables on-package memory?
Cache size beyond a certain threshold not utilized by the programmer

More cores, enough cache for HPC, everything integrated
High FLOPS count on a die?
Enough on-package memory becomes difficult to implement
Extreme performance levels result in problematic off-package memory usage

Flavor of cores, enough cache for HPC, everything integrated
Powerful cores for ST performance, smaller cores for highly parallel?
Enough on-package memory becomes difficult to implement
Extreme performance levels result in problematic off-package memory usage

Option 2: Cost Effective Die That Supports On-package Memory
[Figure: Building block, showing stacked memory, TSV, scalable fabric, and a processor die matched to performance. The processor die can be much smaller than the memory.]
Broad Usage: With the right memory capacity per building block, it can address a large portion of the HPC market
Cost: Building blocks can replace the compute and DRAM in a node (at the right price point)
Scalability: Configure building block as memory or memory+compute
Power: Better thermal solution with disaggregated compute blocks

The Possibilities With the Building Block Approach At Exascale

                                      Building block    Evolved
Cost                                  1                 1
Memory capacity (in-package)          2 TB              300 GB
Memory capacity (outside package)     Assume none       2 TB (DDR4/5)
Number of cores                       8000              1000
Memory bandwidth (in-package)         50 TB/s           5 TB/s
Memory bandwidth (outside package)    Assume none       400 GB/s
Performance peak                      512 TF            64 TF

Synthetic data for illustration only
1) On-package memory has 8-10x the bandwidth compared to external memory
2) At iso cost and memory capacity, on-package memory enables 8-10x additional compute to be placed under the memory
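The ratios behind the two numbered notes can be checked directly from the slide's synthetic figures. The sketch below assumes the column with 8000 cores, 50 TB/s, and 512 TF is the building block design and the other is the evolved design; that assignment, the dictionary keys, and the helper names are assumptions made for illustration.

```python
TB_S = 1e12  # bytes per second in one TB/s
TF = 1e12    # flops per second in one TF

# Synthetic figures as transcribed from the slide; keys are illustrative.
building_block = {"cores": 8000, "in_pkg_bw": 50 * TB_S, "peak": 512 * TF}
evolved = {"cores": 1000, "in_pkg_bw": 5 * TB_S, "peak": 64 * TF}


def advantage(key):
    """Ratio of the building block figure to the evolved figure."""
    return building_block[key] / evolved[key]


def bytes_per_flop(node):
    """In-package memory bandwidth available per peak flop."""
    return node["in_pkg_bw"] / node["peak"]


if __name__ == "__main__":
    for key in ("cores", "peak", "in_pkg_bw"):
        print(f"{key:10s} building block / evolved = {advantage(key):.1f}x")
    print(f"bytes/flop, building block: {bytes_per_flop(building_block):.3f}")
    print(f"bytes/flop, evolved:        {bytes_per_flop(evolved):.3f}")
```

Under these figures the compute advantage is 8x and the in-package bandwidth advantage is 10x, which sits inside the 8-10x range quoted in the notes, while bandwidth per flop stays at a similar level in both designs.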

The Motherboard in 2020: Just a Backplane of Cards?

Summary
While Big data and HPC have different memory access patterns, cost balance will drive them to similar system balances.
New memory technologies will drive future system architecture design points, especially for Big data.
New packaging technologies open up new directions, allowing for a new dimension of disaggregation.
We must remember that power is and will remain the biggest challenge. Performance can no longer improve faster than energy efficiency does.