1 New Dimensions in Configurable Computing: Configurability at Runtime Simultaneously Allows Big Data and Fine-Grain HPC
Alan Gara, Intel Fellow, Exascale Chief Architect
2 Legal Disclaimer
Today's presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See for details. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright 2011, Intel Corporation. All rights reserved.
3 The next step in perf/$

A typical DRAM memory die (2016) of ~8 Gb will be about 100 mm^2 (as always). A processor floating point unit is ~0.03 mm^2 (2 flops/cycle). Even if the core is 100x bigger than the FPU, at 1.0 GB/core we have >100x more silicon in memory than in processing. This is not cost balanced.

For cost balance we need to either:
1) Use much less memory per compute, or
2) Make the physical size of capacity much smaller.

Threading gives us a mechanism to change this balance if we have enough bandwidth to support much higher compute/memory. New memory architectures allow us to get a significant step in perf/$.

DARPA Exascale computing study: Exascale_Final_report_ pdf
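The cost-balance claim on this slide can be checked with rough arithmetic. The sketch below is illustrative only: the die area, FPU area, and the 100x core-to-FPU factor are the slide's own figures, while the per-mm^2 comparison of memory silicon to compute silicon is an assumption of this example.

```python
# Rough silicon-area balance check using the figures from this slide.
# Assumptions (from the slide): an 8 Gb (~1 GB) DRAM die is ~100 mm^2,
# an FPU is ~0.03 mm^2, and a full core is ~100x the FPU area.

DRAM_DIE_MM2 = 100.0      # area of one ~8 Gb DRAM die
DRAM_DIE_GB = 1.0         # 8 Gb = 1 GB of capacity per die
FPU_MM2 = 0.03
CORE_MM2 = 100 * FPU_MM2  # generous estimate: core is 100x the FPU

gb_per_core = 1.0         # the slide's 1.0 GB/core balance point
memory_mm2_per_core = gb_per_core / DRAM_DIE_GB * DRAM_DIE_MM2

ratio = memory_mm2_per_core / CORE_MM2
print(f"memory silicon / compute silicon per core: {ratio:.0f}x")
# With these numbers the ratio is ~33x; the slide's ">100x" figure
# follows with a smaller core-size estimate or more memory per core.
# Either way, memory dwarfs compute silicon, which is the point.
```

Even under the most generous core-size assumption, the silicon budget is dominated by memory, which is why the slide concludes the balance point must move.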
4 Big Data meets HPC

                              Big Data          HPC
Memory capacity               Large             Small to Large
Bandwidth to memory           Small to Large    Large
System fabric bandwidth       Small to Large    Large
System compute capability     Small to Large    Large

Big Data bandwidth requirements to data vary:
- Random access requires high bandwidth.
- Cacheable accesses can tolerate much lower bandwidth to memory.
- Where data can be cached matters: cached at the processor, fabric requirements are lower; cached at remote memory, fabric requirements are high.
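The point that caching location sets fabric requirements can be sketched with a simple effective-bandwidth model. The hit rates and the 400 GB/s demand figure below are hypothetical, chosen only to illustrate the slide's argument.

```python
# Effective remote (fabric) bandwidth demanded by an access stream,
# as a function of how much of it hits a processor-local cache.
# All numbers are hypothetical illustrations, not measured values.

def fabric_bw_needed(total_bw_gbs: float, local_hit_rate: float) -> float:
    """Bandwidth that must cross the fabric to remote memory (GB/s)."""
    return total_bw_gbs * (1.0 - local_hit_rate)

demand = 400.0  # GB/s of references generated by a node (hypothetical)

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    print(f"local hit rate {hit_rate:4.2f} -> fabric needs "
          f"{fabric_bw_needed(demand, hit_rate):6.1f} GB/s")
# Random access (hit rate ~0) demands the full 400 GB/s from the fabric;
# a 99% local hit rate cuts the fabric requirement to ~4 GB/s.
```

This is why "small to large" appears in the Big Data column above: the same workload class can sit at either extreme depending on where its data can be cached.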
5 Will HPC and Big Data drive a different system balance point?

[Chart: HPC cost balance vs. Big Data cost balance, broken down by Storage/NVM, Interconnect, Memory, and Processor. Synthetic data, for illustration only.]

Variation in budgeting will remain bounded (~1:10).
6 System commonality between Big Data and HPC

We need to understand the characteristics of future memory technologies. It comes down to bandwidth (assuming we have better $/bit): does the technology deliver DRAM-like bandwidth or not? That split can be fundamental to the technology, or a market/microarchitecture choice.

System architectures for Big Data and HPC are very similar. Both benefit from new memory technologies; Big Data will benefit more. The architectures will not be identical, but configurability will allow for crossover.
7 This will also drive user effort

If new memory technologies replace/augment DRAM:
- Memory capacity per compute is 5x-10x better than with DRAM.
- Modest need for threading when the new technologies are available.
- Task scaling can be effectively applied to many applications.

If DRAM remains the dominant load-store memory technology:
- Memory capacity per performance drops 10x to 20x from current levels.
- Aggressive threading is commonplace/necessary.
- Programming-model changes focus on thread scaling.
- Aggressively strive for more performance at similar task counts.
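The two scenarios on this slide imply very different bytes-per-flop operating points. The sketch below quantifies the gap; the 0.3 B/flop baseline is an invented illustration, and only the 5-10x and 10x-20x factors come from the slide.

```python
# Bytes-per-flop under the slide's two scenarios, relative to a
# hypothetical current system. The baseline is invented for
# illustration; the scaling factors are the slide's figures.

baseline_bytes_per_flop = 0.3  # hypothetical current balance point

# Scenario A: new memory technologies give 5x-10x more capacity/compute.
scenario_a = [baseline_bytes_per_flop * f for f in (5, 10)]

# Scenario B: DRAM-only; capacity per performance drops 10x-20x.
scenario_b = [baseline_bytes_per_flop / f for f in (10, 20)]

print(f"new-memory scenario: {scenario_a[0]:.3f}-{scenario_a[1]:.3f} B/flop")
print(f"DRAM-only scenario : {scenario_b[1]:.3f}-{scenario_b[0]:.3f} B/flop")
# The two scenarios end up 50x-200x apart in capacity per performance,
# which is why the programming-model consequences (task scaling vs.
# aggressive thread scaling) diverge so sharply.
```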
8 Microarchitecture choices will drive the bandwidth/capacity tradeoff

[Chart: density versus bandwidth trade-off, with curves for DRAM (approx.) and NVM (optimal). Illustrative curves, not based on actual data.]
9 [Chart courtesy of James Hutchby, SRC]
10 [Chart courtesy of James Hutchby, SRC]
11 Two Design Options for Supercomputing
- A processor with >10B transistors on a die in 2020, OR
- A processor with fewer transistors on a smaller, cost-effective die.
12 Option 1: Large Die With >10B Transistors

Variant A: more cache, fewer cores, everything integrated.
- Enables on-package memory?
- Cache size beyond a certain threshold is not utilized by the programmer.

Variant B: more cores, enough cache for HPC, everything integrated.
- High FLOPS count on a die?
- Enough on-package memory becomes difficult to implement.
- Extreme performance levels result in problematic off-package memory usage.

Variant C: a mix of core flavors (powerful cores for single-thread performance, smaller cores for highly parallel work?), enough cache for HPC, everything integrated.
- Enough on-package memory becomes difficult to implement.
- Extreme performance levels result in problematic off-package memory usage.
13 Option 2: Cost-Effective Die That Supports On-Package Memory

[Diagram: building block combining a processor die with TSV-stacked memory and a scalable fabric.]

- Processor die matched to performance; can be much smaller than the memory.
- Broad usage: with the right memory capacity per building block, it can address a large portion of the HPC market.
- Cost: building blocks can replace the compute and DRAM in a node (at the right price point).
- Scalability: configure a building block as memory or memory+compute.
- Power: better thermal solution with disaggregated compute blocks.
14 The Possibilities With the Building Block Approach at Exascale

                                    Building block   Evolved
Cost                                1                1
Memory capacity (in-package)        2 TB             300 GB
Memory capacity (outside package)   Assume none      2 TB (DDR4/5)
Number of cores                     -                -
Memory bandwidth (in-package)       50 TB/s          5 TB/s
Memory bandwidth (outside package)  Assume none      400 GB/s
Peak performance                    512 TF           64 TF

Synthetic data for illustration only.
1) On-package memory has 8-10x the bandwidth compared to external memory.
2) At iso cost and memory capacity, on-package memory enables 8-10x additional compute to be placed under the memory.
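The footnotes' 8-10x claims can be cross-checked directly against the slide's figures. The sketch below uses one reading of the table (building-block vs. evolved columns); the numbers are the slide's synthetic illustration data, not measurements.

```python
# Cross-check the slide's footnotes against its table.
# Figures are the slide's synthetic illustration numbers.

building_block = {"bw_tbs": 50.0, "peak_tf": 512.0, "capacity_tb": 2.0}
evolved        = {"bw_tbs": 5.0,  "peak_tf": 64.0,  "capacity_tb": 2.0}

bw_ratio = building_block["bw_tbs"] / evolved["bw_tbs"]
perf_ratio = building_block["peak_tf"] / evolved["peak_tf"]

print(f"in-package bandwidth advantage: {bw_ratio:.0f}x")   # cf. footnote 1
print(f"compute at iso cost/capacity:   {perf_ratio:.0f}x") # cf. footnote 2
# 10x bandwidth and 8x peak performance, consistent with the
# footnotes' stated 8-10x range at equal cost and memory capacity.
```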
15 The Motherboard in 2020: Just a Backplane of Cards?
16 Summary
- While Big Data and HPC have different memory access patterns, cost balance will drive them to similar system balances.
- New memory technologies will drive future system architecture design points, especially for Big Data.
- New packaging technologies open up new directions, allowing for a new dimension of disaggregation.
- Remember: power is and will remain the biggest challenge. We can no longer improve performance faster than we improve energy efficiency.