Intel's vision: the Scalable System Framework

IDC HPC User Forum. Intel's vision: the Scalable System Framework. Bret Costelow, September 2015.

Legal Disclaimers: Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at intel.com and https://www-ssl.intel.com/content/www/us/en/highperformance-computing/path-to-aurora.html. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel, the Intel logo, Xeon, and Intel Xeon Phi are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other names and brands may be claimed as the property of others. 2015 Intel Corporation. 2

Converged Architecture for HPC and Big Data
- Programming model: HPC — FORTRAN / C++ applications with MPI; Big Data — Java* applications with Hadoop*. Simple to use, high performance.
- Resource manager: HPC & Big Data-aware resource manager.
- File system: Lustre* with Hadoop* adapter.
- Hardware: compute- and Big Data-capable, scalable-performance components — servers, storage (SSDs and burst buffers), remote storage, Intel Omni-Path Architecture infrastructure.
*Other names and brands may be claimed as the property of others. 3
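To make the file-system row concrete: because Lustre* presents a POSIX namespace to every node, output written by an MPI job can be read directly by an analytics job (via the Hadoop* adapter or any POSIX reader) without an HDFS ingest step. A minimal sketch, assuming mpi4py is installed and a shared Lustre mount at /lustre/project; the path and file names are illustrative, not from the slide:

```python
from pathlib import Path
from mpi4py import MPI

SHARED = Path("/lustre/project/results")  # same namespace the analytics job reads

def simulate(rank: int) -> list[float]:
    """Stand-in for the real HPC computation on this rank."""
    return [rank * 1.0, rank * 2.0]

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank writes its partition straight to the shared Lustre directory.
SHARED.mkdir(parents=True, exist_ok=True)
out = SHARED / f"part-{rank:05d}.csv"
out.write_text("\n".join(f"{x:.6f}" for x in simulate(rank)))

comm.Barrier()
if rank == 0:
    # A Hadoop*/Spark job (or any POSIX reader) can now consume the same
    # directory directly -- no copy into HDFS needed.
    print(f"wrote {comm.Get_size()} partitions to {SHARED}")
```

Launched with mpirun, each rank writes one partition; the analytics side simply points at the same directory.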

Intel's Scalable System Framework: a configurable design path, customizable for a wide range of HPC & Big Data workloads. Small clusters through supercomputers; compute- and data-centric computing; standards-based programmability; on-premise and cloud-based.
- Compute: Intel Xeon processors; Intel Xeon Phi coprocessors; Intel Xeon Phi processors
- Fabric: Intel True Scale Fabric; Intel Omni-Path Architecture; Intel Ethernet; Intel Silicon Photonics Technology
- Memory/Storage: Intel SSDs; Intel Lustre-based solutions
- Software: Intel Software Tools; HPC Scalable Software Stack; Intel Cluster Ready Program
4

Another Huge Leap in CPU Performance: Extreme Parallel Performance
- Parallel performance: 72 cores; 2 VPUs/core; 6 DDR4 channels with 384GB capacity; >3 Teraflop/s per socket 1
- Integrated memory: 16GB; 5X bandwidth vs. DDR 2; 3 configurable modes (memory, cache, hybrid)
- Integrated fabric: 2 Intel Omni-Path Fabric ports (more configuration options)
- Market adoption: >50 system providers expected 3; >100 PFLOPS of customer system compute commitments to date 3; software development kits shipping Q4'15; standard IA software programming model and tools
Intel's HPC Scalable System Framework
1 Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating-point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle. 2 Projected result based on internal Intel analysis of the STREAM benchmark using a Knights Landing processor with 16GB of ultra high-bandwidth memory versus DDR4 memory only with all channels populated. 3 Source: Intel internal information. 5
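As a rough illustration of the peak-FLOPS formula in the footnote (the ~1.4 GHz clock and 32 double-precision FLOP/cycle per core are assumptions added for the sake of arithmetic, not figures from this slide):

```latex
% Peak FLOPS = cores x clock frequency x floating-point operations per cycle.
% Assuming 72 cores, ~1.4 GHz, and 32 DP FLOP/cycle per core (2 VPUs x 16 FLOP each):
\[
  72 \times \left(1.4 \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}}\right)
     \times 32\,\tfrac{\text{FLOP}}{\text{cycle}}
  \;\approx\; 3.2 \times 10^{12}\,\tfrac{\text{FLOP}}{\text{s}},
\]
% i.e. just over the ">3 Teraflop/s per socket" quoted above.
```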

Intel's HPC Scalable System Framework (Compute, Fabric, Memory/Storage, Software)
- Parallel compute technology: Intel Xeon; Intel Xeon Phi (KNL) (eliminate offloading)
- New memory/storage technologies: 3D XPoint Technology; next-generation NVRAM
- New fabric technology: Omni-Path Architecture (KNL integration and with Intel Silicon Photonics); Intel Silicon Photonics
- Powerful parallel file system: Intel Enterprise Lustre*; Phi support today; Hadoop adapters; Intel Manager for Lustre*
6

Multidisciplinary Approach to HPC Performance 7

Intel Moves Lustre* Forward: 8 of the Top 10 sites; most adopted; most scalable; open source with commercial packaging; 4th largest SW company; more than 80% of code; vibrant community.
Source: HPC User Site Census: Middleware, April 2014. Lustre is a registered trademark of Seagate. *Other names and brands may be claimed as the property of others. 8

Intel Solutions for Lustre* Roadmap (timeline: Q2'15, Q3'15, Q4'15, Q1'16, Q2'16, H2'16, H1'17)

Enterprise (HPC, commercial, technical computing, and analytics):
- Enterprise Edition 2.2: OpenZFS in monitoring mode; Hadoop adapter and scheduler connectors; DKMS support; DSS cache support; IOPS heat maps; True Scale support; Xeon Phi client; RHEL 6.6 support; SLES 11 SP3 support
- Enterprise Edition 2.3: Manager support for multiple MDTs; RHEL 7 client support; SLES 12 client support
- Enterprise Edition 2.4: Robinhood PE 2.5.5; RHEL 6.7 server/client support; SLES 11 SP server/client support
- Enterprise Edition 3.0: FE Lustre 2.7 baseline; striped MDT directories; ZFS performance enhancements; client metadata performance; OpenZFS snapshots; SELinux & Kerberos; RHEL 7 (server + client); SLES 12 (server + client)
- Enterprise Edition 3.1: OpenZFS ZIL support; expanded OpenZFS support in Manager; support for Omni-Path networking
- Enterprise Edition 4.0: file-based replication; layout enhancement; progressive file layout; Data on MDT; DNE II completion; IU shared-key network encryption; directory-based quotas

Cloud (HPC, commercial, technical computing, and analytics delivered on cloud services):
- Cloud Edition 1.1.1: FE Lustre 2.5.90; GovCloud support; encrypted EBS volumes; placement groups added to templates; client mount command in console; IPSec over-the-wire encryption; disaster recovery; supportability enhancements
- Cloud Edition 1.2 and Cloud Edition 2.0: enhanced DR automation; password and security enhancements

Foundation (community Lustre software coupled with Intel support):
- Foundation Edition 2.7: job tracking with changelogs; dynamic LNet config development; lfs setstripe to specify OSTs; LFSCK 3 MDT-MDT consistency verification; IU UID/GID mapping feature (preview)
- Foundation Editions 2.8 and 2.9: DNE phase IIb; multiple modify RPCs; UID/GID mapping; NRS delay

Key: Shipping / In Development / In Planning. Product placement is not representative of final launch date within the specified quarter. For more details refer to ILU and the 5-Quarter Roadmap. * Some names and brands may be claimed as the property of others.

Intel's HPC Scalable System Framework (Compute, Fabric, Memory/Storage, Software)
- Parallel compute technology: Intel Xeon; Intel Xeon Phi (KNL) (eliminate offloading)
- New memory/storage technologies: 3D XPoint Technology; next-generation NVRAM
- New fabric technology: Omni-Path Architecture (KNL integration and with Intel Silicon Photonics); Intel Silicon Photonics
- Powerful parallel file system: Intel Enterprise Lustre*; Phi support today; Hadoop adapters; Intel Manager for Lustre*
10

A High Performance Fabric: Intel Omni-Path Fabric, HPC's Next Generation
- Better scaling vs. InfiniBand*: 48-port switch chip lowers switch costs; 73% higher switch MPI message rate 2; 33% lower switch fabric latency 3
- Configurable / Resilient / Flexible: job prioritization (Traffic Flow Optimization); no-compromise resiliency (Packet Integrity Protection, Dynamic Lane Scaling); open-source fabric management suite, OFA-compliant software
- Market adoption: >100 OEM designs 1; >100K nodes under contract/bid 1
Intel's HPC Scalable System Framework
1 Source: Intel internal information. Design win count based on OEM and HPC storage vendors who are planning to offer either Intel-branded or custom switch products, along with the total number of OEM platforms that are currently planned to support custom and/or standard Intel OPA adapters. Design win count as of July 1, 2015 and subject to change without notice based on vendor product plans. 2 Based on Prairie River switch silicon maximum MPI messaging rate (48-port chip), compared to Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switch product briefs (36-port chip) posted on www.mellanox.com as of July 1, 2015. 3 Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switch product briefs posted on www.mellanox.com as of July 1, 2015, compared to Intel OPA switch port-to-port latency of 100-110ns, measured data calculated from the difference between a back-to-back osu_latency test and an osu_latency test through one switch hop; the 10ns variation is due to near and far ports on an Eldorado Forest switch. All tests performed using Intel Xeon E5-2697v3, Turbo Mode enabled. Up to 60% latency reduction is based on a 1024-node cluster in a full bisectional bandwidth (FBB) Fat-Tree configuration (3-tier, 5 total switch hops), using a 48-port switch for the Intel Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. 11
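The switch-latency measurement described in footnote 3 amounts to a simple difference between two MPI latency runs; the symbols below are mine, introduced for illustration:

```latex
% t_b2b  = osu_latency between two hosts cabled back to back (no switch)
% t_1hop = osu_latency between the same two hosts through one switch
\[
  t_{\text{switch}} \;\approx\; t_{\text{1hop}} - t_{\text{b2b}}
  \qquad (\approx 100\text{--}110\ \text{ns for the Intel OPA switch, per footnote 3})
\]
```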

Intel Omni-Path Architecture: Changing the Fabric Landscape
CPU-fabric integration over time, optimizing performance, density, power, and cost:
- Today: discrete PCIe HFI (Intel OPA HFI card) with the Intel Xeon processor E5-2600 v3
- Next: multi-chip package integration with the Intel Xeon Phi processor (Knights Landing); discrete PCIe HFI with the next Intel Xeon processor
- Next generation: integration with the next Intel Xeon Phi processor (Knights Hill) and a future Intel Xeon processor
12

Higher Bandwidth. Lower Latency and Capacity.
Intel Omni-Path: key to Intel's next-generation architecture. It combines with advances in the memory hierarchy to keep data closer to compute, for better data-intensive application performance and energy efficiency.
- Today (compute node, I/O node, and remote storage linked by the interconnect fabric): processor with caches; local memory; SSD storage; parallel file system (hard drive storage).
- Future: processor with caches and in-package high-bandwidth memory; enough capacity to support local application storage; local memory; non-volatile memory; burst buffer storage; parallel file system (SSDs & hard drive storage).
Local processing-node temporal storage enables faster checkpointing, quicker recovery, and better application performance. 13
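A minimal sketch of the checkpointing pattern this hierarchy enables, assuming a node-local burst-buffer mount (e.g. /bb) and a Lustre mount (e.g. /lustre); the paths and the drain-in-background policy are illustrative assumptions, not part of the slide:

```python
import shutil
import threading
from pathlib import Path

# Illustrative mount points (assumptions, not from the slide).
BURST_BUFFER = Path("/bb/checkpoints")      # fast node-local NVM / burst buffer
PARALLEL_FS  = Path("/lustre/checkpoints")  # slower, durable parallel file system

def write_checkpoint(step: int, data: bytes) -> Path:
    """Write the checkpoint to the burst buffer so compute can resume quickly."""
    BURST_BUFFER.mkdir(parents=True, exist_ok=True)
    local = BURST_BUFFER / f"ckpt_{step:06d}.bin"
    local.write_bytes(data)
    return local

def drain_to_pfs(local: Path) -> None:
    """Copy the checkpoint to the parallel file system in the background."""
    PARALLEL_FS.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local, PARALLEL_FS / local.name)

def checkpoint(step: int, data: bytes) -> threading.Thread:
    local = write_checkpoint(step, data)   # fast path: the application blocks only on this
    t = threading.Thread(target=drain_to_pfs, args=(local,), daemon=True)
    t.start()                              # slow path overlaps with the next compute phase
    return t

if __name__ == "__main__":
    t = checkpoint(step=1, data=b"application state ...")
    t.join()  # in a real job, join (or otherwise verify) before relying on the PFS copy
```

The design point is simply that the application pays only the burst-buffer write latency, while durability on the parallel file system is achieved asynchronously.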

Aurora: Built on a Powerful Foundation. Breakthrough technologies that deliver massive benefits.
- Compute: 3rd generation Intel Xeon Phi (processor code name: Knights Hill) with integrated Intel Omni-Path Architecture; >17X performance (FLOPS per node); >12X memory bandwidth; >30 PB/s aggregate in-package memory bandwidth.
- Interconnect: 2nd generation Intel Omni-Path Architecture; >20X faster; >500 TB/s bi-section bandwidth; >2.5 PB/s aggregate node link bandwidth.
- File system: Intel Lustre* software; >3X faster, >1 TB/s file system throughput; >5X capacity, >150 PB file system capacity.
Source: Argonne National Laboratory and Intel. Comparison to ANL's largest current system, MIRA. 14

Backup

Tying It All Together: The Intel Fabric Product Roadmap Vision (2014, 2015, 2016, future). Intel's HPC Scalable System Framework.
- HPC fabrics: Intel True Scale QDR-40/80 (40 / 2x40 Gb); Q4'15 Intel Omni-Path Fabric (first generation); future Intel Omni-Path Fabric...
- Enterprise & cloud fabrics (Ethernet): 10Gb, 40Gb, 100Gb...
Forecast and estimations, in planning & targets. The fabric is fundamental to meeting the growing needs of the datacenter; Intel is extending fabric capabilities, performance, and scalability.
Potential future options, subject to change without notice. All timeframes, features, products and dates are preliminary forecasts and subject to change without further notification. 16

The Interconnect Landscape
The compute/interconnect cost ratio has changed: compute price/performance keeps improving unabated, while fabric price/performance has not kept pace, so the fabric consumes a growing share of total cluster cost (which also includes compute and storage). Challenge: keep fabric costs in check to free up cluster budget for increased compute and storage capability.
Hardware cost estimate 1, compute vs. fabric share of cluster cost: FDR 56Gb/s; Intel OPA (x8 HFI) 58Gb/s; EDR 100Gb/s; Intel OPA (x16 HFI) 100Gb/s. More compute = more FLOPS.
1 Source: Internal analysis based on 256-node to 2048-node clusters configured with Mellanox FDR and EDR InfiniBand products. Mellanox component pricing from www.kernelsoftware.com, prices as of August 22, 2015. Compute node pricing based on the Dell PowerEdge R730 server from www.dell.com, prices as of May 26, 2015. Intel OPA (x8) utilizes a 2:1 oversubscribed fabric. Analysis limited to fabric sizes up to 2K nodes. * Other names and brands may be claimed as the property of others. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright 2015, Intel Corporation. 17
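The budget argument behind "more compute = more FLOPS" can be stated symbolically; the symbols below are mine, introduced for illustration, not from the slide:

```latex
% For a fixed cluster budget B, per-node price p_node, storage cost C_storage,
% and total fabric cost C_fabric:
\[
  N_{\text{nodes}} \;=\; \frac{B - C_{\text{fabric}} - C_{\text{storage}}}{p_{\text{node}}},
  \qquad
  \text{Peak FLOPS} \;=\; N_{\text{nodes}} \times \text{FLOPS}_{\text{node}}
\]
% Lowering C_fabric at comparable bandwidth raises N_nodes, and with it peak FLOPS.
```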

Evolutionary Approach, Revolutionary Features, End-to-End Solution
A high-performance, low-latency fabric designed to scale from entry level to the largest supercomputers. Builds on proven industry technologies; innovative new features and capabilities improve performance, reliability, and QoS; heavily leverages the existing Aries and True Scale fabrics; re-uses existing OpenFabrics Alliance* software.
Early market adoption: >100 OEM designs 1; >100K nodes under contract/bid 1
This device has not been authorized as required by the rules of the Federal Communications Commission. This device is not, and may not be, offered for sale or lease, or sold or leased, until authorization is obtained.
1 Source: Intel internal information. Design win count based on OEM and HPC storage vendors who are planning to offer either Intel-branded or custom switch products, along with the total number of OEM platforms that are currently planned to support custom and/or standard Intel OPA adapters. Design win count as of July 1, 2015 and subject to change without notice based on vendor product plans. 2 Based on Prairie River switch silicon maximum MPI messaging rate (48-port chip), compared to the Mellanox Switch-IB* product (36-port chip) posted on www.mellanox.com as of August 13, 2015. 3 Latency reductions based on Mellanox SB7700/SB7790 Edge switch product briefs posted on www.mellanox.com as of August 13, 2015, with a stated latency of 90ns, compared to Intel OPA switch port-to-port latency of 100-110ns, measured data calculated from the difference between a back-to-back osu_latency test and an osu_latency test through one switch hop; the 10ns variation is due to near and far ports on an Eldorado Forest switch. All tests performed using Intel Xeon E5-2697v3, Turbo Mode enabled. Up to 33% latency reduction is based on a 1024-node cluster in a full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel Omni-Path cluster (3 switch hops) and a 36-port switch ASIC for either Mellanox or Intel True Scale clusters (5 switch hops). Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. * Other names and brands may be claimed as the property of others. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright 2015, Intel Corporation. 18

New Highly Scalable Open Software Stack
An Intel-led industry collaboration is underway to create a complete and cohesive HPC system stack: an open community effort; a broad range of ecosystem partners designing, developing, and testing; open-source availability enables differentiation. Intel's HPC Scalable System Framework.
Benefits the entire HPC ecosystem: innovative system hardware and software; simplifies configuration, management, and use; accelerates application development; turnkey to customizable; Intel-supported versions. 19

Lustre.intel.com