OpenPOWER Outlook
Axel Koehler, Sr. Solution Architect HPC


Driving industry innovation

The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER architecture to share expertise, investment, and server-class intellectual property to serve the evolving needs of customers.

- Opening the architecture to give the industry the ability to innovate across the full hardware and software stack
- Simplifying system design with an alternative architecture: includes SoC design, bus specifications, reference designs, firmware, OS, and an open-source hypervisor
- Little-endian Linux to ease the migration of software to POWER
- Driving an expansion of the enterprise-class hardware and software stack for the data center
- Building a complete ecosystem to provide customers with the flexibility to build servers best suited to the Power architecture

Giving ecosystem partners a license to innovate

OpenPOWER will enable data centers to rethink their approach to technology:
- Member companies may use POWER for custom open servers and components for Linux-based cloud data centers.
- OpenPOWER ecosystem partners can optimize the interactions of server building blocks (microprocessors, networking, I/O, and other components) to tune performance.

How will the OpenPOWER Foundation benefit clients?
- OpenPOWER technology creates greater choice for customers.
- An open and collaborative development model on the Power platform will create more opportunity for innovation.
- New innovators will broaden the capability and value of the Power platform.

What does this mean for the industry?
- A game changer in the competitive landscape of the server industry
- Will enable and drive innovation in the industry
- Provides more choice in the industry

Platinum members

Proposed ecosystem enablement

Power open source software stack components:
- Cloud software
- Standard operating environment (system management)
- Operating system / KVM
- Firmware: OpenPOWER firmware (a new OSS community)
- Hardware: OpenPOWER technology (POWER8)

The stack builds on existing open source software communities, and a modern development environment is emerging based on tools and services, with 40,000 packages now available.

Multiple options to design with POWER technology, within an OpenPOWER framework to integrate system IP on chip and under an industry IP license model:
- Standard POWER products (2014)
- Customizable: CAPI over PCIe (CAPP + PCIe)
- Custom POWER SoC (future)

Building collaboration and innovation at all levels:
- System / software / services
- Implementation / HPC / research
- I/O / storage / acceleration
- Chip / SoC
- Boards / systems

Welcoming new members in all areas of the ecosystem; 150+ inquiries and numerous active dialogues are underway.

Proposed work groups and projects

Work groups: System Software (open source), Application Software (open source), Open Server Development Platform, Hardware Architecture, System Operating Environment.

Projects: Linux LE, KVM, firmware (OpenPOWER firmware interface), POWER LE ABI, OpenPOWER software ecosystem enablement, toolchain, POWER8 developer board, POWER8 reference design, OpenPOWER profile of the architecture (POWER8 ISA Books 1, 2, and 3), Coherent Accelerator Interface Architecture (CAIA).

Participation per project is labeled Public, Member, or Compliance.


IBM & NVIDIA: accelerating computing

Next-generation IBM supercomputers and enterprise servers, with long-term roadmap integration:
- Enterprise & data analytics software: applications, tools, algorithms, libraries, languages, compilers
- POWER CPU + Tesla GPU

GPU-accelerated POWER-based systems available in 2014.

POWER8 processor highlights:
- Cores: 12 cores (SMT8); 8-wide dispatch, 10-wide issue, 16 execution pipes; 2x internal data flows/queues; enhanced prefetching; 64 KB data cache and 32 KB instruction cache per core
- Accelerators: crypto and memory expansion, transactional memory, VMM assist, data move / VM mobility
- Technology: 22 nm SOI with eDRAM, 15 metal layers, 650 mm²
- Energy management: on-chip power-management microcontroller, integrated per-core VRM, critical path monitors
- Caches: 512 KB SRAM L2 per core, 96 MB shared eDRAM L3, up to 128 MB eDRAM L4 (off-chip)
- Memory: up to 230 GB/s sustained bandwidth
- Bus interfaces: durable open memory attach interface, integrated PCIe Gen3, SMP interconnect, CAPI (Coherent Accelerator Processor Interface)

The world's first GPU-accelerated POWER8 server: IBM Power S824L
- 2x POWER8 CPUs: 10 cores at 3.425 GHz or 12 cores at 3.026 GHz; 6 MB L2 and 96 MB L3 per CPU
- 1 TB memory capacity, 384 GB/s maximum memory bandwidth
- 2 Tesla K40 GPU accelerators
- Linux (Ubuntu first, then Red Hat)
- CAPI support

Available starting Oct 31st!

Strong GPU roadmap (chart: SGEMM/W normalized, 2008 to 2016): Tesla (CUDA), Fermi (FP64), Kepler (dynamic parallelism), Maxwell (DX12), Pascal (unified memory, 3D memory, NVLink).

Pascal GPU:
- Optimized for double-precision floating point
- Very high bandwidth, large-capacity 3D memory on package
- NVLink for high-bandwidth CPU-GPU and GPU-GPU interconnect
- Hardware support for Unified Memory (UM)
- New packaging allows much denser solutions (one-third the size of current PCIe boards)

Stacked memory:
- 3D chip-on-wafer integration: multiple layers of DRAM components are integrated vertically on the package along with the GPU
- Compared to GDDR5 memory: 4x higher bandwidth, 3x larger capacity, 4x more energy efficient per bit

NVLINK
- CPU-GPU communication is currently limited by the low-bandwidth connection via PCIe.
- NVLink is a high-speed interconnect between CPU and GPU and between GPUs.
- The basic building block is an 8-lane, differential, dual-simplex bidirectional link.
- Multiple links can be aggregated to increase the bandwidth of a connection; NVLink will provide between 80 and 200 GB/s of bandwidth.
- NVLink preserves the PCIe programming model: CPU-initiated transactions such as control and configuration go over a PCIe connection, while GPU-initiated transactions use NVLink, allowing the GPU full-bandwidth access to the CPU's memory system.
- NVLink is more than twice as energy efficient as a PCIe 3.0 connection.

http://devblogs.nvidia.com/parallelforall/nvlink-pascal-stacked-memory-feeding-appetite-big-data/
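Because NVLink preserves the PCIe programming model, existing CUDA code does not change; only the interconnect underneath gets faster. A minimal CUDA sketch of that model using standard managed memory (nothing below is NVLink-specific, and the kernel and buffer names are illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    // The GPU touches data that lives in a single managed address space
    // shared with the CPU. Whether GPU-initiated accesses travel over PCIe
    // or NVLink is decided by the platform, not by this code.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float *data = nullptr;

        cudaMallocManaged(&data, n * sizeof(float));   // visible to CPU and GPU
        for (int i = 0; i < n; ++i) data[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);             // CPU reads the result directly
        cudaFree(data);
        return 0;
    }

Control and configuration of the device still flow over PCIe, exactly as described above; only the data-path bandwidth differs between a PCIe-attached and an NVLink-attached GPU.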


Three ISAs, one programming model: the CUDA ecosystem (libraries, directives, languages) spans x86, ARM, and POWER.
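To make "one programming model" concrete, here is a minimal CUDA vector-add sketch; the same source is compiled with nvcc whether the host CPU is x86, ARM, or POWER, with only the host toolchain differing. Array names and sizes are illustrative:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread adds one pair of elements.
    __global__ void vecAdd(const float *x, const float *y, float *z, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) z[i] = x[i] + y[i];
    }

    int main() {
        const int n = 1024;
        const size_t bytes = n * sizeof(float);
        float hx[n], hy[n], hz[n];
        for (int i = 0; i < n; ++i) { hx[i] = i; hy[i] = 2.0f * i; }

        float *dx, *dy, *dz;
        cudaMalloc(&dx, bytes); cudaMalloc(&dy, bytes); cudaMalloc(&dz, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(dx, dy, dz, n);
        cudaMemcpy(hz, dz, bytes, cudaMemcpyDeviceToHost);

        printf("hz[10] = %f\n", hz[10]);   // expect 30.0
        cudaFree(dx); cudaFree(dy); cudaFree(dz);
        return 0;
    }

Libraries, directives, and language bindings sit on top of this same model, which is what lets the CUDA ecosystem span the three ISAs named on the slide.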


GPU-accelerated Hadoop:
- Extract insights from customer data
- Data analytics using clustering algorithms (a GPU kernel sketch of such a step follows below)
- Developed using CUDA-accelerated IBM Java
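The slide does not name the clustering algorithm, so the following is only an illustration: a minimal CUDA sketch, assuming a k-means-style assignment step, of the kind of kernel such analytics would offload to the GPU. All names, sizes, and the choice of k-means are hypothetical:

    #include <cfloat>
    #include <cstdio>
    #include <cuda_runtime.h>

    // Assumed k-means assignment step: each thread assigns one point to its
    // nearest centroid by squared Euclidean distance.
    __global__ void assignClusters(const float *points, const float *centroids,
                                   int *labels, int numPoints, int dim, int k) {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= numPoints) return;
        float bestDist = FLT_MAX;
        int best = 0;
        for (int c = 0; c < k; ++c) {
            float dist = 0.0f;
            for (int d = 0; d < dim; ++d) {
                float diff = points[p * dim + d] - centroids[c * dim + d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        labels[p] = best;
    }

    int main() {
        const int numPoints = 4, dim = 2, k = 2;
        float hPoints[]    = {0.0f, 0.0f,  0.1f, 0.2f,  5.0f, 5.0f,  5.2f, 4.9f};
        float hCentroids[] = {0.0f, 0.0f,  5.0f, 5.0f};
        int   hLabels[numPoints];

        float *dPoints, *dCentroids; int *dLabels;
        cudaMalloc(&dPoints, sizeof(hPoints));
        cudaMalloc(&dCentroids, sizeof(hCentroids));
        cudaMalloc(&dLabels, numPoints * sizeof(int));
        cudaMemcpy(dPoints, hPoints, sizeof(hPoints), cudaMemcpyHostToDevice);
        cudaMemcpy(dCentroids, hCentroids, sizeof(hCentroids), cudaMemcpyHostToDevice);

        assignClusters<<<1, 32>>>(dPoints, dCentroids, dLabels, numPoints, dim, k);
        cudaMemcpy(hLabels, dLabels, numPoints * sizeof(int), cudaMemcpyDeviceToHost);

        for (int i = 0; i < numPoints; ++i)
            printf("point %d -> cluster %d\n", i, hLabels[i]);
        cudaFree(dPoints); cudaFree(dCentroids); cudaFree(dLabels);
        return 0;
    }

In the Hadoop scenario above, the host side would be Java rather than C++ (see the next slide), but the per-data-point GPU work has the same shape.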

Compile Java for GPUs

Approach: apply a closure to a set of arrays.

    // vector addition (array contents elided on the slide)
    float[] X = {1.0f, 2.0f, 3.0f, 4.0f /* ... */};
    float[] Y = {9.0f, 8.1f, 7.2f, 6.3f /* ... */};
    float[] Z = {0.0f, 0.0f, 0.0f, 0.0f /* ... */};

    jog.foreach(X, Y, Z, new JogContext(),
        new JogClosureRet<JogContext>() {
            public float execute(float x, float y) {
                return x + y;
            }
        });

The foreach iterations are parallelized over GPU threads; each thread runs the closure's execute() method.

Chart: Java Black-Scholes options-pricing speedup relative to sequential Java, measured from 1 to 20 million options.