The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb, Bright Evangelist


The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb, Bright Evangelist. GTC Express Webinar, January 21, 2015

"We scientists are time-constrained," said Dr. Yamanaka. "Our priority is our research, not managing our clusters. Bright [Cluster Manager] is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits." http://www.brightcomputing.com/news-tokyo-institute-of-technology-gordon-bell-prize-winner-uses-bright-cluster-managerto-develop-applications-for-one-of-the-worlds-fastest-supercomputers

CUDA-Ready Clusters
1. You focus on coding, not infrastructure & toolchains
2. You're always in sync with GPUs + CUDA
3. You cross-develop with confidence and ease, maintaining and using highly customized environments
4. You choose how to program GPUs (CUDA, OpenCL, or OpenACC) and combine with MPI
5. You have converged HPC + Big Data Analytics, with access to Hadoop alongside HPC
6. You seamlessly utilize the cloud: extend into AWS, deploy OpenStack
CUDA-ready clusters are GPU developer-ready.

Bright Cluster Manager stack (from hardware up):
- Hardware: CPU, GPUs, memory, disk, Ethernet, interconnect, IPMI/iLO, PDU
- Linux distribution: SLES / RHEL / CentOS / SL
- Cluster Management Daemon, secured via SSL / SOAP / X509 / IPtables
- Management interfaces: Cluster Management GUI, Cluster Management Shell, User Portal
- Core services: provisioning, monitoring, automation, health checks, management
- Workload managers: Slurm, PBS Pro, Torque/Maui, Torque/MOAB, Grid Engine, LSF
- Developer tooling: CUDA environment, compilers, libraries, debuggers, profilers

Unified Memory http://info.brightcomputing.com/blog/bid/196783/bright-cluster-manager-integrates-support-for-cuda-6


NVIDIA GPU Boost

Modernized monitoring for HPC clusters http://insidehpc.com/2014/11/monitoring-hpc-clusters-modernized/

Cluster Health Management
Goal: provide a problem-free environment for running jobs. Four elements:
1. Cluster management automation
2. Regular health checks
3. Pre-job health checks
4. Hardware stability & performance tests
All of the above are configurable and extensible.

Syncing with GPUs + CUDA
Innovation characterizes the entire history and evolution of GPU programmability through CUDA, but it introduces challenges as well as opportunities. Bright Computing's approach leverages:
- People: proactively maintaining business and technical relationships
- Process: hands-on engineering begins with release candidates
- Product: preliminary to fully productized implementations
Bright Cluster Manager is released once or twice per year; updates flow continuously.
http://info.brightcomputing.com/blog/cuda-6.5-something-for-nothing
http://www.brightcomputing.com/news-bright-cluster-manager-adds-support-for-the-nvidia-tesla-k80-dual-gpu-accelerator

Available Versions of the CUDA Toolkit

Using CUDA 6.0

HPC Development Environment
- Compilers: GNU, Intel*, AMD, Portland*, etc.
- Debuggers and profilers: GNU, TAU, Allinea, TotalView
- MPI libraries: Open MPI, MPICH, MPICH-MX, MVAPICH
- Other libraries: threading libraries, OpenMP, Global Arrays, HDF5, Intel IPP, TBB, NetCDF, PETSc, etc.
- Mathematical libraries: ACML, MKL*, FFTW, GMP, GotoBLAS, ScaLAPACK, etc.
- Environment modules

Programming GPUs
- Models: CUDA, OpenCL, OpenACC, MPI
- Tools: CUDA-GDB, nvidia-smi, CUDA Utility Library, examples
- 3rd party: Allinea, Rogue Wave
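Of the tools above, nvidia-smi is the one most often scripted against for cluster monitoring, since it can emit per-GPU state as CSV (for example via `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader`). A minimal Python sketch of how a monitoring script might parse that output; the sample text below is illustrative only, not captured from a real device:

```python
# Sketch: parse CSV-style output of `nvidia-smi --query-gpu=...`.
# The sample string below is illustrative, not from a real GPU.

def parse_gpu_csv(text):
    """Parse 'index, utilization, memory.used' CSV lines into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, mem = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "util_pct": int(util.rstrip(" %")),       # e.g. "87 %" -> 87
            "mem_used_mib": int(mem.rstrip(" MiB")),  # e.g. "2210 MiB" -> 2210
        })
    return gpus

# Hypothetical sample, shaped like `--format=csv,noheader` output:
sample = """\
0, 87 %, 2210 MiB
1, 12 %, 305 MiB
"""

if __name__ == "__main__":
    for gpu in parse_gpu_csv(sample):
        print(gpu)
```

In practice such a parser would be fed from a subprocess call to nvidia-smi on each node; the query fields and units shown are the CSV conventions nvidia-smi uses, but verify them against your driver version.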

CUDA Development Environment

HPC and Hadoop
- Use GPUs for HPC and Big Data Analytics
- Introduce GPUs into Hadoop clusters
- Make use of Hadoop services


GPUs in the Cloud? The Top Four Reasons
1. You can realize possibilities using the cloud: you can scale up and scale out
2. You still realize the promise of GPU programmability via HPC in the cloud
3. Your use of the cloud is transparent: you've found ways to "hide latency" (constraints apply for MPI apps)
4. Your go-to apps still work in the cloud
http://info.brightcomputing.com/blog/bid/196290/the-top-4-reasons-you-should-try-cloud-based-gpus-for-hpc

Cloud Utilization Scenario I: Cluster on Demand. The head node and all compute nodes (node001 through node003) run in the cloud.

Cloud Utilization Scenario II: Cluster Extension. The head node and local compute nodes (node001 through node003) remain on premises, while additional nodes (node004 through node007) extend the cluster into the cloud.


Case Study: TUAT (1)
The Customer
- Engages in materials-science research
- Compares computational models with physical experiments
- High-resolution, 3D phase-field modeling at large scales using GPUs
The Challenge
- Make available the latest innovations in GPU technology without distracting focus from research

Case Study: TUAT (2)
The Solution
- Laboratory GPU cluster designed and implemented by HPCTech Corp.
- Bright Cluster Manager deployed by HPCTech
- Use Bright to fully manage the entire CUDA environment, including regular updates
- Use the modules environment via Bright to manage multiple CUDA environments
- Prototype simulations using the laboratory HPC cluster, including debugging and tuning code
- Execute large-scale simulations using TSUBAME
The Results

[Figure: scale bar 51 μm; carbon concentration 0.01 to 0.38 wt.%; calculation steps 25,000 / 150,000 / 275,000] Caption: Snapshots of austenite-to-ferrite transformation behavior in an Fe-C alloy simulated by a multi-phase-field method. Upper and lower panels show the time evolution of ferrite grains and carbon concentration during the phase transformation. The simulation was performed on 512 × 512 × 256 computational grids using 8 GPUs in the lab cluster. (Prof. A. Yamanaka, TUAT)

[Plot: elapsed time (×1000 s, 0 to 5) vs. number of GPUs (128, 256, 512)] Caption: Performance of multi-GPU computation of a multi-phase-field simulation of austenite-to-ferrite transformation in an Fe-C alloy. The performance was measured by performing the simulations on the TSUBAME 2.5 supercomputer of Tokyo Institute of Technology. The number of computational grids, crystal grains, and calculation steps were 512³, 4068, and 10⁵, respectively. (Prof. A. Yamanaka, TUAT, priv. comm.) http://www.tuat.ac.jp/~yamanaka/
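Strong-scaling results like the plot above are usually summarized as speedup and parallel efficiency relative to the smallest run. A minimal sketch of that calculation; the elapsed times below are hypothetical placeholders, not the measured TSUBAME data:

```python
# Sketch: relative speedup and parallel efficiency for a strong-scaling run.
# The elapsed times used in the demo are hypothetical placeholders.

def scaling_table(times):
    """times: {gpu_count: elapsed_seconds}; baseline is the smallest count."""
    base_n = min(times)
    base_t = times[base_n]
    rows = []
    for n in sorted(times):
        speedup = base_t / times[n]           # relative to the baseline run
        efficiency = speedup / (n / base_n)   # ideal speedup is n / base_n
        rows.append((n, times[n], speedup, efficiency))
    return rows

if __name__ == "__main__":
    hypothetical = {128: 4000.0, 256: 2200.0, 512: 1300.0}
    for n, t, s, e in scaling_table(hypothetical):
        print(f"{n:4d} GPUs: {t:7.1f} s  speedup {s:4.2f}x  efficiency {e:4.0%}")
```

Reading values off the published plot and feeding them through a helper like this is how one would check whether the multi-phase-field code scales near-linearly from 128 to 512 GPUs.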

Case Study: TUAT (3)
"We scientists are time-constrained," said Dr. Yamanaka. "Our priority is our research, not managing our clusters. Bright is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits."

Q & A Ian Lumb, ian.lumb@brightcomputing.com http://www.brightcomputing.com/

Additional Slides


Cluster Health Management
Goal: provide a problem-free environment for running jobs. Four elements:
1. Cluster management automation
2. Regular health checks
   - Actions that return PASS, FAIL, or UNKNOWN
   - Can be associated with a settable severity and a message
   - Can launch an action based on any response value
3. Pre-job health checks
   - Let the workload manager hold the job very briefly
   - Check the health of each reserved node
   - If unhealthy, take the node offline and inform the system administrator
   - Let the workload manager reschedule the job to a different set of nodes
4. Hardware stability & performance tests
   - Very wide range of tests
   - May include disk overwrites and reboot(s)
All of the above are configurable and extensible.
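A health check in this scheme is essentially a small script that reports PASS, FAIL, or UNKNOWN, optionally with a message. A minimal Python sketch of the idea using free disk space as the metric; the threshold and the exact output format are illustrative assumptions, not the precise script conventions Bright Cluster Manager expects:

```python
# Sketch of a node health check in the PASS / FAIL / UNKNOWN style
# described above. The threshold and output format are illustrative;
# the exact protocol Bright expects may differ.
import shutil

def disk_health(path="/", min_free_fraction=0.10):
    """Return (status, message) based on free space under `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return "UNKNOWN", f"could not stat {path}: {exc}"
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return "FAIL", f"only {free_fraction:.0%} free on {path}"
    return "PASS", f"{free_fraction:.0%} free on {path}"

if __name__ == "__main__":
    status, message = disk_health()
    print(f"{status}: {message}")
```

A check like this could run both on the regular monitoring schedule and as a pre-job check; on FAIL the management layer would then take the node offline and notify the administrator, as outlined above.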

Bright API