Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing
Innovation Intelligence
Devin Jensen, August 2012
Altair Knows HPC
Altair is the only company that:
- makes HPC tools
- develops HPC applications
- and uses these to solve real HPC problems
500 Altair engineers worldwide use HPC every day for real-world modeling & simulation
Innovation Intelligence 26+ Years of Innovation 40+ Offices in 18 Countries 1500+ Employees Worldwide
Customers Automotive Aerospace Heavy Equipment Government Life/Earth Sciences Consumer Goods Energy 3,200+ customers worldwide
Hardware Trends and Software Evolution
Multi-level parallelism:
- Single-threaded & vectorization
- SMP parallelization
- MPI clusterization
- Hybrid MPI/OpenMP & heterogeneous CPU/GPU
Source: Intel Corp.
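The end point of this progression, hybrid MPI/OpenMP, combines distributed ranks across nodes with shared-memory threads inside each node. As a loose, illustrative analogy (not Altair code), the same two-level decomposition can be sketched in Python, with processes standing in for MPI ranks and threads for OpenMP threads:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def thread_partial(chunk):
    # Innermost level: one thread's share of the work
    # (a vectorized loop on real hardware).
    return sum(x * x for x in chunk)


def rank_work(chunk, n_threads=2):
    # Node level: OpenMP-style shared-memory threading within one "rank".
    size = (len(chunk) + n_threads - 1) // n_threads
    pieces = [chunk[i:i + size] for i in range(0, len(chunk), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(thread_partial, pieces))


def hybrid_sum_of_squares(n, n_ranks=2):
    # Cluster level: MPI-style distributed ranks, one process per rank.
    data = list(range(n))
    size = (len(data) + n_ranks - 1) // n_ranks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=n_ranks) as pool:
        return sum(pool.map(rank_work, chunks))


if __name__ == "__main__":
    print(hybrid_sum_of_squares(100))  # sum of squares of 0..99 = 328350
```

The point of the sketch is the decomposition, not the performance: each level splits its portion of the data and delegates to the level below, which is exactly how a hybrid MPI/OpenMP solver distributes a mesh across nodes and then across cores.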
Motivations to use GPU for Solvers
- Cost-effective solution
- Power-efficient solution
Open questions:
- How much of the peak can we get?
- Which part of the code is best suited?
- How much coding effort is required?
- What will be the speedup for my application?
[Chart: peak double-precision Gflops, cost per Gflop in $, and Gflops per watt. 1 node Intel X5670: 140 Gflops, 0.74 Gflops/W; 2 nodes X5670: 280 Gflops, 0.74 Gflops/W; 1 node + 1 NVIDIA M2090: 795 Gflops, 1.92 Gflops/W; 1 node + 2 M2090: 1450 Gflops, 2.1 Gflops/W]
NVIDIA Ecosystem based on CUDA (+ OpenACC with CRAY, PGI and CAPS)
- Any NVIDIA graphics card supports CUDA
- Strong market presence
- Products: Tesla and Quadro GPUs based on the Fermi architecture
- Announcement of the Kepler architecture: ~2x faster than Fermi, Hyper-Q, virtualization
AcuSolve GPU Porting on NVIDIA CUDA
- High-performance computational fluid dynamics (CFD) software
- The leading finite-element-based CFD technology
- High accuracy and scalability on massively parallel architectures
Benchmark: S-duct, 80K degrees of freedom; hybrid MPI/OpenMP for the multi-GPU test
[Chart: elapsed time, Xeon Nehalem CPU vs. Nehalem CPU + Tesla GPU, lower is better. 4-core CPU: 549 s; 1 core + 1 GPU: 279 s (2X); 2 cores + 2 GPUs: 165 s (3.3X). Performance gain versus 4-core CPU]
RADIOSS Porting on GPU
Assess the potential of GPU for RADIOSS, with a focus on implicit solvers:
- Direct solver: highly optimized, compute-intensive; limited scalability on multicores and clusters
- Iterative solver: able to solve huge problems with low memory requirements; efficient parallelization; high cost on CPU due to the large number of iterations needed for convergence; double precision required
Goal: integrate GPU parallelization into our hybrid parallelization technology
RADIOSS Porting on GPU & Accelerators
CUDA: official version planned for HW12
- RADIOSS implicit direct solver and iterative solver available under Linux
- Support for NVIDIA Fermi and Kepler (K20, Q1 2013)
- Altair RADIOSS presentation at GTC 2012
OpenCL: beta version
- Only the RADIOSS implicit iterative solver, available under Linux and Windows
- Support for AMD FirePro
- Altair RADIOSS presentation at AFDS 2011
RADIOSS Direct Solver GPU Porting
- CUBLAS (DGEMM) is a perfect candidate to speed up the update module
- The frontal matrix can be too large to fit in GPU memory
- The frontal matrix can be too small, making the GPU inefficient
- Data transfer is not trivial: only the lower triangular matrix is of interest
- Pivoting is required in real applications, and pivot searching is a sequential operation
- The factor module has limited parallel potential and can be as expensive as the update module in extreme cases
- Asynchronous computing: overlap the computation and the communication
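The factor/update split above can be illustrated with a toy unblocked Cholesky factorization in Python (an illustrative sketch only: RADIOSS uses a pivoting direct solver, and the real update step is a blocked DGEMM offloaded to CUBLAS). The factor step is the inherently sequential part; the trailing-submatrix update is the compute-intensive part that maps well to the GPU:

```python
import math


def cholesky(A):
    """Right-looking Cholesky A = L*L^T for a small SPD matrix (list of lists)."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Factor step: sequential square root and column scaling.
        # This is the "factor module" with limited parallel potential.
        L[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            L[i][k] = A[i][k] / L[k][k]
        # Update step: rank-1 update of the trailing lower triangle.
        # In a blocked solver this becomes a DGEMM on the frontal matrix,
        # which is what CUBLAS accelerates.
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                A[i][j] -= L[i][k] * L[j][k]
    return L
```

Because only the lower triangle is touched, this also shows why only the lower triangular part of the frontal matrix needs to be transferred to the GPU.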
RADIOSS Direct Solver: Non-Pivoting Case
Benchmark: linear static, non-pivoting case, 2.8 million degrees of freedom
Platform: Intel Xeon X5550, 4 cores, 48 GB RAM; NVIDIA C2070, CUDA 4.0; MKL 10.3, RHEL 5.1
[Charts: solver elapsed time for 4 CPU (base BCSLIB-EXT and improved) vs. 4 CPU + 1 GPU, lower is better, plus a profile of the update module vs. total. GPU speedup: 3.9X on the update module, 2.9X on the total]
RADIOSS Iterative Solver (Implicit)
- Preconditioned Conjugate Gradient (PCG) iteratively solves M^-1 (Ax - b) = 0
- Convergence speed depends on the preconditioner M
- Low memory consumption
- Efficient parallelization: SMP, MPI
Porting under CUDA:
- Few kernels to write in CUDA (sparse matrix-vector product)
- Use of CUBLAS DAXPY & DDOT
Extending Hybrid to multi-GPU programming:
- MPI manages the communication between GPUs
- Portions of code not GPU-enabled benefit from OpenMP parallelization
- Programming model expandable to multiple nodes with multiple GPUs
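A minimal pure-Python sketch of the PCG loop shows which operations map to GPU code: the matrix-vector product becomes a custom CUDA kernel, while the dot products and vector updates map to CUBLAS DDOT and DAXPY. The Jacobi (diagonal) preconditioner used here is an assumption for illustration; the slide does not name RADIOSS's actual preconditioner:

```python
import math


def matvec(A, x):
    # Sparse matrix-vector product -> the custom CUDA kernel in the port.
    # (Dense here for simplicity.)
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    # -> CUBLAS DDOT on the GPU.
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    # y + alpha*x -> CUBLAS DAXPY on the GPU.
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def pcg(A, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for an SPD system Ax = b."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # r = b - A*0
    M_inv = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner (assumption)
    z = [mi * ri for mi, ri in zip(M_inv, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        if math.sqrt(dot(r, r)) < tol:
            break
        z = [mi * ri for mi, ri in zip(M_inv, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = axpy(beta, p, z)                   # p = z + beta*p
    return x
```

The loop body is dominated by one matvec, two DAXPYs, and a few DDOTs per iteration, which is why the slide lists so few kernels to write: almost everything else is bookkeeping.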
RADIOSS PCG Linear Benchmark #2
Linear problem #2: hood of a car with pressure loads, refined model; compute displacements and stresses
Benchmark: 2.2 million degrees of freedom, 62 million non-zeros; 380,000 shells + 13,000 solids + 1,100 RBE3; 5,300 iterations
Platform: NVIDIA PSG cluster, 2 nodes with dual NVIDIA M2090 GPUs, CUDA 3.2; Intel Westmere 2x6 X5670 @ 2.93 GHz; Linux RHEL 5.4 with Intel MPI 4.0
[Chart 1: elapsed time, 1 node with 2 NVIDIA M2090, lower is better. SMP 6-core: 1106 s; Hybrid 2 MPI x 6 SMP: 572 s; SMP 6 + 1 GPU: 254 s (4.3X); Hybrid 2 MPI x 6 SMP + 2 GPUs: 143 s (7.5X)]
[Chart 2: elapsed time, 2 nodes with 4 NVIDIA M2090. Hybrid 4 MPI x 6 SMP: 306 s; Hybrid 4 MPI x 6 SMP + 4 GPUs: 85 s (13X). Performance gain versus SMP 6-core]
Elapsed time decreased by up to 13X; multi-GPU performance is better for bigger models
GPU for Explicit?
Explicit is very difficult to port to GPU:
- Data movement between CPU and GPU is the current bottleneck
- Requires the full memory allocated on the GPU, and therefore the full code ported
- Huge development cost, plus the challenge of maintaining multiple code bases
- Potential gain remains small with the current technology approach: data locality and cache reuse are not possible on GPU, and memory size is limited
Follow hardware evolution: AMD APU (CPU+GPU), Intel Xeon Phi, NVIDIA through the OpenACC initiative
Altair Solvers on NVIDIA GPUs
- The RADIOSS implicit direct & iterative solvers have been successfully ported to NVIDIA GPUs using CUDA
- Adding a GPU improves the performance of Altair solvers significantly
- For the iterative solver, Hybrid MPP allows runs on multi-GPU workstations and GPU clusters with good scalability and enhanced performance
- GPU support for the implicit solvers is planned for HyperWorks 12
HyperWorks 3D display OpenGL stereoscopic display in HyperMesh & HyperView Supported by NVIDIA graphics cards Compatible with NVIDIA 3D Vision glasses Available in HyperWorks 12.0
HyperView Performance Gains From Shaders Gains in graphics performance 2x to 6x improvement in animation frame rate, model rotation, & pan
Licensing Model for Solvers
Licensing has been set to encourage GPU testing: one GPU is counted as only one additional core.
Where does PBS Professional manage NVIDIA GPUs?
Tokyo Institute of Technology TSUBAME 2.0 (NEC/HP):
- HP's biggest Top500 machine: 2.4 petaflops, 17,984 cores, 4,200 GPUs, 80 TB memory, 174 TB SSD, 7 PB disk
- ~100,000 simultaneous jobs, ~2,000 users
- Named one of the greenest production supercomputers in the world on the June 2012 Green500 list, for achieving a high level of energy efficiency
PBS Professional provides reliability, robust advance reservations, GPU support, mixed-architecture support (Linux, Windows), and the best scalability in the industry
Purpose-Built for High Performance Computing
An IDC study on job scheduling (published 2009) ranked PBS Professional #1
Five strategic pillars:
- Easy to Use
- Hard to Break
- Do More (with Less)
- Keep Track and Plan
- Open Architecture
Accelerating Your Gateway to HPC Cloud Computing
Structuring our thoughts:
- High performance [network & hardware]
- Dynamic provisioning [self-service]
- Multi-tenant
- Large data handling
- Elasticity / scale
- On demand / pay as you go
- Ease of access and use / abstraction
Accelerating your Gateway to HPC Cloud Computing
End users' needs: focus on my task/project, deliver results!
Do have: workstations, limited access to software, ideas, innovation, agility
Do NOT have: expertise in IT infrastructure, sufficient compute resources, an abundance of money
Accelerating your Gateway to HPC Cloud Computing
End users' needs:
- Ease of access: all you need is a browser to reach HPC and domain-specific resources
- Ease of use: a browser-based application hides the complexity
- Handle big data: visualize the managed big data, again through a browser-based application
User interface: portals
Accelerating your Gateway to HPC Cloud Computing
Altair Enterprise Portals: ease of use, ease of access, handle big data
HyperWorks Enterprise applications: Compute Manager, Display Manager, Simulation Manager, Process Manager, Performance Manager
COMPUTE MANAGER Ease of Use Ease of Access Handle Big Data
Accelerating your Gateway to HPC Cloud Computing
Compute Manager: let's see it in action
Ease of use, ease of access, handle big data
Accelerating your Gateway to HPC Cloud Computing
HyperWorks Enterprise: complete simulation lifecycle management
Applications (licensed via HyperWorks Units): Compute Manager, Display Manager, Simulation Manager, Process Manager, Performance Manager
HyperWorks Enterprise = HyperWorks + PBS Professional + HyperWorks Units
It is HyperWorks in the cloud
Accelerating your Gateway to HPC Cloud Computing
Implementation models:
[Diagram: implementation models spanning private cloud (HyperWorks Enterprise, PBS Works, PBS Professional), hybrid cloud, and public cloud (HyperWorks On-Demand)]
HYPERWORKS ON-DEMAND A Public Cloud run by Altair
Accelerating your Gateway to HPC Cloud Computing
HyperWorks On-Demand: The Basics
IaaS (Infrastructure as a Service): basic computing resources where arbitrary software can be deployed and executed
- State-of-the-art HPC computation nodes; scalable, with provisioning of up to 10,000 cores
- Compute, storage, and QDR non-blocking InfiniBand networking
- Physical and cyber security
PaaS (Platform as a Service): a framework to build and deploy consumer-created applications using the infrastructure provided by Altair
- Resource provisioning, scheduling, security, and licensing frameworks
- Remote visualization, notification, portal & mash-up, and collaboration frameworks
- Powered by PBS Professional & HyperWorks Enterprise
SaaS (Software as a Service): consume the provisioned application deployed on the cloud infrastructure from anywhere, on any device, through a thin-client interface, in a utility mode
- HyperWorks Solvers, HyperWorks Desktop (1), Altair Partner Alliance (1)
(1) coming soon
IaaS + PaaS + SaaS = HWaaS
Accelerating your Gateway to HPC Cloud Computing
The High-Powered Altair Data Center
- Scalable, modular facility located in Troy, MI
- Easily extended to support more than 10,000 cores
- Incorporates extensive physical and cyber security measures
- Extremely high compute-power density
- A state-of-the-art facility for HPC simulation computing
Accelerating your Gateway to HPC Cloud Computing
HyperWorks On-Demand: Altair's user portal is the gateway to HyperWorks On-Demand
For More Information
Contact us: Devin Jensen, Director, Americas PBS Field Operations, jensen@altair.com
Visit us online: www.altair.com, www.altairhyperworks.com, www.pbsworks.com
Thanks, NVIDIA. Thank YOU!