Slide 1: RWTH GPU Cluster

  Sandra Wienke, wienke@rz.rwth-aachen.de
  November 2012
  Rechen- und Kommunikationszentrum (RZ)
  Photos: Christian Iwainsky

Slide 2: The RWTH GPU Cluster

  GPU cluster: 57 NVIDIA Quadro 6000 (Fermi)
    Innovative computer architecture
    High utilization of resources
  Daytime
    VR: new CAVE (49 GPUs)
    HPC: interactive software development (8 GPUs)
  Nighttime
    HPC: processing of GPGPU compute jobs (55-57 GPUs)
  [Photo: aixCAVE, VR, RWTH Aachen, since June 2012]

Slide 3: Hardware stack

  Node type            Name               GPUs/node   Host RAM
  4 dialogue nodes     linuxgpud[1-4]     2           24 GB
  24 rendering nodes   linuxgpus[01-24]   2           48 GB
  1 head node          linuxgpum1         1

  Per GPU:         NVIDIA Quadro 6000 (Fermi), 448 cores @ 1.15 GHz, 6 GB RAM, ECC on,
                   max. GFlops: 1030.4 (SP) / 515.2 (DP)
  Host processors: 2 x Intel Xeon X5650 EP (Westmere) @ 2.67 GHz (12 cores per node)
  Network:         QDR InfiniBand
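
  A quick way to check these figures on one of the nodes, using standard Linux tools
  and nvidia-smi (not part of the slides; availability of nvidia-smi -L depends on the
  installed driver version):

    ssh linuxgpud1
    nvidia-smi -L    # list the GPUs of the node (two Quadro 6000 on a dialogue node)
    nproc            # host cores: 2 x 6-core Xeon X5650 = 12
    free -g          # installed host RAM in GB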

Slide 4: Software stack

  Environment as on the regular compute cluster (modules, ...)
  CUDA Toolkit: 5.0 (also 4.1, 4.0, 3.2)      module load cuda       (directory: $CUDA_ROOT)
    CUDA, OpenCL (1.0)
  PGI compiler                                module load pgi        (or: module switch intel pgi)
    CUDA Fortran, PGI Accelerator Model, PGI OpenACC
  Debugging
    CUDA debugging
    TotalView                                 module load totalview
    Eclipse
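
  A minimal build sketch on a dialogue node, assuming hypothetical source files
  saxpy.cu and saxpy_acc.c; the compiler flags are common defaults for Fermi-class
  GPUs and are not taken from the slides:

    # CUDA toolchain
    module load cuda                          # sets $CUDA_ROOT
    nvcc -O2 -arch=sm_20 saxpy.cu -o saxpy

    # PGI toolchain (OpenACC / PGI Accelerator Model, CUDA Fortran)
    module switch intel pgi                   # or: module load pgi
    pgcc -acc -ta=nvidia -Minfo=accel saxpy_acc.c -o saxpy_acc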

Slide 5: How to use?

  Interactive mode
    Short runs/tests only, debugging
    1 dialogue node (linuxgpud1): 24/7
    2 dialogue nodes (linuxgpud[2,3]): Mon-Fri, 8am-8pm
  Batch mode
    No interaction; commands are queued and scheduled
    For performance tests, long runs
    24+1 rendering nodes
    2 dialogue nodes: Mon-Fri, 8pm-8am; Sat + Sun all day
    1 dialogue node (linuxgpud4): Mon-Fri, 8am-8pm, for short test runs during daytime
  Note: nodes are rebooted at the switch from interactive to batch mode

Slide 6: How to use: Interactive mode

  You must be in group "gpu" to get access to the GPU cluster (request access via e-mail)
  Jump from a frontend cluster node to a GPU dialogue node: ssh -Y linuxgpud[1-3]
    (example session sketched below)
  GPUs are set to exclusive mode (per process)
    Only one person (one process) can access a GPU at a time
    If a GPU is occupied, you get a message such as "all CUDA-capable devices are busy or unavailable"
    If you do not select a specific device in your (CUDA) code, the run is automatically
      scheduled to the other GPU within the node (if available)
  Debugging
    Be aware: debuggers usually run on the GPU with ID 0 (and fail if that GPU is occupied)
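
  A sketch of a typical interactive session on a dialogue node; the binary name saxpy
  and the use of CUDA_VISIBLE_DEVICES to pin a specific GPU are illustrative
  assumptions, not taken from the slides:

    # from a cluster frontend node
    ssh -Y linuxgpud1                # dialogue node available 24/7
    module load cuda
    nvidia-smi                       # see which of the two GPUs is free (next slide)

    ./saxpy                          # hypothetical binary; in exclusive process mode the
                                     # runtime falls back to a free GPU if none is pinned

    CUDA_VISIBLE_DEVICES=1 ./saxpy   # optionally restrict the run to device 1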

Slide 7: See what is running: nvidia-smi

  linuxgpud1$> nvidia-smi
  Mon Oct 17 12:41:01 2011
  +------------------------------------------------------+
  | NVIDIA-SMI 2.285.05   Driver Version: 285.05.09      |
  |-------------------------------+----------------------+----------------------+
  | Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
  | Fan  Temp  Power Usage /Cap   | Memory Usage         | GPU Util. Compute M. |
  |===============================+======================+======================|
  | 0.  Quadro 6000               | 0000:02:00.0  Off    |          0        0  |
  |  30%  80 C  P0   Off / Off    |   4%  208MB / 5375MB |  99%     E. Process  |
  |-------------------------------+----------------------+----------------------|
  | 1.  Quadro 6000               | 0000:85:00.0  On     |          0        0  |
  |  36%  84 C  P8   Off / Off    |   0%   22MB / 5375MB |   0%     E. Process  |
  |-------------------------------+----------------------+----------------------|
  | Compute processes:                                               GPU Memory |
  |  GPU   PID   Process name                                        Usage      |
  |=============================================================================|
  |  0.  30234   nbody                                                 196MB    |
  +-----------------------------------------------------------------------------+

  Annotations on the slide:
    GPU ID + type (e.g. "0. Quadro 6000")
    Display mode ("Disp." column)
    ECC error counts (SB: single bit, DB: double bit)
    Compute mode "E. Process": 1 person (1 process) per GPU
    Compute processes section: process currently running on a GPU
    nvidia-smi -q lists further GPU details

Slide 8: How to use: Batch mode

  Create a batch compute job for LSF
    Select the appropriate queue to get scheduled on the GPU cluster: -q gpu
    Exclusive nodes: nodes are allocated exclusively, i.e. at least 2 GPUs per job
      Please use the resources reasonably!
  Submit your job: bsub < mygpuscript.sh (workflow sketch below)
    It starts running as soon as batch mode starts and the job is scheduled
    Display the pending reason with bjobs -p
      During daytime: "Dispatch windows closed"
  Reminder: only one node is in batch mode during daytime (for testing): -a gpu (instead of -q gpu)
    (-q is given priority over -a)
  More documentation
    RWTH Compute Cluster User's Guide: http://www.rz.rwth-aachen.de/hpc/primer
    Unix Cluster Documentation: https://wiki2.rz.rwth-aachen.de/display/bedoku/usage+of+the+linux+rwth+compute+cluster
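
  A sketch of the submission workflow from this slide. The script name mygpuscript.sh
  comes from the slide; how exactly the daytime -a gpu variant replaces -q gpu in the
  script is an assumption:

    # submit the job script to LSF
    bsub < mygpuscript.sh

    # check the job; if it is pending, show the reason
    bjobs -p                  # e.g. "Dispatch windows closed" during daytime

    # daytime test runs on the single batch-mode node (assumption: replace
    # "#BSUB -q gpu" in the script with "#BSUB -a gpu", since -q takes priority over -a)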

Slide 9: Batch script for single-GPU (node) usage

  #!/usr/bin/env zsh
  ### Job name
  #BSUB -J GPUTest-Cuda
  ### File / path where STDOUT & STDERR will be written to
  #BSUB -o gputest-cuda.o%J
  ### Request the GPU queue
  #BSUB -q gpu
  ### Request the time you need for execution in [hour:]minute
  #BSUB -W 15
  ### Request virtual memory (in MB)
  #BSUB -M 512

  module load cuda/40
  cd $HOME/NVIDIA_GPU_Computing_SDK_4.0.17/C/bin/linux/release
  ./deviceQuery -noprompt

  Note on the memory request (-M): CUDA code needs the whole virtual address space of
  the node; currently, the memory limit is disabled for the gpu queue.
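
  To run this script, submit it like any other LSF job; the file name gputest-cuda.sh
  is an illustrative assumption, and <jobid> stands for the job ID reported by bsub:

    bsub < gputest-cuda.sh        # submit to the gpu queue
    bjobs                         # the job runs once the batch windows open
    cat gputest-cuda.o<jobid>     # STDOUT/STDERR, as requested by the "#BSUB -o" line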