High-Performance Computing




High-Performance Computing: Windows, Matlab and the HPC
Dr. Leigh Brookshaw, Dept. of Maths and Computing, USQ

The HPC Architecture
- 30 Sun boxes, or nodes. Each node has 2 x 2.4 GHz AMD CPUs with 4 cores each and 16 GB RAM, so it is theoretically possible to run 240 independent simultaneous jobs.
- One extra node is the controller or administration node; access to the HPC is via the administration node.
- One extra node is the input/output node: it is the disk controller for the HPC, and all disk space (5 TB) is controlled by this node.
- One extra node is for Matlab clients.
- The disk space is visible to all nodes: 2.9 TB backed up, and 2.1 TB of temporary space that is not backed up and is cleared at reboot.
- One extra machine, for code testing and interactive jobs, has 4 x 2.4 GHz AMD CPUs with 6 cores each and 64 GB RAM.

The HPC Architecture
- CPU speed is equivalent to that of a workstation; computational power is derived from using more than one node/core.
- There is no advantage if you run one job on one core.

Connecting to the HPC
The HPC machines are:
- habeus: only for code testing, some interactive jobs and some batch jobs.
- usqhpcio: you normally don't need to connect to it. It controls the disks; you should never run jobs on this node.
- usqhpc: the main access point to the HPC. It contains the queues for submitting jobs.
- usqhpcm: for Matlab jobs using the Parallel Toolbox.

The only access to the HPC is via a Secure Shell (SSH) connection from the USQ network; this ensures all communication to the HPC is encrypted and secure. The HPC uses RedHat Linux, but most interaction with the HPC does not require a detailed knowledge of the Unix/Linux command line: about 10 commands suffice.

Connecting to the HPC: Windows Utilities
- PuTTY: creates an SSH connection and provides a command-line interface to a remote machine.
- WinSCP: provides a traditional Windows interface for copying files between the local machine and the remote machine. It uses SSH to ensure all communication is encrypted.
- Notepad++: a good all-purpose text/code editor that recognises Windows, Unix and MacOS text files.

PuTTY: Main Window
You need the name of the machine you wish to connect to.

PuTTY: First Connection
The first time PuTTY connects to a machine it will ask whether it should download the remote host's identifying key; answer Yes.

PuTTY: Command-Line Window
Enter your HPC username and password.

WinSCP: Connecting to a Remote Host
Enter the machine name and your username.

WinSCP: Preferences
In the preferences you can specify which editor to use. Remote files are downloaded, edited locally, then uploaded; the downloading and uploading of files is done automatically.

WinSCP: Main Window

Types of Jobs
Two basic types of job are run on multiple computing nodes.

Distributed jobs (also called coarse-grained jobs):
- Each process is completely independent of the others.
- Little or no communication between processes.
- Examples: parameter-space searches, running the same program repeatedly with different input parameters, &c.

Parallel jobs (also called fine-grained jobs):
- Each process deals with a part of the problem.
- Communication and synchronisation are required between processes.
- Examples: processing of large data sets that will not fit on one node, computational domains that need to be split across nodes, CFD, &c.

Distributed Job Example
Paradigm: master-slave processing.

Parallel Job Example
Paradigm: peer-to-peer processing.

HPC Job Submission
Jobs are run via a batch system: PBS (Portable Batch System). A batch system requires jobs to run unsupervised! Jobs are submitted to a batch queue, and the batch system starts a job running when the requested resources are available. Requested resources include:
- number of nodes
- number of cores per node
- memory per process
- total amount of memory for the job
- maximum amount of time
- ...

Jobs are submitted via a shell script or via a Matlab script. Shell script examples are available on the HPC web site.

Shell Script Example

##### Select resources #####
#PBS -N Test-Matlab
#PBS -l nodes=7:ppn=3
##### Queue #####
#PBS -q standard
##### Mail options #####
#PBS -m bea
#PBS -M leighb@usq.edu.au
##### Change to current working directory #####
cd /home/mcsci/leighb/test
##### Execute program #####
/usr/local/bin/matlab -nodisplay -nodesktop -nosplash < driver.m

Using Matlab

Matlab's Parallel Toolbox
Provides the infrastructure scripts/commands for parallel or distributed computing:
- the ability to create Matlab Workers (a Worker is a running instance of Matlab);
- the ability to assign tasks to Workers (pass a script to run on a Worker);
- the ability to communicate between Workers (running scripts can communicate with each other);
- a Local Scheduler that allows you to start one Worker per core on one node; the maximum number of Workers is 8 (see the sketch below).

Matlab's Distributed Computing Server
Provides the Scheduler needed to create Workers on other nodes. It is accessed via the Parallel Toolbox when requesting Workers. Currently the HPC has a license for a total of 64 simultaneous Workers.
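As a minimal sketch of the Local Scheduler (not from the slides; it assumes the default 'local' configuration and an arbitrary pool size of 4, within the limit of 8), the following runs independent loop iterations on Workers of a single node, with no batch system involved:

matlabpool('open', 'local', 4);   % start 4 Workers via the Local Scheduler
r = zeros(1, 100);                % preallocate the sliced output array
parfor i = 1:100
    % each iteration is independent, so any Worker may execute it
    r(i) = max(abs(eig(rand(50))));
end
matlabpool('close');              % release the local Workers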

Matlab Client Script
- Uses one Matlab license!
- Uses one Distributed Computing Toolbox license!
- Requests resources from the Scheduler: the number of Workers required, &c.
- The client script distributes tasks to the Workers.
- The client script can run on the HPC (usqhpcm) or on your own machine; the client machine must be able to connect to the HPC.

Matlab Distributed Processing: Minimalist Example (running on usqhpcm)

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
job = createJob(sched);
set(job, 'PathDependencies', {'/home/mcsci/leighb/test'});
for i = 1:ntasks                  % one task per required Worker
    createTask(job, @distance, 3, {nsim});
end
submit(job);
waitForState(job);
results = getAllOutputArguments(job);

Matlab Distributed Processing
- 'torque' is the name of the Scheduler to use with the HPC's PBS batch system.
- createJob(): creates a new distributed job to submit to the scheduler.
- createTask(): specifies the tasks for the job, one task to one Worker. A task is a Matlab function to run.
- submit(): queues the job on the PBS batch system using the torque scheduler. Each Worker appears as a separate queued job on the PBS default queue.
- waitForState(): waits for the job to complete. It can time out, or wait for specific tasks to finish.
- getAllOutputArguments(): gets the return values from the job.

Matlab Distributed Processing: Required Settings
- HasSharedFilesystem: all the nodes can see the user's home folder on the HPC; set to true.
- DataLocation: Matlab's book-keeping location, a place where task output/input can be stored by Workers.
- RshCommand: the command Matlab must use to communicate between the client and the Workers.
- PathDependencies: the paths to all the scripts used in this job, so that all the Workers can find them.

Matlab Distributed Processing: Comments
- Each Worker appears as a submitted job on the PBS queue; Workers will be distributed by the PBS system.
- set(sched, 'SubmitArguments', '-q long'); supplies additional arguments to use when submitting tasks to the PBS queue. The most common use is to change queues.

Distributed versus Parallel

Distributed job:
- Matlab sessions are called Workers.
- Workers cannot communicate with each other.
- Define any number of tasks (different or the same) in a job.
- Each task is queued on the PBS system.
- Tasks need not run simultaneously; they are assigned to Workers as Workers become available.
- Workers can run several tasks in a job.

Parallel job:
- Matlab sessions are called Labs.
- Labs can communicate with each other.
- Define one task for the job; duplicates are run on all the Labs requested.
- The job is queued on the PBS system.
- Tasks run simultaneously on as many Labs as are available at runtime.
- The start of the job may have to wait until the requested number of Labs is available.

Matlab Parallel Processing: Explicit Example

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
pjob = createParallelJob(sched);
set(pjob, 'PathDependencies', {'/home/mcsci/leighb/test'});
set(pjob, 'MaximumNumberOfWorkers', 30);
set(pjob, 'MinimumNumberOfWorkers', 20);
t = createTask(pjob, @distance, 3, {nsim});
submit(pjob);
waitForState(pjob);
results = getAllOutputArguments(pjob);

Matlab Parallel Processing: Comments on the Explicit Example
- One task only: it is repeated on all Labs!
- Only one job is submitted on the PBS queue.
- Matlab defaults to requesting one node from PBS for each Lab! If you need more than 30 Labs you must explicitly specify the resources required:

set(sched, 'ResourceTemplate', '-l nodes=30:ppn=2');
set(pjob, 'MaximumNumberOfWorkers', 60);
set(pjob, 'MinimumNumberOfWorkers', 60);

- Currently the HPC only has a license for 64 Workers at any one time!

Explicit Lab Communication and Synchronisation
- numlabs: returns the number of Labs in the current job.
- labindex: returns the index of the current Lab; the value is different on each Lab.
- labSend: sends data to a specified Lab.
- labReceive: blocks and reads data from a specific Lab.
- labProbe: checks whether data is available from a specific Lab.
- labBarrier: blocks execution until all Labs reach this call.
- ...
(A short sketch using these calls follows the next example.)

Matlab Parallel Processing: Letting Matlab Do the Work!

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
set(sched, 'RcpCommand', 'scp');
matlabpool(sched, 24);
parfor i = 1:64
    result(i,:) = distance(100000);
end
matlabpool close;
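To make the communication calls above concrete, here is a minimal sketch (not from the slides; it assumes a pool of Labs is already open, for example via matlabpool as above) in which Lab 1 collects a value from every other Lab inside an spmd block:

spmd
    x = labindex^2;                       % each Lab computes its own value
    if labindex == 1
        total = x;
        for src = 2:numlabs
            total = total + labReceive(src);  % block until Lab src's data arrives
        end
    else
        labSend(x, 1);                    % send this Lab's value to Lab 1
    end
    labBarrier;                           % all Labs synchronise here
end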

Matlab Parallel Processing: Points to Note
- matlabpool starts 24 Labs for this job. One job appears in the PBS queue, with 24 nodes as the requested resources.
- parfor distributes the contents of the loop to the Labs in the pool. Each Lab works on one iteration of the loop, so the first 24 iterations are calculated simultaneously, one on each Lab.
- Loop iterations are not done in loop order but in parallel; results will appear out of order unless they are stored in an explicitly indexed array!
- One iteration of a loop cannot depend on a previous iteration.
- The pool of Labs remains running and available for tasks until the pool is explicitly closed.

Matlab Parallel Processing: Single Program Multiple Data (spmd)
- Interleaves serial and parallel computing in the one client script.
- Uses Matlab spmd ... end blocks: parallel computing within the spmd block, serial outside!
- Identical code runs on each Lab, on different data.
- Useful for running the same program on different data sets when communication and synchronisation are required!
- The per-Lab data sets may be parts of one large distributed data set!

Matlab Parallel Processing: SPMD Example

matlabpool(sched, 24);
spmd
    R1 = rand(240);
    Z1 = zeros(240);
    Z2 = codistributed(Z1);
    Z3 = getLocalPart(Z2);
    Z4 = codistributed.rand(100000, 24);
    Z5 = gather(Z2, 1);
end
matlabpool close;

Matlab Parallel Processing: SPMD Example (continued)
- R1 is a different random array on each Lab.
- Z1 is an array replicated on each Lab.
- Z2 is a codistributed array: one segment of Z1 goes to each Lab. The default segmentation is by the last non-unary dimension, columns in this case.
- Z3 contains the local Lab's part of Z2.
- Z4 creates a codistributed array directly; use this if the distributed array is too large to replicate on each Lab.
- Z5 on Lab 1 contains the reconstructed Z2. Without the 1, all Labs contain the reconstructed array.

Matlab Parallel Processing: SPMD (continued)
Many Matlab functions are capable of working with codistributed arrays:
- elementary array operations: +, -, *, /, \, the dot variants, &c.
- elementary matrix operations: find, diag, reshape, size, sort, is*, &c.
- matrix functions: eigenvalues, inverse, LU factorization, SVD, norms, &c.
- elementary trigonometric, logarithmic and hyperbolic functions, &c.

See help codistributed/functionname for details. For-loops on codistributed arrays can only loop over the parts local to each Lab; see the sketch below.
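As a minimal sketch of that last point (not from the slides; it again assumes an open pool of Labs), the loop below runs over the columns of the local part only, so each Lab sums just the columns it actually stores:

spmd
    D = codistributed.rand(240);   % columns are spread across the Labs
    L = getLocalPart(D);           % only this Lab's columns
    s = zeros(1, size(L, 2));
    for j = 1:size(L, 2)           % loop bounds are local, not global
        s(j) = sum(L(:, j));       % partial column sums on this Lab
    end
end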