High-Performance Computing
Windows, Matlab and the HPC
Dr. Leigh Brookshaw
Dept. of Maths and Computing, USQ

The HPC Architecture
30 Sun boxes or nodes.
Each node has 2 x 2.4GHz AMD CPUs with 4 cores each and 16GB RAM.
Theoretically it is possible to run 240 independent simultaneous jobs.
One extra node is the controller or administration node. Access to the HPC is via the administration node.
One extra node is the Input/Output node: it is the disk controller for the HPC. All disk space (5TB) is controlled by this node. The disk space is visible to all nodes: 2.9TB is backed up; 2.1TB is temporary space, not backed up and cleared at reboot.
One extra node is for Matlab clients.
One extra machine, for code testing and interactive jobs, has 4 x 2.4GHz AMD CPUs with 6 cores each and 64GB RAM.
The HPC Architecture
CPU speed is equivalent to a workstation.
Computational power is derived from using more than one node/core.
There is no advantage if you run one job on one core.
Connecting to the HPC
habeus: only for code testing, some interactive jobs and some batch jobs.
usqhpcio: you normally don't need to connect to it. It controls the disks; you should never run jobs on this node.
usqhpc: the main access point to the HPC. Contains the queues for submitting jobs.
usqhpcm: for Matlab jobs using the Parallel Toolbox.
The only access to the HPC is via a Secure Shell (SSH) from the USQ network. This ensures all communication to the HPC is encrypted and secure.
The HPC uses RedHat Linux. Most interaction with the HPC does not require a detailed knowledge of the Unix/Linux command line: about 10 commands.

Connecting to the HPC: Windows Utilities
PuTTY: creates an SSH connection and provides a command-line interface to a remote machine.
WinSCP: provides a traditional Windows interface for copying files between the local machine and the remote machine. Uses SSH to ensure all communication is encrypted.
notepad++: a good all-purpose text/code editor that recognises Windows, Unix and MacOS text files.
PuTTY Main Window
You need the name of the machine you wish to connect to.

PuTTY First Connect
The first time PuTTY connects to a machine it will ask whether it should download the remote host's identifying key: answer Yes.
PuTTY Command Line Window
You need to enter your HPC username and password.

WinSCP: Connecting to the Remote Host
You need to enter the machine name and your username.
WinSCP Preferences
In the preferences you can specify which editor to use. Remote files are downloaded, edited locally, then uploaded. Downloading and uploading of files is done automatically.

WinSCP Main Window
Types of Jobs
Two basic types of jobs are run on multiple computing nodes.
Distributed Jobs: also called Coarse-Grained jobs.
Each process is completely independent of the others.
Little or no communication between processes.
Examples: parameter-space searches, running the same program repeatedly with different input parameters, &c.
Parallel Jobs: also called Fine-Grained jobs.
Each process deals with a part of the problem.
Communication and synchronisation are required between processes.
Examples: processing of large data sets that will not fit on one node, computational domains that need to be split across nodes, CFD, &c.

Distributed Job Example
Paradigm: Master-Slave processing.
Parallel Job Example
Paradigm: Peer-Peer processing.

HPC Job Submission
Jobs are run via a Batch System: PBS (Portable Batch System).
A batch system requires jobs to run unsupervised!
Jobs are submitted to a batch queue; the batch system starts a job running when the requested resources are available.
Requested resources: number of nodes, number of cores per node, memory per process, total amount of memory for the job, maximum amount of time, ...
Jobs are submitted via a Shell Script or via a Matlab script.
Shell Script examples are available on the HPC web site.
Shell Script Example

#!/bin/bash
##### Select resources #####
#PBS -N Test-Matlab
#PBS -l nodes=7:ppn=3
##### Queue #####
#PBS -q standard
##### Mail Options #####
#PBS -m bea
#PBS -M leighb@usq.edu.au
##### Change to current working directory #####
cd /home/mcsci/leighb/test
##### Execute Program #####
/usr/local/bin/matlab -nodisplay -nodesktop -nosplash < driver.m

Using Matlab
Matlab's Parallel Toolbox
Provides the infrastructure scripts/commands for Parallel or Distributed computing:
Ability to create Matlab Workers (a Worker is a running instance of Matlab).
Ability to assign tasks to Workers (pass a script to run on a worker).
Ability to communicate between Workers (running scripts can communicate with each other).
Provides a Local Scheduler that will allow you to start one worker per core on one node. The maximum number of local workers is 8.

Matlab's Distributed Computing Server
Provides the Scheduler to create Workers on other nodes.
Accessed via the Parallel Toolbox when requesting Workers.
Currently the HPC has a license for a total of 64 simultaneous Workers.
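The Local Scheduler mentioned above is a convenient way to test a distributed job on your own machine before moving to the cluster. A minimal sketch, assuming the same (pre-R2012) Parallel Computing Toolbox API used in the examples that follow:

```matlab
% Sketch: test a distributed job with the Local Scheduler (max 8 workers,
% one node) before submitting to the cluster. Hypothetical example task:
% generate a 3x3 random matrix on a worker.
sched = findResource('scheduler', 'type', 'local');
job   = createJob(sched);
createTask(job, @rand, 1, {3});      % one task, one output argument
submit(job);
waitForState(job, 'finished');       % block until the job completes
results = getAllOutputArguments(job);
disp(results{1});                    % the 3x3 matrix computed by the worker
destroy(job);                        % remove the job's book-keeping files
```

The same client script then needs only a different scheduler (the torque one shown later) to run on the HPC.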
Matlab Client Script
Uses one Matlab license!
Uses one Distributed Computing Toolbox license!
Requests resources from the Scheduler: the number of Workers required, &c.
The client script distributes tasks to the Workers.
The client script can run on the HPC (usqhpcm) or on your own machine. The client machine must be able to connect to the HPC.

Matlab Distributed Processing: Minimalist Example running on USQHPCM

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
job = createJob(sched);
set(job, 'PathDependencies', {'/home/mcsci/leighb/test'});
for i = 1:ntasks        % ntasks: the number of tasks required
    createTask(job, @distance, 3, {nsim});
end
submit(job);
waitForState(job);
results = getAllOutputArguments(job);
Matlab Distributed Processing
torque: the name of the Scheduler to use with the HPC's PBS batch system.
createJob(): create a new distributed job to submit to the scheduler.
createTask(): specify the tasks for the job. One task goes to one worker. A task is a Matlab function to run.
submit(): queue the job on the PBS batch system using the torque scheduler. Each worker appears as a separate queued job on the PBS default queue.
waitForState(): wait for the job to complete. Can time out, or wait for specific tasks to finish.
getAllOutputArguments(): get the return values from the job.

Matlab Distributed Processing: Required Settings
HasSharedFilesystem: all the nodes can see the user's home folder on the HPC. Set to true.
DataLocation: Matlab's book-keeping location. The place where Task output/input can be stored by Workers.
RshCommand: the command Matlab must use to communicate between the Client and the Workers.
PathDependencies: the paths to all the scripts used in this job, so all the Workers can find them.
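The task function @distance used throughout these examples is never shown in the slides. A hypothetical sketch of what such a function could look like, matching only the required shape (one input, nsim; three output arguments, as declared in createTask(job, @distance, 3, {nsim})):

```matlab
% Hypothetical stand-in for the unseen distance.m used in the slides.
% The workload here (distances between random point pairs in the unit
% square) is illustrative only; the real function may do anything, as
% long as it accepts nsim and returns three values.
function [meanDist, minDist, maxDist] = distance(nsim)
    p = rand(nsim, 2);                 % nsim random points
    q = rand(nsim, 2);                 % nsim more random points
    d = sqrt(sum((p - q).^2, 2));      % pairwise Euclidean distances
    meanDist = mean(d);
    minDist  = min(d);
    maxDist  = max(d);
end
```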
Matlab Distributed Processing: Comments
Each Worker appears as a submitted job on the PBS queue.
Workers will be distributed by the PBS system.
set(sched, 'SubmitArguments', '-q long');
SubmitArguments: additional arguments to use when submitting Tasks to the PBS queue. The most common use is to change the queue.

Distributed versus Parallel
Distributed Job:
Matlab sessions are called Workers.
Workers cannot communicate with each other.
Define any number of tasks (different or the same) in a job.
Each Task is queued on the PBS system.
Tasks need not run simultaneously; they are assigned to Workers as the Workers become available.
Workers can run several tasks in a job.
Parallel Job:
Matlab sessions are called Labs.
Labs can communicate with each other.
Define one task for the job; duplicates are run on all the Labs requested.
The Job is queued on the PBS system.
Tasks run simultaneously on as many Labs as are available at runtime.
The start of the job may have to wait until the requested number of Labs is available.
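The comparison above notes that a distributed job may contain any number of tasks, different or the same, whereas a parallel job repeats a single task on every Lab. A sketch of a mixed-task distributed job (assuming the scheduler is configured as in the earlier minimalist example; the inputs are illustrative):

```matlab
% Sketch: one distributed job containing several unrelated tasks.
% Each task is handed to whichever Worker becomes free first.
job = createJob(sched);                    % sched set up as shown earlier
createTask(job, @distance, 3, {1000});     % same function, one input
createTask(job, @distance, 3, {2000});     % same function, different input
createTask(job, @svd, 1, {rand(50)});      % an entirely different function
submit(job);
waitForState(job);
results = getAllOutputArguments(job);      % one row of outputs per task
```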
Matlab Parallel Processing: Explicit Example

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
pjob = createParallelJob(sched);
set(pjob, 'PathDependencies', {'/home/mcsci/leighb/test'});
set(pjob, 'MaximumNumberOfWorkers', 30);
set(pjob, 'MinimumNumberOfWorkers', 20);
t = createTask(pjob, @distance, 3, {nsim});
submit(pjob);
waitForState(pjob);
results = getAllOutputArguments(pjob);

Matlab Parallel Processing: Comments on the Explicit Example
One task only: it is repeated on all Labs!
Only one job is submitted on the PBS queue.
Matlab defaults to requesting from PBS one node for each Lab!
If you need more than 30 Labs you must explicitly specify the resources required:
set(sched, 'ResourceTemplate', '-l nodes=30:ppn=2');
set(pjob, 'MaximumNumberOfWorkers', 60);
set(pjob, 'MinimumNumberOfWorkers', 60);
Currently we only have a license for 64 workers at any one time!
Explicit Lab Communication and Synchronisation
numlabs: returns the number of Labs in the current job.
labindex: returns the index of the current Lab. The value is different on each Lab.
labSend: send data to a specified Lab.
labReceive: block and read data from a specific Lab.
labProbe: check if data is available from a specific Lab.
labBarrier: block execution until all Labs reach this call.
...

Matlab Parallel Processing: Letting Matlab do the work!

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb');
set(sched, 'RshCommand', 'ssh');
set(sched, 'RcpCommand', 'scp');
matlabpool(sched, 24);
parfor i = 1:64
    result(i,:) = distance(100000);
end
matlabpool close;
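The Lab communication primitives listed above can be sketched with a hypothetical ring exchange inside an spmd block: each Lab passes its index to the next Lab around a ring, then all Labs synchronise. (This assumes an open matlabpool; the data exchanged is trivial, purely to show the calls.)

```matlab
% Sketch (hypothetical): explicit Lab-to-Lab communication in a ring.
spmd
    next = mod(labindex, numlabs) + 1;      % the Lab after me in the ring
    prev = mod(labindex - 2, numlabs) + 1;  % the Lab before me in the ring
    labSend(labindex, next);                % send my index to the next Lab
    fromPrev = labReceive(prev);            % block until the previous Lab's index arrives
    labBarrier;                             % wait until every Lab has finished the exchange
end
```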
Matlab Parallel Processing: Points to note
matlabpool: start 24 Labs for this job. One job appears in the PBS queue, with 24 nodes as the requested resources.
parfor: distribute the contents of the loop to the Labs in the pool.
Each Lab works on one iteration of the loop! The first 24 iterations are calculated simultaneously, one on each Lab.
Loop iterations are not done in loop order but in parallel: results will appear out of order unless stored in an explicitly indexed array!
One iteration of a loop cannot depend on a previous iteration.
The pool of Labs remains running and available for tasks until the pool is explicitly closed.

Matlab Parallel Processing: Single Program Multiple Data (spmd)
Interleaving of serial and parallel computing in the one client script.
Use Matlab spmd ... end blocks: parallel computing within the spmd block, serial outside!
Identical code runs on each Lab, with different data.
Useful for running the same program on different data sets when communication and synchronisation are required!
The Lab data sets may be part of a large distributed data set!
Matlab Parallel Processing: SPMD Example

matlabpool(sched, 24);
spmd
    R1 = rand(240);
    Z1 = zeros(240);
    Z2 = codistributed(Z1);
    Z3 = getLocalPart(Z2);
    Z4 = codistributed.rand(100000, 24);
    Z5 = gather(Z2, 1);
end
matlabpool close;

Matlab Parallel Processing: SPMD Example...
R1 is a different array on each Lab (rand is evaluated separately on each Lab).
Z1 is an array replicated on each Lab.
Z2 is a codistributed array: one segment of Z1 on each Lab. Default segmentation is by the last non-unary dimension: columns in this case.
Z3 contains the local Lab's part of Z2.
Z4 creates a codistributed array directly: use this if the distributed array is too large to replicate on each Lab.
Z5 on Lab 1 contains the reconstructed Z2. Without the 1, all Labs contain the reconstructed array.
Matlab Parallel Processing: SPMD...
Many Matlab functions are capable of working with codistributed arrays:
Elementary array operations: +, -, *, /, \, dot variants, &c.
Elementary matrix operations: find, diag, reshape, size, sort, is*, &c.
Matrix functions: eigenvalues, inverse, LU factorization, SVD, norms, &c.
Elementary trig., log, hyperbolic functions, &c.
help codistributed/functionname
For-loops on codistributed arrays can only loop over the parts local to each Lab!
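The last point above can be sketched as follows: inside an spmd block, each Lab extracts its local part of a codistributed array and loops over that part only. (A hypothetical example, assuming an open matlabpool; the sort is just a placeholder workload.)

```matlab
% Sketch (hypothetical): for-loops touch only the Lab-local part of a
% codistributed array.
spmd
    D = codistributed.rand(8, numlabs);  % distributed by columns, one per Lab
    localD = getLocalPart(D);            % this Lab's columns only
    for j = 1:size(localD, 2)            % loop bounds are local, not global
        localD(:, j) = sort(localD(:, j));
    end
end
```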