Tutorial: Using WestGrid

Drew Leske
Compute Canada/WestGrid Site Lead
University of Victoria
Fall 2013 Seminar Series

Date          Speaker           Topic
23 September  Lindsay Sill      Introduction to WestGrid
9 October     Drew Leske        Tutorial: Using WestGrid
23 October    Jonatan Aronsson  Tutorial: Introduction to the WestGrid Development Environment
6 November    Fiona Brinkman    Case Study: Genomics, Bioinformatics and HPC
                                How Computational Analyses Are Transforming Infectious Disease Control

More information on these and other seminars offered: https://www.westgrid.ca/support/training
WestGrid User Basics

To use WestGrid systems effectively, you will need to know:
* Where to get help and information
* Which systems are suited to your project
* How to log on to those systems
* Basic Linux commands
* How to define and submit batch jobs
Finding Information and Getting Help

* The WestGrid website: www.westgrid.ca
  * Everything from guidance on choosing systems and running jobs to information about specific systems
  * System health and upcoming maintenance events
* WestGrid Support: support@westgrid.ca
  * For everything from account problems to parallelization questions and code optimization advice
  * No question too big or too small

These are the most important items to take away with you today.
Choosing a System

The WestGrid website describes each computing facility and its size, architecture, memory, interconnect, and associated storage: https://www.westgrid.ca/support/systems

Some systems will be better suited to your project than others. As well, some software is only available on certain systems. We can help you find the best system for your needs.
Choosing a System: Here Are a Few

System    Cores  Memory        Interconnect       Storage
Hermes    2112   24 GB/node    2 x GigE, 10:1 IB  1.2 PB
Nestor    2304   24 GB/node    IB QDR             1.2 PB
Hungabee  2048   16 TB shared  IB QDR             405 TB
Silo      n/a    n/a           n/a                3.15 PB

These four systems represent, in broad terms: a general-purpose system appropriate for serial jobs; a cluster with a high-speed interconnect, suitable for parallel jobs; a shared-memory system for problems requiring large amounts of memory; and a storage site.
Connecting to WestGrid

[Diagram: your workstation, the login nodes, the scheduler, and Nestor]
Connecting: Software You Will Need

* Access to WestGrid systems is via Secure Shell (ssh)
  * Linux and Mac clients are included in the OS
  * Windows: PuTTY, WinSSH
* File transfer is via Secure Copy (scp) or Secure FTP (sftp)
  * Linux and Mac clients are included in the OS
  * Windows: WinSCP, FileZilla
* Grid tools are also available

Everything you need to know: https://www.westgrid.ca/support/quickstart/new_users
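From a Linux or Mac terminal, logging in and copying files might look roughly like the sketch below. The user ID, the host name nestor.westgrid.ca, and the wg_* helper names are assumptions for illustration only; check the page for your system for the real login host.

```shell
#!/bin/bash
# Hypothetical values: replace with your own WestGrid user ID and the
# login host listed for your system (this host name is an assumption).
WG_USER="yourid"
WG_HOST="nestor.westgrid.ca"

# Open an interactive session on a login node.
wg_login() { ssh "${WG_USER}@${WG_HOST}"; }

# Copy a local file ($1) to a remote path ($2).
wg_put() { scp "$1" "${WG_USER}@${WG_HOST}:$2"; }

# Copy a remote file ($1) to a local path ($2).
wg_get() { scp "${WG_USER}@${WG_HOST}:$1" "$2"; }
```

For example, `wg_put hello.pbs hello.pbs` would copy a job script into your remote home directory, and `wg_login` would then open a session there.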
Basic Linux Commands

* List directory contents: ls, ls -a, ls -l
* Create, remove directory: mkdir <dir>, rmdir <dir>
* Change to directory, change to parent: cd <dir>, cd ..
* Copy files: cp <src> <tgt>
* Copy directory: cp -r <src> <tgt>
* Move file or directory: mv <file> <dir>
* Rename file or directory: mv <oldfile> <newfile>
* Remove file: rm <file>
* View file: less <file>, cat <file>

Everything you need to know: https://www.westgrid.ca/support/quickstart/new_users
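A short practice session that exercises these commands could look like the following. It works in a throwaway temporary directory, and the file and directory names are invented for the example.

```shell
#!/bin/bash
# Work in a scratch directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project archive                # create two directories
cd project                           # change into one of them
echo "sample data" > notes.txt       # make a small file to play with
ls -a                                # list everything, including hidden entries
ls -l                                # long listing: permissions, size, date

cp notes.txt draft.txt               # copy a file
mv draft.txt final.txt               # rename a file
cd ..                                # back to the parent directory

mv project/final.txt archive         # move a file into a directory
cp -r project project_copy           # copy a whole directory
cat archive/final.txt                # view a file (less is the interactive pager)

rm project/notes.txt                 # remove a file
rmdir project                        # remove the now-empty directory
```

Note that rmdir only removes empty directories, which is why the file inside is deleted first.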
Job Basics

* Login nodes are for data management, editing and compiling source code, quick tests, and job management
* The real work is done on worker nodes
* Requests are submitted to the batch system and enter an appropriate queue
* Jobs are dispatched to worker nodes by the scheduler according to their priority, which is determined mainly by FairShare
Job Basics: Job Dispatch

[Diagram: your workstation, the login nodes, the scheduler, and Nestor]
Job Basics: FairShare

* Everybody gets a fair share based on allocation (if any) and usage
* In essence, a job whose owner has had little usage over the past while will have higher priority than a job whose owner has been a heavy user; hence, fair share
* Some groups are given a bigger share (RAC allocations)
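As a toy illustration of the idea only, and not the scheduler's actual formula, one can picture a score that is the owner's target share minus their recent usage. The function name and all of the numbers below are invented.

```shell
#!/bin/bash
# Toy model: score = target share (%) minus recent usage (%).
# The real scheduler's calculation is more involved and site-specific.
fairshare_score() {
    echo $(( $1 - $2 ))
}

light=$(fairshare_score 10 2)    # default share, little recent usage
heavy=$(fairshare_score 10 25)   # default share, heavy recent usage
rac=$(fairshare_score 30 25)     # larger target share, same heavy usage

echo "light user: $light"
echo "heavy user: $heavy"
echo "heavy user with larger share: $rac"
```

In this sketch the light user's jobs outrank the heavy user's, and a larger target share offsets the same amount of usage.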
Job Basics: Job Definition

* A batch job is defined by a script with special embedded directives that tell the cluster what's required for the job
  * Memory
  * Cores
  * Wall time
* If your job exceeds these resources, it may be terminated before completion :-(
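A job script that requests all three resources might look like the sketch below. The values are illustrative only, and the resource names accepted (procs, pmem, walltime here) can differ between systems, so check the documentation for the system you are using.

```shell
#!/bin/bash
# Illustrative resource requests; adjust the values to your job.
#PBS -l procs=4             # cores
#PBS -l pmem=2gb            # memory per process
#PBS -l walltime=01:30:00   # wall time (hh:mm:ss)
#PBS -j oe                  # merge standard error into standard output

# PBS_JOBID is set by the batch system; fall back to a label when the
# script is run by hand as a quick test.
job_id=${PBS_JOBID:-interactive-test}
echo "Job $job_id running on $(uname -n)"
```

Because the directives are ordinary shell comments, the script can also be run directly on a login node as a quick sanity check before it is submitted.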
Job Basics: Essential Commands

* Submit a job: qsub <script>
* Check status of jobs: qstat <job>
* Check scheduling: showq
* Delete a job: qdel <job>
* When will my job start? showstart <job>
* How do I use that command? man qstat
* How do I use showq? showq --help

Everything you need to know: https://www.westgrid.ca/support/quickstart/new_users
Job Basics: hello.pbs

#!/bin/bash
#PBS -l procs=1
#PBS -j oe
#PBS -W Output_Path=$HOME/20131009/${PBS_JOBID%%.*}

date
echo "Hello, world!"
echo "Am having a wonderful time in $(/bin/hostname)."
echo "Love, $(whoami)"
Job Basics: Submitting hello.pbs

* Here's where we submit the job to the cluster:

    westgrid# qsub hello.pbs
    16363886.moab01.westgrid.uvic.ca

* Here we can check the status of the job:

    westgrid# qstat 16363886
    Job id               Name             User            Time Use S Queue
    -------------------- ---------------- --------------- -------- - -----
    16363886.moab01      hello.pbs        dleske                 0 Q hermes

* If you get "unknown job ID", the job has completed
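qsub prints the full job ID, while the qstat call above uses only the numeric part. One way to strip the suffix in a shell script is the same `%%.*` expansion that hello.pbs applies to PBS_JOBID. The helper name short_job_id is made up for this sketch.

```shell
#!/bin/bash
# Print everything before the first dot of a full job ID.
short_job_id() {
    echo "${1%%.*}"
}

full_id="16363886.moab01.westgrid.uvic.ca"   # as printed by qsub in the example
job=$(short_job_id "$full_id")
echo "$job"

# On a login node the two steps could be chained like this:
#   full_id=$(qsub hello.pbs)
#   qstat "$(short_job_id "$full_id")"
```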
Job Basics: Results!

* When the job has completed, the output files you specified in the job script will contain the results
* For example:

    Wed Oct 9 08:48:51 PDT 2013
    Hello, world!
    Am having a wonderful time in hermes0195.
    Love, dleske

* Whoop! Science!
* There may be other output in these files provided by the batch system
Job Basics: Your First Jobs

* Everything you need to get started is at:
  * https://www.westgrid.ca/support/quickstart/new_users
* Run a couple of goofy little test jobs to get familiar with how the system works
  * qsub, qstat, showq, qdel
* Something didn't work?
  * Job output usually provides the best clues
  * E-mail support@westgrid.ca BUT PLEASE...
Job Basics: HELP!

* If your job failed and you can't figure out what went wrong, send a note to support@westgrid.ca
* Please include essential details:
  * The name of the system you are using
  * The job ID
  * Your WestGrid user ID
* Also include anything else we may need to know to solve your real problem.
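One way to make sure none of these details is forgotten is to print them in a fixed layout and paste the result into the e-mail. The function name support_details and the layout are made up for this sketch; the system name and job ID are the ones from the hello.pbs example.

```shell
#!/bin/bash
# Print the essential details for a support request.
# Arguments: system name, job ID, WestGrid user ID.
support_details() {
    printf 'System:  %s\n' "$1"
    printf 'Job ID:  %s\n' "$2"
    printf 'User ID: %s\n' "$3"
}

# Assumes your login name on the system is your WestGrid user ID.
report=$(support_details "hermes" "16363886" "${USER:-$(id -un)}")
echo "$report"
```

Attaching the job script and the job's output file alongside these details usually helps as well.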
Recap: WestGrid User Basics

To use WestGrid systems effectively, you will need to know:
* Where to get help and information
* Which systems are suited to your project
* How to log on to those systems
* Basic Linux commands
* How to define and submit batch jobs
Information and Help

The most important things to take away with you today:
* WestGrid website: www.westgrid.ca
* WestGrid Support: support@westgrid.ca

You may also have local support at your institution.

Don't be shy. We are here to support and enable you and your work.
Thanks for coming!