The Asterope compute cluster
- ÅA has a small cluster named asterope.abo.fi with 8 compute nodes
- Each node has:
  - 2 Intel Xeon X5650 processors (6-core) with a total of 24 GB RAM
  - 2 NVIDIA Tesla M2050 GPGPU cards, each with 3 GB of memory
- The network is 4x QDR InfiniBand (also 1 Gb Ethernet)
- Disk server with 24 TB

[Diagram: cluster layout — Login FE (scaleout), ClusterFE (vm), AdminFE (scaleout, vmhost), GridFE (vm), the cluster nodes, and a disk server (DL360) with a disk pool, connected by the IB net and the cluster Ethernet]

Using the Asterope cluster
- The cluster is part of FGI, the Finnish Grid Infrastructure
  - it can be used both locally and as part of the national grid resources
- Local users access the cluster through the front-end node
  - it uses SSH keys for authentication (not passwords)
  - log in with SSH to the front-end asterope.abo.fi
- The cluster uses a separate file system to store user files
  - it does not mount the normal home directory
  - you have to explicitly copy files to the system
- The cluster uses the Environment Modules package to manage the software environment
  - see http://modules.sourceforge.net or man module on Asterope
  - you have to load all software modules that you will use (compilers, libraries, tools, ...)
- The nodes are named asg1, asg2, ..., asg8
Setting up SSH keys
- The Asterope cluster uses SSH keys for user authentication
  - a public-key encryption scheme instead of a password
- You should access the cluster from ÅA's login server tuxedo.abo.fi
  - log in to tuxedo from your local machine with your normal ÅA user name and password
  - on Windows you can use PuTTY or some other terminal emulator
  - on Linux, open a terminal window and give the command ssh -X username@tuxedo.abo.fi
- When you are logged in to Tuxedo, generate an SSH key with the command ssh-keygen
  - if you already have an SSH key, you don't need to do this again
- Your public key will be stored in the file .ssh/id_rsa.pub
  - send it to Mats Aspnäs and ask for an account on the cluster

Example public key:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCsPDBbfOU/dS6K4ay2eBeXGtsGkEVhFrxRZoyksdXuTMgGp3pO7RBRkhaAfop0mkqrsdmsphtk+rldbgl+yil6dozaygafgzsliixzowumcvlqyw5whcf/osabvdbykoxdzvhbpeesibnlhzd1uapzf0aahn/zo7gyxqKsQ9e6HsdP3P123fZLiu3IzrU511IDI79zhkqJtxevHIn0c1bOVhINgkyoE5to7fl7iX+LYFkKY3J/eJJOHrRTipLLBdMYe4956yNcypv6+eemla3vommyyoqgtsvwh6+/w5gpo7wf+3kpacr0teel6us9+ozugm0ywscjxnbo4k7es3rd mats@tuxedo.abo.fi

Logging in to Asterope
- First log in to tuxedo.abo.fi
  - Tuxedo is a login server, so it can be accessed also from outside ÅA's network
- On Tuxedo, log in to the front-end node of Asterope with ssh -X username@asterope.abo.fi
- Your home directory is /home/username
  - this is not your normal ÅA home directory, so initially it will be empty
  - you can transfer files to/from the system with scp

[Diagram: You → (ssh) → Tuxedo → (ssh) → Asterope]
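The key-setup steps above can be sketched as a short terminal session on Tuxedo (a minimal sketch; ssh-keygen will prompt for a file location and an optional passphrase, and the defaults are fine):

```shell
# On tuxedo.abo.fi: generate an RSA key pair
# (skip this if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa

# Display the public key so you can copy it into the
# account-request mail to the cluster administrator
cat ~/.ssh/id_rsa.pub
```

The private key stays in ~/.ssh/id_rsa and should never be sent to anyone; only the .pub file is shared.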
Setting up your environment
- Load the software modules that you need: the Gnu compiler and MPI
    module load PrgEnv-gnu
    module load mvapich2
- You can list all available modules with the command module avail
- List the modules that you currently have loaded with module list
- The module system sets the environment variables that are needed by the programming tools, like PATH, LIBRARY_PATH, MANPATH, xxx_include and xxx_lib (where xxx is the name of a loaded module)
  - this simplifies the design of Makefiles: there is no need to set up paths by hand

Compiling and running programs
- Copy the example program hello.c to your home directory on Asterope
- Compile the MPI program with
    mpicc -O3 hello.c -o hello
- Submit the program for execution on 4 cores
    srun -n 4 ./hello

    % srun -n 4 ./hello
    srun: job 28165 queued and waiting for resources
    srun: job 28165 has been allocated resources
    Hello World from process 0 running on asg7
    Hello World from process 1 running on asg7
    Hello World from process 2 running on asg7
    Hello World from process 3 running on asg7
    Ready!
    %
Executing programs with SLURM
- The cluster uses the SLURM resource manager to execute jobs on the nodes
  - see https://computing.llnl.gov/linux/slurm/quickstart.html
- To execute a program on X cores on the cluster: srun -n X ./myprogram
- Useful SLURM commands:
  - srun: run a parallel job on a cluster managed by SLURM
  - squeue: view information about jobs in the SLURM scheduling queue
  - sbatch: submit a batch script to SLURM
  - sinfo: view information about SLURM nodes and partitions
  - scancel: signal jobs that are under the control of SLURM, for instance to cancel submitted jobs

Partitions
- The cluster is divided into a number of partitions
  - jobs submitted through SLURM are always allocated resources from some partition
- The default partition is named users and contains 48 cores (4 nodes)
  - max. run time is 30 minutes
- If you don't specify which partition to use, your job goes to users
  - use srun -p local to specify the local partition

    Partition | Nodes    | Nr of nodes | Cores | Max time
    users     | asg[1-8] | 4           | 48    | 30 min
    local     | asg[1-8] | 8           | 96    | 5 days
    grid      | asg[1-6] | 2           | 24    | 2 days

- Please don't use the system for anything else than course work
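sbatch is listed above but not demonstrated, so here is a minimal batch-script sketch (the job name, file names and time limit are illustrative; pick a partition from the table above):

```shell
#!/bin/bash
# Minimal SLURM batch script (illustrative): run ./hello on 4 cores
#SBATCH --job-name=hello        # job name shown by squeue (placeholder)
#SBATCH --partition=users       # default partition: 48 cores, 30 min limit
#SBATCH --ntasks=4              # number of MPI processes
#SBATCH --time=00:10:00         # wall-clock limit; must fit the partition max

srun ./hello
```

Save this as e.g. job.sh, submit it with sbatch job.sh, and follow its progress with squeue; unlike a plain srun, sbatch returns immediately and the job's output goes to a file (slurm-<jobid>.out by default).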
Debugging parallel programs
- There are some well-developed debuggers for parallel programs, supporting MPI, OpenMP and CUDA, like TotalView and Allinea DDT
  - however, these are commercial products
- It is possible to attach gdb (the Gnu debugger) or ddd (a graphical front-end to gdb) to each MPI process
  - you get one debugger window for each MPI process
  - you can set breakpoints in the code, step forward, inspect the values of variables, etc.
- Compile your program without optimization (no -O flag) and with the -g switch
- Run your program with
    srun -n 2 --x11=all ddd hello
  - the flag --x11=all instructs srun to forward X-windows connections from all processes
  - ddd hello starts ddd (the Data Display Debugger) on the program hello
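Putting the debugging steps together, a session on Asterope might look like this (a sketch; hello.c and the process count are taken from the earlier example, and X forwarding must be enabled on both ssh hops with -X):

```shell
# Recompile with debug symbols and without optimization,
# so gdb/ddd can map machine code back to source lines
mpicc -g hello.c -o hello

# Start one ddd window per MPI process (2 here),
# with X connections forwarded from every node
srun -n 2 --x11=all ddd hello
```

Keep the process count small when debugging: every MPI rank opens its own ddd window, and more than a handful quickly becomes unmanageable.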