CERM Cluster User's guide


Abstract

These pages collect some information to help CERM users make the best use of our computational resources. You can download this manual in PDF format here [


Table of Contents

1. How to get an account
2. How to login
3. Login Info and Environment Setup
4. OpenPBS/Torque Usage
   Batch Processing
   PBS Options
   Maui commands
   PBS Environment Variables
   Job Script Template
   Job script examples
   Submitting a Job
   Monitoring a Job
5. Running serial codes
6. Running MPI parallel codes
   Running interactive MPI programs
   Job Script Template

List of Tables

4.1. PBS Options
4.2. Maui commands
4.3. PBS Environment Variables
4.4. PBS Environment Variables
4.5. Commands to monitor a job

Chapter 1. How to get an account

To ask for an account, send your request to morelli AT cerm.unifi.it, specifying the project that you will use for calculations.

To request a cluster account:
- You must have an active project
- You have to be the project's owner

Chapter 2. How to login

To ensure a secure login session, users must connect to machines using the secure shell (ssh) program. Telnet is not allowed because of the security vulnerabilities associated with it. The "r" commands rlogin, rsh, and rcp are also disabled on this machine for similar reasons. These commands are replaced by the more secure alternatives included in SSH: ssh and scp.

To submit, monitor, and delete jobs, you have to log in on the cluster server named athlon. On athlon it is also possible to make backups on CD or DVD.

Important
Please note that interactive login is only allowed on the cluster server (athlon). Computing nodes are accessed and used only through the queue system.
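A minimal session from a Linux terminal could look like the following; the username and file names are placeholders, and only the short host name athlon is given here, so you may need the fully qualified name used at your site:

$ ssh your_username@athlon
$ scp results.tar.gz your_username@athlon:
$ scp your_username@athlon:data/spectra.tar.gz .

The first scp copies a local file into your home directory on athlon; the second copies a remote file back to the current local directory.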

Chapter 3. Login Info and Environment Setup

The default shell is bash. To change it, use the chsh command. At login the /etc/motd file is displayed: please take care to read it, because information about the system is usually posted there.

A basic default environment is already set up by the system login configuration files; this includes variables and paths for all the compilers and their MPI wrappers, and for the OpenPBS/Torque batch queuing system with the Maui scheduler. Check your environment with the env command. You should be careful when modifying the shell customization files (.cshrc, .profile, .login, .bashrc), since they could overwrite the default values and alter the behaviour of the compilers and of the batch queuing system.
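As a quick sanity check after logging in, you can inspect the preconfigured environment; the grep pattern below is only an illustration of what to look for, not a guaranteed naming scheme for the variables:

$ echo $SHELL
$ env | grep -i -e pbs -e lam -e mpi
$ chsh -s /bin/tcsh

echo $SHELL confirms the login shell, the env line filters out batch-system and MPI related variables, and chsh -s changes the login shell (here to tcsh, as an example) should you really need to.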

Chapter 4. OpenPBS/Torque Usage

Batch Processing

The Portable Batch System (PBS) is a workload management system for Linux clusters. It supplies commands to submit, monitor, and delete jobs. It has the following components:

Job Server - also called pbs_server; provides the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job.

Job Executor - a daemon (pbs_mom) that actually places the job into execution when it receives a copy of the job from the Job Server. Mom creates a new session as identical to a user login session as possible and returns the job's output to the user.

Job Scheduler - a daemon that contains the site's policy controlling which job is run, and where and when it is run. PBS allows each site to create its own scheduler. We are using the Maui Scheduler. The Maui Scheduler can communicate with the various Moms to learn about the state of the system's resources, and with the Server to learn about the availability of jobs to execute.

Below are the steps needed to run production code:

1. Create a job script containing the needed PBS options: request the resources that will be needed (e.g. number of processors, wall-clock time, etc.) and use commands to prepare for execution of the executable (e.g. cd to the working directory, etc.).
2. Submit the job script file to PBS.
3. Monitor the job.

PBS Options

Below are some of the commonly used PBS options in a job script file. The options start with "#PBS".

Table 4.1. PBS Options

Option                        Description
#PBS -N myjob                 Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2         The number of nodes and processors per node. Only for parallel jobs.
#PBS -l walltime=01:00:00     The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out         The path and file name for standard output.
#PBS -e mypath/my.err         The path and file name for standard error.
#PBS -j oe                    Join option that merges the standard error stream with the standard output stream of the job.
#PBS -k oe                    Defines which output of the batch job to retain on the execution host.
#PBS -W stagein=file_list     Copies the file onto the execution host before the job starts. (*)
#PBS -W stageout=file_list    Copies the file from the execution host after the job completes. (*)
#PBS -r n                     Indicates that a job should not rerun if it fails.
#PBS -V                       Exports all environment variables to the job.

Note
(*) File staging can specify which files should be copied onto the execution host before the job starts and which files should be copied off the execution host when it completes. The file_list, regardless of the direction of copy, has the following form, where local_file is the name of the file on the system where the job executes, and remote_file is the destination name on the host specified by hostname: local_file@hostname:remote_file. For example:

stagein=my.input@frontend-0:/home/login_name/my.input
stageout=my.output@frontend-0:/home/login_name/my.output

Maui commands

There are some quite useful Maui commands:

Table 4.2. Maui commands

Command            Description
showq              Shows a detailed list of submitted jobs.
showbf             Shows the free resources (time and processors) available at the moment.
checkjob job.id    Shows a detailed description of the job job.id.
showstart job.id   Gives an estimate of the expected start time of the job job.id.

PBS Environment Variables

There are a number of predefined environment variables. These include the following:

- Variables defined on the execution host;
- Variables exported from the submission host to the execution host; and
- Variables defined by PBS.

The following environment variables relate to the submission machine:

Table 4.3. PBS Environment Variables

Variable          Description
PBS_O_HOST        The host machine on which the qsub command was run.
PBS_O_LOGNAME     The login name on the machine on which qsub was run.
PBS_O_HOME        The home directory from which qsub was run.
PBS_O_WORKDIR     The working directory from which qsub was run.

The following variables relate to the environment where the job is executing:

Table 4.4. PBS Environment Variables

Variable          Description
PBS_ENVIRONMENT   Set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs.
PBS_O_QUEUE       The original queue to which the job was submitted.
PBS_JOBID         The identifier that PBS assigns to the job.
PBS_JOBNAME       The name of the job.
PBS_NODEFILE      The file containing the list of nodes assigned to a parallel job.

Job Script Template

The following job script template should be modified for the needs of the job. A job script may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options. For example:

#PBS -N Job_name
#PBS -l walltime=10:30,mem=320kb
#
step1 arg1 arg2
step2 arg3 arg4

Job script examples

Dyana/Pseudyana/Paramagneticdyana

To run programs of the dyana family, having a RUN script like:

#!/bin/bash
/prog/pseudyana << EOF
./ANNEAL
exit
EOF

you can write a job script named run, for example, with the following content:

#!/bin/bash -f
#PBS -k oe
#PBS -m n
LAUNCH="./RUN"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit

Amber8

To run amber calculations you can write the following job script (replacing all the filename occurrences with real filenames and adding other options if you need them):

#!/bin/bash -f
#PBS -k oe
#PBS -m n
#PBS -V
LAUNCH="/prog/amber8/exe/sander -O -i filename -o filename -c filename -p filename -r filename"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit

Bash script

To run bash-script-based calculations you can write the following job script (remember to change the LAUNCH entry):

#!/bin/bash -f
#PBS -k oe
#PBS -m n
LAUNCH="/home_nXX/project/bash_script"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit
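One practical detail for the bash-script case: since ${LAUNCH} is executed directly, the script it points to must have execute permission. With the placeholder path used above this would be, for example:

$ chmod +x /home_nXX/project/bash_script

Submission of the job script itself is done with qsub, as described in the next section.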

Haddock 1.3

To run Haddock 1.3 calculations you can write the following job script (the WORKDIR entry points to the directory containing the user's haddock data, remember to change it):

#!/bin/bash
#PBS -j oe
#PBS -k oe
#PBS -V
HADDOCK="/prog/haddock1.3"
HADDOCKTOOLS="$HADDOCK/tools"
PYTHONPATH=$HADDOCK
NACCESS="/prog/naccess2.1.1/naccess"
PROFIT="/prog/profit/profit"
WORKDIR="/home_nXX/project/HADDOCK/run1"
LAUNCH="python $HADDOCK/Haddock/Runhaddock.py"
cd $WORKDIR
$LAUNCH

Submitting a Job

Use the qsub command to submit the job script (in this example the name of the job script is run).

$ qsub run

PBS assigns the job a unique job identifier once it is submitted (e.g. 123.athlon). After a job has been queued, it is selected for execution based on the time it has been in the queue, its wall-clock time limit, and the number of processors requested.

Monitoring a Job

Below are commands for monitoring a job:

Table 4.5. Commands to monitor a job

Command              Description
qstat -a             Check the status of jobs, queues, and the PBS server.
qstat -f             Get all the information about a job, e.g. resources requested, resource limits, owner, source, destination, queue, etc.
canceljob job.id     Delete a job from the queue.
qhold job.id         Hold a job if it is in the queue.
qrls job.id          Release a job from hold.
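Putting it together, a typical submit-and-monitor session might look like this (123.athlon is just the illustrative job identifier mentioned above):

$ qsub run
123.athlon
$ qstat -a
$ checkjob 123
$ showstart 123
$ canceljob 123

The last command is only needed if you decide to remove the job from the queue before it completes.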

Chapter 5. Running serial codes

Important
No production runs are allowed on the master node, and any serial program executing there for more than 5 minutes is automatically deleted.

Note
Serial codes are all non-parallel programs such as dyana, pseudyana, and cyana. Execution of serial applications on the computational nodes can only be done through the queuing system, even for interactive runs.
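For interactive serial work the same queue mechanism applies: request an interactive session with qsub -I, in the same spirit as the interactive MPI example in the next chapter. A minimal sketch, with a placeholder program name and an illustrative walltime:

$ qsub -l nodes=1,walltime=1:00:00 -I
$ cd ${PBS_O_WORKDIR}
$ ./my_serial_program

The session is opened on a compute node, so the 5-minute limit on the master node does not apply.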

Chapter 6. Running MPI parallel codes

Note
To run MPI parallel programs, users have to use the LAM environment.

Running interactive MPI programs

Suppose, for instance, you want to run a test program test.x interactively on four processors; then you could use the following command:

$ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I

At this point (if there are free resources) you will enter the interactive batch session, and you can run your test with:

$ lamboot -v $PBS_NODEFILE
$ cd testdir
$ mpirun -n 4 -no-shmem test.x
$ mpirun -np 4

Example of an interactive execution:

$ qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
$ cd testdir
$ mpirun -n 4 test.x

Job Script Template

The following job script template should be modified for the needs of the job.

#!/bin/bash -f
#PBS -l nodes=2:ppn=2
#PBS -k oe
LAMSTART="lamboot $PBS_NODEFILE"
LAMSTOP="lamhalt $PBS_NODEFILE"
HOME="/home_n01/guest"
LAUNCH="mpirun -np 4 cpmd.x"
WORKDIR="${HOME}/cp_test"
export PP_LIBRARY_PATH=${WORKDIR}
cd ${WORKDIR}
${LAMSTART}
${LAUNCH} au_surf_job1.in > au_surf_job1.out
${LAMSTOP}
#
exit

The following job scripts should be used for GROMACS parallel calculations. The first one is for the preminimization and the second one launches the dynamics calculation.

#!/bin/bash -f
#PBS -k oe
#PBS -m n
PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
lamboot
LAUNCH="./SPI_MINI.csh"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit

#!/bin/bash -f
#PBS -k oe
#PBS -m n
PBS_O_WORKDIR="/home_n11/hetdyn/GROMACS/SPI_1ns_cluster"
lamboot
LAUNCH="./SPI_MD_5PR_1ns.csh"
cd ${PBS_O_WORKDIR}
${LAUNCH}
exit
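Going back to the interactive example above, it can be useful to verify what the queue actually assigned before launching a long calculation; lamnodes and lamhalt are standard LAM utilities, and test.x is the hypothetical executable used earlier:

$ cat $PBS_NODEFILE
$ lamboot -v $PBS_NODEFILE
$ lamnodes
$ mpirun -np 4 test.x
$ lamhalt

cat $PBS_NODEFILE lists the nodes assigned to the job (one entry per processor), lamnodes confirms which of them LAM has booted, and lamhalt shuts LAM down cleanly before you leave the session.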
