PBSPro scheduling
- PBS overview
- qsub command: resource requests
- Queues attribution
- Fairshare
- Backfill
- Job submission
9 May 2013
PBS overview
PBS organization: daemons
- frontend: qsub (job submission)
- service node: the server (with its job tables) and the scheduler
- compute nodes: one MoM (machine-oriented miniserver) per node, executing the jobs
Bellatrix PBS job submission
- Selective submission: qsub -q queue_name
- Default submission: qsub (the job enters the routing queue R_bellatrix)
- Interactive submission: qsub -I (job reads from STDIN)
- Queue types: R = routing (default), P = private execution queues (exclusive, group ACL), Q = shared execution queues (shares, group ACL), T = special queues (group and/or user ACL), plus the open queue Q_free.
- A job that matches no queue ACL is rejected.
- qmove moves a job between private (P) and share (Q) queues.
- Ask 1234@epfl.ch for access to T_debug.
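For instance, the three submission modes look like this (the script name job.sh is a placeholder):

[user@bellatrix ~]$ qsub job.sh                     # default: routed through R_bellatrix
[user@bellatrix ~]$ qsub -q Q_free job.sh           # selective: target a named queue
[user@bellatrix ~]$ qsub -I -l walltime=00:30:00    # interactive session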
Identification
- user: owner of the job.
- groups: one of the groups associated with this user; the primary group is the default.
Get queue value
- not defined: the default is the routing queue defined by the server.
- queue name specified with option -q.
Parameters used to define the queues
- shared queues: group ACL (Access Control List), walltime.
- private queues: group ACL, (walltime).
Special queues
- free queue: all users, all groups.
- debug queue: user ACL.
- test queues: group ACL and/or user ACL.
Server-side processing of qsub
1. Identification: user, groups (grp1, grp2, ..., grpn).
2. Get queue value.
3. Parameter scanning and parameter check; on error the job is rejected.
4. Queue validity check; on error the job is rejected.
5. Assign jobid; the job enters the input queue, where the scheduler picks it up.
Parameters: user, groups, queue, job name, number of nodes, number of cores per node, walltime, place, mpiprocs, memory, -W group_list, -J, -o, -S.
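A submission exercising most of these parameters might look as follows (all values and the group name mygroup are illustrative):

[user@bellatrix ~]$ qsub -N my_job \
    -l select=2:ncpus=16:mpiprocs=16 \
    -l walltime=01:00:00 \
    -l place=excl \
    -W group_list=mygroup \
    -o /home/user/training \
    -S /bin/bash \
    job.sh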
Scheduler
- Assigns priorities to all jobs in the input queues, based on: queue priority, fairshare, preemption, wait time.
- Attributes resources using backfill; jobs move from queued (Q) to running (R) state.
- A scheduling cycle runs every 600 s, on each new job submission, and at each end of job.
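The Q and R states are visible from the frontend with the standard qstat command, for example:

[user@bellatrix ~]$ qstat -u $USER    # list your jobs with their current state (Q or R)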
NODES
- Private nodes run the private queues (with preemption).
- Shared nodes run the share queues, the special queues and the free queue.
qsub command: resource requests
qsub: select resources (Bellatrix, 16 cores per node)

nodes | cpus | mpiprocs | place  | node | display
1     | 16   | 1 (default) | excl | n1  | 1/16
1     | 16   | 16       | excl   | n1   | 16/16
1     | 8    | 8        | excl   | n1   | 8/8
1     | 8    | 8        | shared | n1   | 8/8 + 8/8 (a second such job fits on the same node)
1     | 16   | 8        | excl   | n1   | 8/16
1     | 16   | 8        | shared | n1   | 8/16 + 8/16

Job memory requested must be available (default = node maximum).
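As qsub lines, the exclusive and shared variants above would be requested like this (script name is a placeholder):

[user@bellatrix ~]$ qsub -l select=1:ncpus=16:mpiprocs=16 -l place=excl job.sh
[user@bellatrix ~]$ qsub -l select=1:ncpus=8:mpiprocs=8 -l place=shared job.sh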
Select resources (Antares): -l select=x:ncpus=y:mpiprocs=z

x | y | z  | nodes | cpus | mpiprocs
1 | 8 | 1  | 1     | 1x8  | 1 mpi on the node
2 | 8 | 1  | 2     | 2x8  | 2x1 mpi (1 mpi per node)
2 | 8 | 8  | 2     | 2x8  | 2x8 mpi (8 mpi per node)
2 | 8 | 16 | 2     | 2x8  | 2x16 mpi (16 mpi per node)
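The third row of the table, as a full submission, reads:

[user@antares ~]$ qsub -l select=2:ncpus=8:mpiprocs=8 job.sh   # 2 nodes, 8 MPI ranks on each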
Select resources (Antares)
-l select=4:mpiprocs=1
  -> one chunk on each of node1, node2, node3, node4 (1 mpi per node).
-l select=4:mpiprocs=1:mem=...gb
  -> same placement, but each chunk must also find the requested memory on its node.
-l select=4:ncpus=2:mpiprocs=2
  -> four chunks of 2 cpus and 2 mpiprocs each (8 cpus in total).
Scatter parameter
Select resources (Antares): -l select=4:ncpus=2:mpiprocs=2:mem=...gb (8 cpus in total)
- default placement: chunks may be packed, with several chunks sharing a node.
- with -l place=scatter: each chunk is placed on a different node, so node1..node4 each receive one chunk of 2 mpiprocs.
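A minimal script requesting scattered placement (the memory value is illustrative):

#!/bin/bash
#PBS -l select=4:ncpus=2:mpiprocs=2:mem=2gb
#PBS -l place=scatter
#PBS -l walltime=00:10:00
# with place=scatter, each of the 4 chunks lands on a different node
cat $PBS_NODEFILE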
Default parameters

                        aries             bellatrix
Number of nodes         1                 1
Number of cpus by node  8                 16
Walltime                5 min             5 min
Queue name              R_default_aries   R_bellatrix
place                   excl              excl
mpiprocs                1                 1
memory                  node = 9 gb       node = 32 gb
Queues attribution: Aries, Bellatrix
qsub on ARIES -> routing queue R_default_aries
- Private queues P1, P2, ..., Pn: group ACL, [walltime]; one per group grp1, grp2, ..., grpn.
- Share queues, each with a group ACL and a walltime limit: Q_aries_express, Q_aries, Q_aries_long (72 h), Q_aries_week (168 h).
- A job that matches no ACL ends in error.
qsub on BELLATRIX -> routing queue R_bellatrix
- Private queues P1, P2, ..., Pn: group ACL, [walltime]; one per group grp1, grp2, ..., grpn.
- Share queues, each with a group ACL and a walltime limit: Q_express, Q_normal, Q_long (72 h), Q_week (168 h).
- A job that matches no ACL ends in error.
ARIES: order of scheduling (route_destinations are tried in this order; private queues come before the share queues)

set queue Rdefault_aries route_destinations = P_aries_gr-yaz
set queue Rdefault_aries route_destinations += P_aries_theos
set queue Rdefault_aries route_destinations += P_aries_tlong
set queue Rdefault_aries route_destinations += P_aries_lmm
set queue Rdefault_aries route_destinations += P_aries_lis
set queue Rdefault_aries route_destinations += P_aries_lmc
set queue Rdefault_aries route_destinations += Q_aries_express
set queue Rdefault_aries route_destinations += Q_aries
set queue Rdefault_aries route_destinations += Q_aries_long
set queue Rdefault_aries route_destinations += Q_aries_week
BELLATRIX: order of scheduling (route_destinations are tried in this order; private queues come before the share queues)

set queue R_bellatrix route_destinations = P_texpress
set queue R_bellatrix route_destinations += P_theos
set queue R_bellatrix route_destinations += P_tlong
set queue R_bellatrix route_destinations += P_lammm_expr
set queue R_bellatrix route_destinations += P_lammm
set queue R_bellatrix route_destinations += P_mathicse
set queue R_bellatrix route_destinations += P_lsu
set queue R_bellatrix route_destinations += P_c3pn
set queue R_bellatrix route_destinations += P_lastro
set queue R_bellatrix route_destinations += P_wire
set queue R_bellatrix route_destinations += P_updalpe
set queue R_bellatrix route_destinations += P_lbs
set queue R_bellatrix route_destinations += P_lcbc
set queue R_bellatrix route_destinations += P_ltpn
set queue R_bellatrix route_destinations += P_ctmc
set queue R_bellatrix route_destinations += P_upthomale
set queue R_bellatrix route_destinations += P_lsmx
set queue R_bellatrix route_destinations += Q_express
set queue R_bellatrix route_destinations += Q_normal
set queue R_bellatrix route_destinations += Q_long
set queue R_bellatrix route_destinations += Q_week
qsub command: select resources, scheduler default parameters
- Scheduler option deciding how jobs are distributed to the nodes: round_robin (run one job on each node).
- Default select parameters:
  -l select=1:ncpus=8:mpiprocs=1    (Antares)
  -l select=1:ncpus=8:mpiprocs=1    (Aries)
  -l select=1:ncpus=16:mpiprocs=1   (Bellatrix)
- Default place parameter: -l place=excl (node exclusive for each job).
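So on Bellatrix a bare qsub is equivalent to spelling the defaults out (the default walltime is omitted here):

[user@bellatrix ~]$ qsub job.sh
# behaves like:
[user@bellatrix ~]$ qsub -l select=1:ncpus=16:mpiprocs=1 -l place=excl job.sh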
Fairshare
FAIRSHARE
Fairshare concept
- A fair method for ordering the start times of jobs, using resource usage history.
- A scheduling tool which allocates certain percentages of the system to specified users or groups of users.
- Ensures that jobs are run in the order of how deserving they are: the job to be run next is selected from the set of jobs belonging to the most deserving entity, then the next most deserving entity, and so on.
Fairshare parameters
- fairshare only on shared nodes.
- fairshare entity: groups.
- fairshare usage: ncpus * walltime.
- fairshare reset: every six months.
- total resources: 100% of the shared nodes; a fixed share is reserved for unknown (unlisted) groups.
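On PBS Pro installations, the fairshare tree with its shares and accumulated usage can typically be inspected with the pbsfs utility (usually reserved for administrators; availability depends on the site setup):

$ pbsfs    # print the fairshare tree: entities, shares, and usage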
(Figure) Example share allocations over the six-month window:
- one group grp1 with a defined share, the rest of the shared nodes split between the other groups and the unknown share;
- two groups grp1 and grp2 with different shares, plus the unknown share.
Backfill
Backfill concept
- The scheduler makes a list of jobs to run, in order of priority.
- The scheduler looks for smaller jobs that can fit into the usage gaps around the highest-priority jobs in the list.
- It scans the prioritized list and chooses the highest-priority smaller jobs that fit.
- Filler jobs are run only if they will not delay the start time of top jobs.
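An accurate walltime request is what lets the scheduler slot a job into such a gap: a job asking for a realistic hour can backfill where one requesting the queue maximum would not fit (values are illustrative):

[user@bellatrix ~]$ qsub -l select=1:ncpus=16:mpiprocs=16 -l walltime=01:00:00 job.sh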
BACKFILL: worked example (figure)
An 8-node cluster and eight jobs J1..J8 of varying node counts and walltimes, submitted in priority order, shown on a nodes-versus-time chart:
1. J1, the highest-priority job, is placed first.
2. J2, J3, J4 and J5 are placed in turn around it.
3. J6, J7 and J8 are smaller jobs that are backfilled into the remaining gaps: they finish early enough not to delay the start of any higher-priority job.
Job submission
Job in shared queue

#!/bin/bash
#
#PBS -l select=1:ncpus=16:mpiprocs=16
#PBS -l walltime=00:05:00
#
#PBS -S /bin/bash
#PBS -j oe
#PBS -o /home/leballe/training
#PBS -N j_job
#
echo ""
echo "==> Contents of PBS_NODEFILE "
cat $PBS_NODEFILE
# shared job
echo " =======> shared job"
echo ""
echo "==> Number of ncpus for mpirun"
CPUS_NUMBER=$(wc -l $PBS_NODEFILE | cut -d ' ' -f 1)
echo ""
echo "==> CPUS_NUMBER = $CPUS_NUMBER"
echo ""
#
echo " ==> start of job "
cd /home/leballe/training
echo " cd /home/leballe/training"
echo " sleep 0"
sleep 0
echo " ==> end of job"
================================= Prologue =======
--> PBSPro prologue for 9990.bellatrix (leballe, dit-ex) ran on Tue May 28 :0:55 CEST 2013
--> Nodes file contents:
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
--> NODE = b7.cluster
============================ End of prologue =========
==> Contents of PBS_NODEFILE
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
b7.cluster
=======> shared job
==> Number of ncpus for mpirun
==> CPUS_NUMBER = 16
==> start of job
cd /home/leballe/training
sleep 0
==> end of job
=================================== Epilogue ====
--> PBSPro epilogue for leballe's 9990.bellatrix (group dit-ex) ran on Tue May 28 ::05 CEST 2013
--> Nodes used: b7.cluster
--> NODE = b7.cluster
--> USER = leballe
============================= End of epilogue =====
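While such a job is queued or running, its full status can be followed from the frontend with the standard qstat command, using the jobid printed by qsub:

[leballe@bellatrix ~]$ qstat -f 9990.bellatrix    # full attributes of the job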
Bellatrix: shared placement (-l place=shared)

#!/bin/bash
#
#PBS -l select=1:ncpus=8:mem=8gb
#PBS -l place=shared
#
#PBS -S /bin/bash
#PBS -j oe
#PBS -o /home/leballe/training
#PBS -N j_job

echo " ==> start of job "
echo " ==> end of job"

#!/bin/bash
#
#PBS -l select=1:ncpus=8
#PBS -l place=shared
#
#PBS -S /bin/bash
#PBS -j oe
#PBS -o /home/leballe/training
#PBS -N j_job

echo " ==> start of job "
echo " ==> end of job"
Job in private queue
Primary group = dit-ex; group ACL of the private queue: group.

[leballe@bellatrix ~/training]$ id
uid=008(leballe) gid=0075(dit-ex) groups=0075(dit-ex),699999(group)

#!/bin/bash
#
#PBS -l select=1:ncpus=16:mpiprocs=16
#PBS -l walltime=00:05:00
#PBS -W group_list=group
#
#PBS -S /bin/bash
#PBS -j oe
#PBS -o /home/leballe/training
#PBS -N j_job
#
# job in private queue
echo "========> job in private queue"
echo ""

The job lands in the private queue whose ACL group = group.
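Without -W group_list, the job would be submitted under the primary group (dit-ex here) and routed accordingly; the option is what makes the private queue's group ACL match. The groups available for -W group_list can be checked beforehand with the standard id command:

[leballe@bellatrix ~]$ id -Gn    # list the groups this account belongs to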
Jobs in shared queues

#!/bin/bash
#
#PBS -l select=1:ncpus=16
#PBS -l walltime=00:05:00
#
#PBS -S /bin/bash
#PBS -j oe
#PBS -o /home/leballe/training
#PBS -N j_job

echo " ==> start of job "
cd /scratch/leballe
sleep 0
echo " ==> end of job"

Job in free queue:

#!/bin/tcsh
#
#PBS -l select=1:ncpus=16
#PBS -l walltime=7:00:00
#PBS -q Q_free
#PBS -S /bin/tcsh
#PBS -o /scratch/leballe/output
#PBS -e /scratch/leballe/error
#PBS -N jobname
Bellatrix: qmove command
Moves a job from the queue in which it resides to another queue; used to move a job from private queues to shared queues and vice versa.
- From a private queue to a share queue: qmove P_share_queue jobid
- From a share queue to a private queue: qmove R_bellatrix jobid
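Concretely, using the jobid reported by qsub (Q_normal is shown here as an example of a Bellatrix share queue); note that qmove only works while the job is still queued, not once it is running:

[leballe@bellatrix ~]$ qmove Q_normal 9990.bellatrix      # move the job to a share queue
[leballe@bellatrix ~]$ qmove R_bellatrix 9990.bellatrix   # re-route it back towards the private queues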
Ex: Q_free on Bellatrix: qmgr -c "p q Q_free"
The parameters of any queue: qmgr -c "p q queuename"

create queue Q_free
set queue Q_free queue_type = Execution
set queue Q_free Priority = 50
set queue Q_free max_queued = [o:pbs_all=50]          # max input queue, all users
set queue Q_free max_queued += [u:pbs_generic=0]      # max input queue, per user
set queue Q_free acl_user_enable = False
set queue Q_free acl_users = leballe
set queue Q_free resources_max.walltime = :00:00      # max walltime
set queue Q_free resources_default.place = excl       # exclusive-node mode
set queue Q_free acl_group_enable = False
set queue Q_free default_chunk.gnall = True
set queue Q_free max_run = [o:pbs_all=38]             # max running jobs, all users
set queue Q_free max_run += [u:pbs_generic=0]         # max running jobs, per user
set queue Q_free max_run_res.ncpus = [o:pbs_all=08]   # max ncpus, all users
set queue Q_free max_run_res.ncpus += [u:pbs_generic=6] # max ncpus, per user
set queue Q_free enabled = True
set queue Q_free started = True
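Users without qmgr access can get the same kind of information with the standard qstat command:

[leballe@bellatrix ~]$ qstat -Qf Q_free    # full attributes of one queue
[leballe@bellatrix ~]$ qstat -q            # one-line summary of all queues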