KISTI Supercomputer TACHYON: Scheduling Scheme & Sun Grid Engine
Supercomputing Infrastructure Support Office, 윤준원 (jwyoon@kisti.re.kr)
2014.07.15
Scheduling (batch job processing)
Distributed resource management
Features of job schedulers (software):
- Broad scope
- Support for algorithms
- Capability to integrate with a standard resource manager
- Sensitivity to compute node and interconnect architecture
- Scalability
- Fair-share capability
- Efficiency
- Dynamic capability
- Support for preemption
Source: Job Scheduling in HPC Clusters, Dell Power Solutions, February 2005.
Sun Grid Engine
Open-source batch-queuing system, developed and supported by Sun Microsystems (later Oracle)
SGE history:
- 1991: CODINE (Computing in Distributed Networked Environments)
- 1996: GRD (Global Resource Director)
- 1999: Merged with GridWare
- August 2000: Acquired by Sun Microsystems
- 2001: Sun renamed the product Grid Engine and released a free version
- January 2010: Oracle acquired Sun
- End of 2010: Oracle had closed the open source community, stopped shipping source code, and increased the license fees
- January 2011: Univa announced that it had hired the core Grid Engine development team, who had worked on Grid Engine for several years
Job scheduling in SGE
Tachyon2: SGE 6.2u6 / Tachyon1: SGE 6.1u5
Before release 6.2 the scheduler ran as a separate daemon (sge_schedd); since 6.2 it runs as a thread inside sge_qmaster
Scheduling a job has two distinct stages:
- Job selection
- Job scheduling
Sun Grid Engine Overview
Queue: a logical abstraction that aggregates a set of job slots across one or more execution hosts
Slot: a container for jobs that execute on a single host
- Default queue configuration: slot count set equal to CPU count
Standard job types: Batch, Interactive, Parallel, Checkpoint
Terminology:
- cluster queue: all.q
- queue instance: all.q@node004
Host Group & Queue Configuration in SGE
Host group management:
  qconf -ahgrp, -mhgrp, -dhgrp, -shgrp
  qconf -m{q,e,p,ckpt} <file name>
    -m: opens a text editor to modify the configuration
    q: queue, e: execution host, p: parallel environment, ckpt: checkpointing environment
  Switch options: a: add, m: modify, d: delete, r: replace, s: show
Queue management:
  qconf -[aq, mq, dq, sq] queuename    // create, modify, delete, or show a queue
  - Edit the host group, PE, and user set lists. When the user set list is NONE (the default), every user can submit jobs.
  - User sets are managed per queue group under qmaster/usersets (# qconf -[au, mu, du, su] user1,user2,.. user_lists)
  - Edit qtype, slots, shell, shell_start_mode, prolog, epilog, complex_values, resources, and so on
  - h_rt (wall-clock time limit) is set to 168 hours for the long queue and 48 hours for the normal queue on Tachyon 1st
  - The long queue accepts jobs of 1 CPU or more and the normal queue jobs of 17 CPUs or more; jobs below these limits cannot run
  qconf -[ahgrp, mhgrp] @hostgroup, qconf -shgrpl    // create, modify, or list host groups
Scheduling Decisions
Policy Components
Sun Grid Engine Scheduler
Grid Engine Tickets
- All policies are defined using tickets
- Jobs get tickets from all the various policies
- Jobs with more tickets are more important
- The administrator controls the total number of tickets in the system
- The number of tickets assigned to each policy determines how important that policy is relative to the others
- To disable a policy within the scheduler, assign zero tickets to it
Three Classes of Policies
Ticket Policies (Entitlement)
- Share Tree (or Fair-share)
- Functional Ticket
- Override Ticket
Urgency Policies
- Deadline
- Wait time
- Resource urgency
Custom Policies
- POSIX Priority: lets the administrator push a particular job to the front of the pending job list
Entitlement: Share Tree
Ticket Policies (Job Selection)
Share Tree (fair-share) Policy
- Start with N tickets and divvy them up across the tree
- Jobs are sorted by ticket count
- Keeps a memory (history) of past usage
- Leaf nodes must be project or user nodes
  [root@sge03qs pe]# qconf -ssconf | grep weight_tickets*
  weight_tickets_functional 0
  weight_tickets_share 100000
  weight_ticket 0.010000
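The "divvy up across the tree" step can be illustrated with a minimal sketch. The tree layout, the node names (projA, alice, and so on) and the share numbers are hypothetical, and the sketch deliberately ignores the past-usage history that the real share-tree policy takes into account; it only shows a proportional split of N tickets down to the leaf nodes.

```python
# Minimal sketch, NOT the actual SGE algorithm: proportional split only,
# with no usage history. Tree contents below are hypothetical.
# An interior node maps child name -> (shares, subtree); a leaf has subtree None.

def distribute(tickets, children):
    """Split `tickets` among children in proportion to their shares, recursively."""
    total_shares = sum(shares for shares, _ in children.values())
    result = {}
    for name, (shares, subtree) in children.items():
        portion = tickets * shares / total_shares
        if subtree is None:
            result[name] = portion          # leaf: a user or project node
        else:
            result.update(distribute(portion, subtree))
    return result

tree = {
    "projA": (60, {"alice": (1, None), "bob": (1, None)}),
    "projB": (40, {"carol": (1, None)}),
}

# 100000 matches weight_tickets_share shown on the slide.
allocation = distribute(100000, tree)
print(allocation)
```

With these hypothetical shares, projA receives 60% of the tickets, split evenly between its two users, and projB's single user receives the remaining 40%.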
Entitlement: Functional Ticket
Ticket Policies (Job Selection)
Functional Ticket Policy
- Start with N tickets and divide them into four categories: Users, Departments, Projects, Jobs
- By default all categories have equal weight
- Within each category, divide the tickets among all jobs
  weight_tickets_functional 0
  weight_user 0.250000
  weight_project 0.250000
  weight_department 0.250000
  weight_job 0.250000
- Sum the ticket count for each job across the categories; the highest count wins
- No memory (history) of past usage
- Leaf nodes must be project or user nodes
- By default, the functional ticket policy is inactive
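The division described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheduler's real implementation: every user, project, and department is assumed to hold equal functional shares, the job list is hypothetical, and N is set to 1200 only to keep the arithmetic readable (on the slide, weight_tickets_functional is 0, which disables the policy).

```python
from collections import Counter

# Category weights as shown on the slide (all equal by default).
WEIGHTS = {"user": 0.25, "project": 0.25, "department": 0.25, "job": 0.25}

def functional_tickets(total, jobs):
    """Return {job id: tickets}. Assumes equal shares per entity (hypothetical)."""
    tickets = {job["id"]: 0.0 for job in jobs}
    for category, weight in WEIGHTS.items():
        pool = total * weight                      # tickets for this category
        if category == "job":
            for job in jobs:                       # split evenly among all jobs
                tickets[job["id"]] += pool / len(jobs)
            continue
        counts = Counter(job[category] for job in jobs)
        per_entity = pool / len(counts)            # equal shares per entity
        for job in jobs:                           # entity's tickets split over its jobs
            tickets[job["id"]] += per_entity / counts[job[category]]
    return tickets

jobs = [
    {"id": 1, "user": "alice", "project": "p1", "department": "d1"},
    {"id": 2, "user": "alice", "project": "p1", "department": "d1"},
    {"id": 3, "user": "bob", "project": "p2", "department": "d1"},
]
result = functional_tickets(1200, jobs)
print(result)
```

In this sketch the job owned by bob ends up with the most tickets because his user and project pools are not shared with any other job.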
Entitlement: Override Ticket
Ticket Policies (Job Selection)
Override Policy
- Used to make temporary changes
- Override tickets disappear when the job exits
- The admin can assign extra tickets to a user, project, department, or job
- qalter can also be used to add override tickets to a pending job
- share_override_tickets: whether the job count dilutes the override ticket count. Default is TRUE
  [root@sge03 pe]# qconf -ssconf | grep share*
  weight_tickets_share 100000
  share_override_tickets TRUE
Relevant parameters
Urgency: Wait Time Policy
As a job remains in the pending queue, the wait time policy increases the urgency of that job. It can be useful for preventing job starvation.
$U_{wait} = T_{wait} \times W_{wait}$
- $U_{wait}$: wait-time urgency
- $T_{wait}$: the time spent since the job was submitted
- $W_{wait}$: wait-time weighting factor
  weight_waiting_time 100.000000
  weight_urgency 0.100000
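As a small worked example of the wait-time formula, the sketch below multiplies the waiting time by the weighting factor. The function name and the use of seconds as the time unit are assumptions made for illustration; the default weight is the weight_waiting_time value shown on the slide.

```python
def wait_urgency(submit_time, now, weight_waiting_time=100.0):
    """U_wait = T_wait * W_wait, with times assumed to be in seconds."""
    t_wait = now - submit_time
    return t_wait * weight_waiting_time

# A job that has been pending for 10 minutes (600 s):
u_wait = wait_urgency(submit_time=0, now=600)
print(u_wait)  # 60000.0

# With a weight of 0 the policy contributes nothing, however long the job waits.
u_disabled = wait_urgency(submit_time=0, now=600, weight_waiting_time=0.0)
print(u_disabled)  # 0.0
```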
Urgency: Deadline Policy
The deadline is the time by which the job must be scheduled. In order to submit a job with a deadline, a user must be a member of the deadlineusers group.
$U_{deadline} = \dfrac{W_{deadline}}{T_{deadline} - T_{now}}$
- $T_{deadline}$: deadline time
- $T_{now}$: current time
- Both times are given in Unix time (in seconds)
- $W_{deadline}$: deadline weighting factor
  weight_deadline 3600000.000000
  weight_urgency 0.100000
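The deadline urgency can be sketched numerically, assuming the commonly documented form in which the deadline weight is divided by the number of seconds remaining until the deadline. That form, the function name, and the guard that clamps the remaining time to at least one second are all assumptions of this sketch rather than something stated on the slide.

```python
def deadline_urgency(deadline, now, weight_deadline=3600000.0):
    """Assumed form: U_deadline = W_deadline / (T_deadline - T_now).

    Times are Unix timestamps in seconds. The clamp to 1 second is this
    sketch's own guard against division by zero or a passed deadline.
    """
    remaining = max(deadline - now, 1)
    return weight_deadline / remaining

now = 1_405_382_400                                  # an arbitrary Unix timestamp
one_hour_left = deadline_urgency(now + 3600, now)
one_minute_left = deadline_urgency(now + 60, now)
print(one_hour_left, one_minute_left)                # 1000.0 60000.0
```

Under this assumption the urgency grows as the deadline approaches: with the weight of 3600000 shown on the slide, a job one hour from its deadline has urgency 1000, and one minute from it, 60000.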
Urgency Resource Policy If some resources in a cluster are particularly valuable, it might be advantageous to make sure those resources stay as busy as possible.
Combining Policies
The final dispatch priority assigned to each pending job is determined by combining the contributions of the entitlement, urgency, and custom policies:
$P = N_e W_e + N_u W_u + N_c W_c$
- $N_e$: entitlement priority
- $W_e$: entitlement weighting factor    # weight_ticket 0.010000
- $N_u$: urgency priority
- $W_u$: urgency weighting factor        # weight_urgency 0.100000
- $N_c$: custom priority
- $W_c$: custom weighting factor         # weight_priority 1.000000
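The combination formula can be tried out with the weights shown on the slide. The function name and the two example jobs are hypothetical, and the normalized inputs are assumed to lie between 0 and 1.

```python
def dispatch_priority(n_e, n_u, n_c, w_e=0.01, w_u=0.1, w_c=1.0):
    """P = N_e*W_e + N_u*W_u + N_c*W_c with the slide's default weights."""
    return n_e * w_e + n_u * w_u + n_c * w_c

# Job A: maximum entitlement, default-ish POSIX priority (hypothetical values).
p_a = dispatch_priority(n_e=1.0, n_u=0.0, n_c=0.5)
# Job B: no tickets at all, but a slightly higher POSIX priority.
p_b = dispatch_priority(n_e=0.0, n_u=0.0, n_c=0.6)

print(p_a, p_b)
print("B dispatched first:", p_b > p_a)
```

With these weights the custom (POSIX) term dominates: a small difference in the normalized POSIX priority outweighs the largest possible difference in tickets.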
Scheduler weighting factors

Reference in Text   Weighting Factor       Parameter Name               Tachyon1   Tachyon2
W_deadline          Deadline               weight_deadline              3600000    3600000
W_wait              Wait-time              weight_waiting_time          0          100
W_e                 Entitlement (Ticket)   weight_ticket                0.01       0.01
W_u                 Urgency                weight_urgency               0.1        0.1
W_c                 Custom (POSIX)         weight_priority              1          1
-                   -                      weight_tickets_share         100000     100000
-                   -                      weight_tickets_functional    0          0
-                   -                      share_override_tickets       True       True
Ref.) Job Priorities and Tickets
- urg = rrcontr + wtcontr + dlcontr
- tckts = ftckt + otckt + stckt
- job_priority = weight_urgency * normalized_urgency_value + weight_ticket * normalized_ticket_value + weight_priority * normalized_posix_priority_value

ntckts    The total number of tickets in normalized fashion.
tckts     The total number of tickets currently assigned to the job.
ovrts     The override tickets as assigned by the -ot option of qalter.
otckt     The override portion of the total number of tickets currently assigned to the job.
ftckt     The functional portion of the total number of tickets currently assigned to the job.
stckt     The share portion of the total number of tickets currently assigned to the job.
share     The share of the total system to which the job is currently entitled.
nurg      The job's total urgency value in normalized fashion.
urg       The job's total urgency value.
rrcontr   The urgency contribution related to the job's overall resource requirement.
wtcontr   The urgency contribution related to the job's waiting time.
dlcontr   The urgency contribution related to the job's deadline initiation time.
deadline  The deadline initiation time of the job as specified with the qsub -dl option.
npprior   The job's -p priority in normalized fashion.
ppri      The job's -p priority as specified by the user.