LSKA 2010 Survey Report
Job Scheduler
Graduate Institute of Communication Engineering
{r98942067, r98942112}@ntu.edu.tw
March 31, 2010
1. Motivation

Computing has recently become much more complex, and we sometimes need to run applications that require many resources. Cluster computing and parallel computing were therefore proposed to reduce execution time and to improve resource utilization. As switch-based parallel computers and cluster-based computing systems have become widely used, job scheduling has become an important issue in addition to processor allocation.

Linux already provides basic job scheduling through cron and at. In many cases cron is sufficient for the simplest scheduling requirements, such as running a job once a day (e.g., backups). Jobs that must run more frequently (every 15 minutes), less frequently (once a month), or on specific dates (the first of the month) can also be handled by cron. The problem is that the scheduler supplied with the operating system usually cannot schedule jobs beyond a single OS instance or outside the remit of a specific program, so an enhanced system is needed. Because many tasks must be managed on multiple machines, better scheduling software lets us manage all of our machines from a central point, start jobs remotely, and so forth. In this survey we therefore look for such a system, examine the functionality of job schedulers, and present some basic utilities of the software we have chosen.

2. Overview of the Job Scheduler

Job schedulers have been a major component of IT infrastructure since the early mainframe systems. So what is job scheduling about? A job scheduler is a software application in charge of unattended background executions, commonly known for historical reasons as batch processing. Today's job schedulers typically provide a graphical user interface and a single point of control for defining and monitoring background executions in a distributed network of computers.
Job scheduling should not be confused with process scheduling, which is the assignment of currently running processes to CPUs by the operating system.
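The simplest unattended background executions are the time-based ones that cron handles: every 15 minutes, once a month, or on the first of the month. A cron schedule is written as five fields (minute, hour, day of month, month, day of week). The following is a minimal Python sketch of how such an expression can be matched against a given time. It supports only a subset of the syntax ('*', '*/n', 'a-b', 'a-b/n', 'a,b'), and, as a simplification, it requires every field to match, whereas real cron treats day of month and day of week as alternatives when both are restricted. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int, lowest: int) -> bool:
    """Check one cron-style field such as '*', '*/15', '1-5', '0-30/10' or '0,30'."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            lo, hi = lowest, value  # '*' spans the whole field, starting at its lowest value
        elif "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
        else:
            lo = hi = int(part)
        if lo <= value <= hi and (value - lo) % step == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Match a five-field expression: minute hour day-of-month month day-of-week."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: Sunday = 0; Python weekday(): Monday = 0
    # Simplification: every field must match (real cron ORs day-of-month and
    # day-of-week when both are restricted).
    return (field_matches(minute, when.minute, 0)
            and field_matches(hour, when.hour, 0)
            and field_matches(dom, when.day, 1)
            and field_matches(month, when.month, 1)
            and field_matches(dow, cron_dow, 0))


print(cron_matches("*/15 * * * *", datetime(2010, 3, 31, 10, 45)))  # True: every 15 minutes
print(cron_matches("0 2 1 * *", datetime(2010, 3, 31, 2, 0)))       # False: not the 1st
print(cron_matches("0 2 1 * *", datetime(2010, 4, 1, 2, 0)))        # True: 02:00 on the 1st
```

A real cron daemon wakes up once a minute and runs every entry whose expression matches the current time; the sketch only shows the matching step.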
2.1 Features

Software can be called a job scheduler when it provides basic features such as:
- Interfaces to define workflows and/or job dependencies
- Automatic submission of executions
- Reuse of existing programs and schedules
- Interfaces to monitor the executions
- Priorities and/or queues to control the execution order of unrelated jobs

In this survey, however, we look for a system with more advanced features, such as:
- Operation across a network of computers
- Real-time scheduling based on external, unpredictable events
- Automatic restart and recovery in the event of failures
- Alerting and notification of operations personnel
- Generation of incident reports

2.2 System Architectures

A job scheduling system is made of two main components, the scheduler and the Resource Manager, each with its own function:
1. The scheduler registers submitted jobs and puts them in a queue according to a scheduling policy. It then requests resources from the Resource Manager and executes the jobs on the resources it obtains.
2. The Resource Manager (RM) manages the set of resources available for scheduling jobs. It provides the scheduler with resources according to criteria such as operating system, dynamic libraries, and memory.
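The division of labour between the scheduler and the Resource Manager described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the classes `Node`, `Job`, `ResourceManager`, and `Scheduler` are hypothetical, the scheduling policy is a simple priority queue, only two criteria (operating system and memory) are matched, and jobs are merely assigned to nodes rather than actually executed.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    os: str
    memory_mb: int
    busy: bool = False


@dataclass
class Job:
    name: str
    priority: int   # hypothetical policy: smaller number is scheduled earlier
    os: str
    memory_mb: int


class ResourceManager:
    """Tracks the available nodes and hands out one that satisfies a job's criteria."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, job):
        for node in self.nodes:
            if not node.busy and node.os == job.os and node.memory_mb >= job.memory_mb:
                node.busy = True
                return node
        return None  # no suitable resource at the moment

    def release(self, node):
        node.busy = False


class Scheduler:
    """Registers jobs in a priority queue and dispatches them onto RM-provided nodes."""

    def __init__(self, rm):
        self.rm = rm
        self.queue = []
        self.counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, job):
        heapq.heappush(self.queue, (job.priority, next(self.counter), job))

    def dispatch(self):
        """Assign every queued job that can get a node; keep the others queued."""
        started, waiting = [], []
        while self.queue:
            entry = heapq.heappop(self.queue)
            node = self.rm.allocate(entry[2])
            if node is None:
                waiting.append(entry)
            else:
                started.append((entry[2].name, node.name))
        for entry in waiting:
            heapq.heappush(self.queue, entry)
        return started


rm = ResourceManager([Node("n1", "linux", 4096), Node("n2", "linux", 1024)])
scheduler = Scheduler(rm)
scheduler.submit(Job("backup", 2, "linux", 512))
scheduler.submit(Job("simulation", 1, "linux", 2048))
scheduler.submit(Job("report", 3, "windows", 256))
print(scheduler.dispatch())  # [('simulation', 'n1'), ('backup', 'n2')]
```

The "report" job stays queued because no Windows node exists. Because `dispatch` skips a blocked job and continues, a lower-priority job may start before a higher-priority one that is still waiting for resources; a stricter policy would stop at the first blocked job.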
Figure 1: Scheduler architecture

Another point worth mentioning is the overall architecture of job scheduling software. Two architectures are commonly used:
1. Master/Agent architecture: the traditional architecture for job scheduling software. The software is installed on a main server (the Master), and all other production machines (the Agents) wait for commands from the Master, execute them, and return the exit code to the Master when the execution is done.
2. Cooperative architecture: a decentralized architecture in which each machine can help with scheduling and can offload locally scheduled jobs to other cooperating machines. This enables dynamic workload balancing to maximize hardware resource utilization, and high availability to ensure service delivery.

2.3 Types of Scheduling

Batch processing: the traditional date- and time-based execution of background tasks within a defined period during which resources are available for batch processing (the batch window). In effect, this is the original mainframe approach transposed onto the open systems environment.
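The batch window can be illustrated with a small Python check that releases queued batch jobs only while the window is open. This is a minimal sketch: the function names and the 22:00 to 06:00 window are hypothetical, and a window that ends earlier than it starts is taken to wrap past midnight.

```python
from datetime import datetime, time


def in_batch_window(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 22:00 to 06:00 spans two calendar days


def runnable_jobs(queue, now: datetime, start: time, end: time):
    """Release the queued batch jobs only while the batch window is open."""
    return list(queue) if in_batch_window(now.time(), start, end) else []


# Hypothetical nightly batch window from 22:00 to 06:00.
WINDOW_START, WINDOW_END = time(22, 0), time(6, 0)
print(in_batch_window(time(23, 30), WINDOW_START, WINDOW_END))  # True
print(in_batch_window(time(12, 0), WINDOW_START, WINDOW_END))   # False
print(runnable_jobs(["backup", "payroll"], datetime(2010, 3, 31, 23, 0),
                    WINDOW_START, WINDOW_END))                  # ['backup', 'payroll']
```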
Event-driven process automation: as the name says, a process is launched when some event happens, so background processes are not simply run at a defined time.

Service-oriented job scheduling: recent developments in Service Oriented Architecture (SOA) have led to deploying job scheduling as a reusable IT infrastructure service that can help integrate existing business application workloads with new real-time applications based on Web Services.

2.4 Related Works

Many related problems arise in scheduling and can be extended in future work, for example job priorities, computation of resource availability, execution time allocated to users, the number of simultaneous jobs allowed per user, estimated/elapsed execution time, and availability of peripheral devices.

3. Job Scheduling Software

Job schedulers provide control over batch jobs and distributed computing resources. One popular product is the Portable Batch System (PBS). We describe the main functionality of PBS and then mention some other job schedulers.

3.1 Portable Batch System (PBS)

PBS is a job scheduler that allocates network resources to batch jobs.

3.1.1 Components of PBS
1. Commands: a command-line or GUI interface that lets users submit, monitor, and delete jobs.
2. pbs_server: manages the submitted jobs.
3. pbs_mom: receives batch jobs from pbs_server, executes the corresponding programs, and reports back to pbs_server when the work is finished.
4. pbs_sched: responsible for job scheduling and for resource and node management.

3.1.2 Framework

[Figure: PBS framework. User commands send jobs to pbs_server and pbs_sched on the server; a pbs_mom runs on each computing node.]

3.2 TORQUE Resource Manager

TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original OpenPBS project. In future work, we will use TORQUE as our resource manager, since it provides enhancements over standard OpenPBS in many areas, such as fault tolerance, scalability of clusters and jobs, and a better scheduling interface. Most importantly, TORQUE is completely free.

3.3 Maui

The Maui Cluster Scheduler is an open source job scheduler for clusters and supercomputers. It is an optimized, configurable tool capable of supporting an array of scheduling policies, dynamic priorities, extensive reservations, and fair share capabilities. It works together with TORQUE.
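Dynamic priorities and fair share, as offered by Maui, can be illustrated with a small priority function. This Python sketch is only loosely modelled on the idea of combining weighted priority factors: the two factors (time spent in the queue and distance from a fair-share target), the weights, and the names `PendingJob` and `fairshare_order` are hypothetical and are not Maui's actual configuration parameters or formula.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    name: str
    user: str
    wait_minutes: float


def fairshare_order(jobs, usage_hours, target_share, w_wait=1.0, w_fs=100.0):
    """Return job names, highest priority first (hypothetical factors and weights)."""
    total_usage = sum(usage_hours.values()) or 1.0  # avoid dividing by zero
    scored = []
    for job in jobs:
        actual_share = usage_hours.get(job.user, 0.0) / total_usage
        # Positive when the user is below the target share, negative when above it.
        fs_term = target_share.get(job.user, 0.0) - actual_share
        priority = w_wait * job.wait_minutes + w_fs * fs_term
        scored.append((priority, job.name))
    scored.sort(reverse=True)
    return [name for _, name in scored]


usage = {"alice": 30.0, "bob": 10.0}   # recent CPU hours per user
targets = {"alice": 0.5, "bob": 0.5}   # each user should get half of the cluster
jobs = [PendingJob("a1", "alice", 10), PendingJob("b1", "bob", 10)]
print(fairshare_order(jobs, usage, targets))  # ['b1', 'a1']
```

Because the queue-time term keeps growing, a job from a user who is over their share is delayed rather than starved: its priority eventually overtakes the fair-share penalty.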
3.4 Sun Grid Engine (SGE)

SGE is typically used on a computer farm or high-performance computing (HPC) cluster and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses.

4. Conclusions

We have surveyed the fundamentals of how job scheduling works. To reduce the workload on individual computers, many tasks need to be managed across multiple machines. Planning and scheduling jobs can mean a lot of work, so a network together with a good job scheduler lets us easily manage all of the machines from a central point. We also surveyed three job schedulers under development and found that each has different capabilities. TORQUE is widely used, while SGE is somewhat more advanced and is also license-free. As future work, we plan to deploy these frameworks on our computers and, if possible, make a simple comparison of them.