Design and Development of a Batch Scheduling System in a Heterogeneous Environment




Universität Karlsruhe (TH)
Institut für Algorithmen und Kognitive Systeme (IAKS)
Fakultät für Informatik

Design and Development of a Batch Scheduling System in a Heterogeneous Environment

Diploma thesis by Narges Hadji-Hosseini
WS 2006/2007
Advisor: Prof. Dr. Jacques Calmet

Statement of originality

Hereby, I declare this thesis to be my own work, written independently. I have used no sources or aids other than those cited, and all references are given completely.

Karlsruhe, May 2, 2007
Narges Hadji-Hosseini


Acknowledgements

I would like to thank my parents, my brother, and my friends, scattered all over the world, for their support; without their help I could not have accomplished much of this. In particular, I would like to thank Alexander Elbs, who took on the proofreading. Many thanks also go to my brother, Babak Hadji-Hosseini, who read my work very patiently.

I thank Prof. Dr. Jacques Calmet for his supervision throughout the double-degree programme as well as for this thesis. His quick and unbureaucratic help with all the difficulties that arose during the programme was a decisive factor in my success. In particular, I thank Eric Angenault for his advice and his answers to my many questions on scheduling and on the particularities of schedulers; several discussions and conversations provided great inspiration for this work. Above all, I thank Denny Mansart, who made this work possible and who was always ready to listen regarding administrative and other problems.


Contents

1 Introduction
  1.1 Abstract
  1.2 Introduction
    1.2.1 Financial Applications
    1.2.2 Distributed Computing
    1.2.3 Machine environment
    1.2.4 Heterogeneity
  1.3 Organization of this thesis
2 Job Scheduling Model
  2.1 Preliminaries on Job Schedulers
    2.1.1 Features
    2.1.2 Scheduling
    2.1.3 Type of Schedulers
    2.1.4 Architecture
    2.1.5 Different Schedulers
    2.1.6 Autosys TM
    2.1.7 Dollar Universe TM ($U)
    2.1.8 Crontab/Windows Scheduler
    2.1.9 Limitations of presented schedulers
  2.2 Proposition of a new Job Scheduling Model
    2.2.1 High-Level General Model
    2.2.2 Basic Concepts
    2.2.3 Physical Properties of a Job
    2.2.4 Schedule and Execution Date calculation
    2.2.5 Batch Execution
    2.2.6 Conditions and Triggers
    2.2.7 Verification and Monitoring
  2.3 Advantages of proposed job scheduling model
3 Batch Scheduling System
  3.1 Overall Architecture
    3.1.1 Job Definition Management Module
    3.1.2 Meta Scheduling Module
    3.1.3 Job Scheduling Module
    3.1.4 Monitoring Module
  3.2 Heterogeneity
  3.3 Relational Database Design
    3.3.1 Job Definition
    3.3.2 Machine Definition
    3.3.3 Group Definition
    3.3.4 Schedule Definition
    3.3.5 Execution
  3.4 Advantages of Relational Database Design
4 Scheduling Algorithms
  4.1 Basic notions on Scheduling
  4.2 Description of our Scheduling Problem
    4.2.1 Simplifications with regard to the real environment
    4.2.2 Meta Scheduling
    4.2.3 Machine environment
    4.2.4 Job properties
    4.2.5 Optimality Criteria
    4.2.6 Formulation of our scheduling problem
    4.2.7 Complexity
    4.2.8 Related Work and Variants of the problem
    4.2.9 Summary of concrete problems
  4.3 Developed Scheduling Algorithms
    4.3.1 Global Scheduling
    4.3.2 Meta Scheduling
    4.3.3 Local Scheduling based on latest start dates
5 Experimental Analysis
  5.1 Job Model
  5.2 Scheduling algorithms
    5.2.1 Prototype engine
    5.2.2 Selected jobs
    5.2.3 Results
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography
A Appendix
  A.1 Relational Database Design
    A.1.1 Complete Database Schema
    A.1.2 Application
    A.1.3 Batch
    A.1.4 Batch_has_group
    A.1.5 Group
    A.1.6 Batch_has_machine
    A.1.7 Machine
    A.1.8 Execution
    A.1.9 Instructions
    A.1.10 Status
    A.1.11 Schedule
    A.1.12 Rules
    A.1.13 Schedule_has_Rules
  A.2 Application: TAUX


1 Introduction

1.1 Abstract

With the tremendous growth of large parallel systems, new computing platforms have emerged that need more sophisticated administration tools. These systems can be composed of powerful machines that are connected worldwide and, hence, need to interact with each other. One such system is a financial application whose entities are supposed to run on different sites. At the same time, financial applications used by traders often have hard time constraints to meet and can therefore be considered time-critical or real-time systems. A financial application consists of up to several hundred small jobs, each one containing a schedule and realizing a small function. Together, these jobs ensure the entire functionality of the financial application. They are supposed to run in a heterogeneous environment, consisting of several machines that differ in terms of operating systems as well as technical specifications. In order to respect these constraints, in terms of time conditions and distributed architecture, several scheduling systems have been developed. We have studied existing environments for batch scheduling intensively, but the specific requirements, especially concerning time constraints and priorities for certain applications and executions, demand a more adapted concept. After comparing some existing environments and extracting the requirements generated by our specific scheduling constraints, a new general model is proposed. The first objective of this diploma thesis is to develop a new unifying model containing a variety of concepts that allows scheduled jobs for financial applications to be defined. Nevertheless, the proposed job model is also applicable to general-purpose jobs. Our high-level model is general, as it unifies the differences between existing concepts. Another advantage is to have one single interface for defining batch processing.
We have developed a job scheduling model that allows batches to be defined and submitted in a homogeneous way, even batches with a complex scheduling structure, where many parameters have to be taken into account. When making scheduling decisions, a job scheduling module has to consider many parameters. These multiple parameters and constraints, as well as the fact that even simple scheduling problems on a single machine are already

NP-hard, clearly show that an approximate solution is the most auspicious method to apply in a practical context. On the other hand, most of the algorithms that have been studied for solving scheduling problems assume a very simplified job model, which is insufficient for a real scheduling environment that must take into account all constraints that can emerge. The second objective of this work is to propose a scheduling system for the proposed job model, to which a scheduling algorithm can be applied. The goal of our batch scheduling system is to prioritize high-critical batches in order to minimize loss. Some real-world parameters have to be modified, others neglected, in order to obtain a reasonable model on which an algorithm can be applied. We have designed a centralized solution containing four different modules undertaking different activities. Scheduling theory is concerned with the optimal allocation of resources to activities over time. Jobs can be seen as activities. The goal of scheduling is to produce a good schedule; this definition of good depends heavily on the application and can be measured by so-called objective functions. The third objective of this work is to model our concrete scheduling problems by a theoretical model. First, we will present some preliminaries about job constraints and objective functions and justify which of them we want to take as an optimality criterion. Our problem can be formalized as the weighted throughput maximization problem on parallel machines. To the best of our knowledge, this problem has not yet been studied extensively in the literature. We propose an approximation algorithm for a strongly NP-hard problem, which can be implemented in O(n log n). We have studied and analyzed the path from theoretical models and algorithms to an implementation on an actual environment of a real distributed platform.
We will outline in detail the divergences between classical theoretical models and a real batch scheduling system and describe how we have combined both. Both the general model and the proposed scheduling algorithm are going to be tested on a small set of batches and applied to a whole scheduling environment containing about 15,000 batch executions running on machines installed worldwide. By further extending the usage of this algorithm to a live system, we are going to show that the application, as well as the bridge between theory and practice, can be viable.

1.2 Introduction

This section describes the situation we were confronted with during this work. It explains the details of the problems we encountered and the needs that came up in this context.

1.2.1 Financial Applications

For a start, we will present a motivating example of an environment using scheduled background batches in order to guarantee the right workflow. All batches have schedules attached. Those schedules imply the automatic execution of background tasks (batch jobs) at pre-set points in time (e.g. every day at 8 pm, midday on Wednesday). Financial applications often operate on financial markets and are used by traders to enter deals. One example of a financial market is the market for fixed income and derivatives. These applications have to meet time constraints with hard deadlines. They consist of many smaller job units and run as batches within several scheduler software packages guaranteeing these time constraints. Batches can extract information about financial products like swaps or options, as well as current prices of commodities, foreign exchange, credit rates or interest rates, from one site to another. These batches are supposed to meet hard time constraints. Many of these batches serve to feed the databases belonging to these financial applications. These databases are queried by actors on financial markets, such as traders, when negotiating a deal. The correctness of this data is a major factor for success. After a deadline has passed, a high loss in terms of money can occur (several millions of dollars), because the user (an actor on financial markets) used wrong data when negotiating a deal. Thus, these batches are called high-critical batches and are therefore required to have a high priority.

1.2.2 Distributed Computing

Distributed computing is a frequently used programming concept. It can be employed at different levels of computing and focuses on the concept of concurrency.
It interrelates tightly with concurrent programming, which is the simultaneous execution of multiple interacting computational tasks. Distributed computing is in particular very commonly used when connecting computers in networks. There are several stages and levels of distributed computing and concurrency:

Multiprocessor systems

A multiprocessor system is a computer having more than one CPU on its motherboard. This allows the operating system to run different threads of a task on different CPUs.

Computer Clusters

A cluster is a set of multiple stand-alone machines connected by a Local Area Network (LAN). Computer clusters can be either homogeneous or heterogeneous in terms of the connected machines. In a homogeneous distributed system all CPUs are similar, whereas a heterogeneous distributed system is made up of different kinds of computers, vastly differing in memory size, processing power and even basic underlying architecture. Computer clusters can also consist of machines that are widely separated geographically and therefore connected by a Wide Area Network (WAN). Cluster computing systems have become more and more present in recent years. This is due to the fact that more complex applications need more computing capacity. On the other hand, these computing resources have become less expensive, thus allowing the growth of these clusters.

Grids

Grid computing is a computing model that provides the ability to perform computation by taking advantage of many networked computers. These computers model a virtual computer architecture and are connected by a network (usually the Internet) to solve large-scale computation problems. The main difference from computer clusters is the fact that the computers do not trust each other. A middleware handling trust issues has to be deployed.

1.2.3 Machine environment

Financial applications usually run on different machines that are separated locally but connected to the same network. There are no trust issues because these machines are known to the network. Hence, only computer clusters form our working environment. Our computer clusters contain several machines.
We model computer clusters with the

help of virtual machines. A virtual machine is a non-physical machine that can be composed of several physical machines. A physical machine has a certain capacity, which allows it to run m batches at the same time. Our batch scheduling system is concerned with deciding how to arrange batches on a machine in the most optimal way. The computer architecture of the physical machines we deal with corresponds to the common architecture of personal computers. The operating systems we employ are the most common operating systems in financial or industrial production environments. Nevertheless, the computer architecture of the machines is less relevant for our scheduling issues. Of course, many of the scheduling problems as well as their algorithms could be applied to any distributed environment from a theoretical point of view.

1.2.4 Heterogeneity

Heterogeneity can characterize multiple components of a system. It means differences in terms of operating systems and several physical properties of a machine (memory size, hard disk speed, networking card etc.). These differences, as well as the networking configuration, can lead to latencies and different execution speeds, which have to be taken into account when scheduling jobs. We are confronted with these heterogeneity issues. Nevertheless, the practical type of heterogeneity as described above is very far from classical theoretical models, in which machines (nodes) can be homogeneous, related or unrelated. Further details are presented in section 4.1. In those models only the processing speeds of the machines are taken into account. It is obvious that a best approximation between real systems and theoretical models has to be found.

1.3 Organization of this thesis

We start by designing a general job model that allows complex jobs, including high-critical jobs, to be defined. This general job model is especially fitted for financial applications, which can be seen as real-time applications.
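The machine environment of section 1.2.3, in which a virtual machine groups several physical machines, each able to run m batches concurrently, can be sketched as follows. This is a minimal illustration; all class and attribute names are assumptions for this sketch, not part of the thesis's actual design.

```python
from dataclasses import dataclass, field

# A physical machine has a capacity m: it may run up to m batches at once.
@dataclass
class PhysicalMachine:
    name: str
    capacity: int                      # max. number of concurrent batches (m)
    running: list = field(default_factory=list)

    def can_accept(self) -> bool:
        return len(self.running) < self.capacity

# A virtual machine is a non-physical machine composed of several
# physical machines; here it dispatches a batch to the first member
# machine that still has free capacity.
@dataclass
class VirtualMachine:
    members: list

    def dispatch(self, batch: str):
        for m in self.members:
            if m.can_accept():
                m.running.append(batch)
                return m.name
        return None                    # no free capacity anywhere

vm = VirtualMachine([PhysicalMachine("unix01", 2), PhysicalMachine("win01", 1)])
```

Deciding which member machine receives a batch is exactly the arrangement question the batch scheduling system addresses; the first-fit rule above is only a placeholder.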
After comparing different commercial batch scheduling systems, we will propose a general job model in chapter 2. In chapter 3, we will design a modularized architecture comprising a central site and remote local sites, to which the jobs are sent. The central site contains a relational database design, where the jobs are stored. In chapter 4, we will first present basic notions necessary to model our scheduling situation theoretically. Afterwards, we will model our situation with these constraints and justify our choice of adapting a real-world situation into a theoretical model. We will show that our problem, which is weighted throughput maximization on parallel machines, is NP-hard in the strong sense. Finally, we will develop approximation algorithms.

We will summarize results in chapter 6 and give a conclusion as well as hints for future work.

2 Job Scheduling Model

A note on vocabulary

In the following two chapters, we will use the terms job and batch interchangeably, whereas in chapter 4 we will distinguish between them in order to be congruent with the research community. Here, both terms signify an entity of an activity which needs to be scheduled. Nevertheless, a slight difference can be remarked: a job means a task that has to be executed, containing several specifications that can be summarized as a job definition; a batch means this same piece of work, emphasizing the aspect of automatic execution in the background (non-interactive processing).

2.1 Preliminaries on Job Schedulers

A job scheduler is a software application that is in charge of scheduling and handling background executions, commonly known as batch processing. These executions are executions of jobs containing a command or a script. A job can use another IT resource, e.g. a database or a server, via its script. This type of software is also known as a batch queue manager, because it arranges incoming batches successively, commonly using priorities for executing multiple jobs. Batch schedulers are also in charge of distributing resources that are necessary for executing a batch. Resources can be mandatory conditions for starting a batch. The scheduler software verifies whether resource conditions as well as time conditions are met in order to decide which batch to start next. When a virtual machine is used to access a cluster of physical machines, the scheduler acts as a dispatcher. One of the tasks of schedulers is to determine when a job is going to run, taking into account its frequency, e.g. daily, weekly, monthly or annually. At the same time, a scheduler has to regard both the dependencies of jobs within an execution - a job needs results from a previous job to be able to start - and the schedule attached to the job.
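The two decisions just described - when a job is due according to its frequency, and whether its conditions are met - can be sketched as follows. This is an illustrative assumption about the data layout, not the thesis's actual interfaces; all names are hypothetical.

```python
import datetime

FREQUENCIES = {"daily": datetime.timedelta(days=1),
               "weekly": datetime.timedelta(weeks=1)}

def next_execution(last_run, frequency):
    """When is a job with the given frequency due again after `last_run`?"""
    return last_run + FREQUENCIES[frequency]

def ready_jobs(jobs, now, succeeded):
    """Which jobs may start at `now`? A job is ready when its scheduled
    time has been reached and every job it depends on has succeeded."""
    return [j["name"] for j in jobs
            if j["due"] <= now and all(d in succeeded for d in j["depends_on"])]

# Two hypothetical jobs: the second depends on the result of the first.
jobs = [
    {"name": "load_prices",  "due": datetime.datetime(2007, 5, 2, 20, 0), "depends_on": []},
    {"name": "compute_risk", "due": datetime.datetime(2007, 5, 2, 20, 0), "depends_on": ["load_prices"]},
]
```

In this sketch, `compute_risk` only becomes eligible once `load_prices` appears in the set of succeeded jobs, mirroring the dependency condition described above.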
Another feature of a job scheduler is to monitor the executions of jobs and to deliver a user interface that reports the status of jobs as well as information concerning the execution, like error codes or the user etc. Resources can be very different in nature: the presence of a file, the availability of a machine, the success of a previous job etc. A

resource can also be a physical resource necessary for the execution of the batch (disk storage, memory, CPU).

A note on batch schedulers

It is important to note that there is another, quite different definition of the term batch scheduler used in the software community: a batch scheduler in the sense of a cluster or grid environment. In this case a batch scheduler is a tool for exploiting the cluster/grid. It is a system that manages resources, which are the nodes of a cluster. It is in charge of accepting user-submitted jobs that require parallel processing and of scheduling them according to available resources. Thus, the objective of a batch scheduler in this sense is to ease the use of resources for the users. They should not have to worry about the availability of nodes, as the batch scheduler will handle these issues and group the user-submitted jobs into schedules. Very often a job in this model requires parallel computation. A survey of this kind of batch scheduler can be found, among others, in [BFY95]. We are not aiming at designing such a system.

A note on operating system schedulers

In the general context of scheduling tasks and scheduling algorithms it is important to note that each operating system has its own scheduler embedded in the kernel. Scheduling is a key concept in multiprocessing operating system design. In general-purpose operating systems, the goal of the scheduler is to balance processor loads and to prevent any process from either monopolizing the processor or being starved for resources. An operating system scheduler acts locally, distributing operating system resources to processes. However, this kind of scheduling is not regarded in our work either. Instead, our goal is to design a job scheduler that schedules jobs containing time constraints as well as other dependencies. Our job model will not require parallel computing, but rather allow a variety of attributes to be defined for a job.
In our job model one physical machine can execute several jobs at the same time, and the scheduler decides which jobs are going to run.

2.1.1 Features

Basic features expected of job schedulers are:

1. Interfaces to define jobs in terms of properties as well as resource conditions
2. Interfaces to define workflows and dependencies between jobs
3. Interfaces to define a schedule for a batch or for a workflow of batches
4. Automatic submission of executions of batches
5. Interfaces to monitor the executions of batches

6. Interfaces to modify current executions and their status
7. Priorities and/or queues to control manually the execution order of unrelated jobs

2.1.2 Scheduling

Each job has a schedule attached, which contains the time parameter indicating when to run the job. Various schemes are used to decide which particular batch to run. Parameters that might be considered include:

Job priority - In order to prioritize the execution of a job, priorities influence the scheduler when deciding which job to run.

Compute resource availability - This parameter checks whether the indicated machine, where the job has to be run, is available. If it is not available, the job will not be started.

License key - This parameter checks whether the job is using licensed software.

Execution time allocated to user - This parameter defines the total execution time allocated to the jobs of a user.

Number of simultaneous jobs allowed for a user - One user can have a limited number of simultaneously running jobs.

Estimated execution time - This parameter takes into account the run time that has been estimated for a job.

Elapsed execution time - This parameter deals with the real elapsed execution time of a job.

Deadline defined for the execution of the job - Each job has a deadline, defining when a job should have finished.

2.1.3 Type of Schedulers

Three types of job scheduling can be distinguished: native, basic and advanced scheduling. Most operating systems and some business solution software come equipped with native scheduling tools that provide a limited service (e.g. Windows Scheduled Tasks, UNIX TM Crontab, SAP CCMS) locally to each installation. However, complex applications may span multiple platforms, applications, countries and companies.
Their complexity often requires much more functional power than is provided by basic scheduling, including national and regional variations in the working calendar, sorting variations according to the day of the month, triggering of jobs by the successful completion of preceding jobs, elimination of gaps, and reduced batch windows. Basic scheduling already has major benefits. However, it cannot handle scheduling on clusters or job dependencies.
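Several of the parameters listed in section 2.1.2 can be combined into a simple selection rule. The following is a minimal sketch, assuming each batch carries a numeric priority and a deadline; the job tuples, names and the tie-breaking rule are illustrative assumptions, not the model developed in this thesis.

```python
# Among the batches eligible to run, take the one with the highest
# priority, breaking ties by the earliest deadline.
def pick_next(queue):
    """queue: list of (name, priority, deadline_minutes)."""
    best = max(queue, key=lambda job: (job[1], -job[2]))
    return best[0]

queue = [
    ("report_eod",  5, 22 * 60),   # daily report, deadline 22:00
    ("feed_rates",  9, 21 * 60),   # high-critical batch, deadline 21:00
    ("cleanup_tmp", 1, 23 * 60),
]
```

Under this rule the high-critical batch `feed_rates` is started first, which reflects the goal, stated in the abstract, of prioritizing high-critical batches to minimize loss.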

Advanced scheduling can handle these advanced requirements: condition-driven scheduling for real-time synchronization with interactive processing, just-in-time scheduling to run operations as soon as possible, cross-platform and cross-application services, and real-time overall monitoring to track background operations for all applications on all servers. The standard benefits of job scheduling are drastically amplified when job schedulers can handle the end-to-end automation and monitoring requirements for all background operations. One application might use several job schedulers to organize its jobs, although these jobs obviously cannot communicate with each other in terms of dependencies, as batch scheduling software from different editors is not compatible in general.

2.1.4 Architecture

Two major architectures exist for job scheduling software:

Master/Agent architecture - The historic architecture for job scheduling software. The job scheduling software is installed on a single machine (the Master), while on production machines only a very small component is installed (the Agent) that awaits commands from the Master, executes them, and returns the exit code back to the Master. This is called a centralized architecture.

Cooperative architecture - A decentralized model, where each machine is capable of helping with scheduling and can offload locally scheduled jobs to other cooperating machines.

Since theoretical work on scheduling algorithms is not well adapted to real environments (simplifications have to be made in order to get a useful job model) and since theoretical work is not well known in the software community, the algorithms in batch scheduling systems are often simple heuristics with no theoretical background.

2.1.5 Different Schedulers

Although different schedulers have been developed, they do not differ significantly in their main concepts. There are commercial products as well as open-source software systems. Examples can be found in [BFY95].
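The Master/Agent exchange of section 2.1.4 can be sketched as follows: the Master sends a command to the Agent on a production machine; the Agent executes it and returns the exit code. For illustration both roles run in one process; a real deployment would use a network protocol, and all names here are assumptions.

```python
import subprocess

def agent_execute(command):
    """Agent side: run the command and report its exit code."""
    return subprocess.run(command, shell=True).returncode

def master_submit(agents, machine, command):
    """Master side: dispatch the command to one machine's agent and
    record the returned exit code."""
    return {"machine": machine, "command": command,
            "exit_code": agents[machine](command)}

# One hypothetical agent, registered under its machine name.
agents = {"unix01": agent_execute}
```

The Master only ever sees machine names and exit codes; everything about how the command runs is the Agent's concern, which is what makes the centralized architecture work across heterogeneous production machines.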
In particular, we will present three main types of schedulers: a centralized solution with a master/agent architecture (Autosys™), a decentralized solution with a distributed architecture (Dollar Universe™),

and native schedulers included in the operating system (Crontab for UNIX™, Windows Scheduler for Windows). Autosys™ as well as Dollar Universe™ are commercial software products mainly used in industrial production. We will only outline the general scheduling concepts of these systems in order to compare them and to propose a high-level general-purpose scheduling model. In particular, we are interested in two major points: the architecture used in the distributed heterogeneous environment and the job model used to declare jobs. We will not regard in detail the technical specifications of these software solutions: neither their interaction/integration with ERP packages such as SAP R/3 or PeopleSoft, nor their user interfaces (Java, Windows, Web). Further information can be found in the corresponding user guides or on the websites of the corresponding vendors.

2.1.6 Autosys™

Autosys™ is a commercial job scheduler by Computer Associates™ that allows scheduling and monitoring the execution of multiple jobs in a heterogeneous network environment. It is an example of a centralized solution with a master/agent architecture. Furthermore, it is an advanced scheduler using an event-driven approach when making its scheduling decisions. A complete reference can be found in [Ass06]. We will only present the most important concepts and their functions. Autosys can schedule and monitor jobs in a distributed heterogeneous environment. Supported server and agent platforms include: HP-UX, IBM AIX, NCR/AT&T UNIX, Pyramid DC/OSx, Silicon Graphics IRIX, Sun Solaris, Data General DG-UX, DEC OSF/1, Sequent DYNIX, Siemens Nixdorf SINIX, DEC, Compaq Tru64, AS/400, VMS, RedHat Linux, Windows NT, Windows 2000 and Windows XP.

Architecture

Autosys Job Management has an event-driven, tiered architecture comprising three components:

1. Relational Database: stores all events and alarms.
2.
Event Processor: interprets events and, based on job definitions, initiates actions via the Remote Agent.
3. Remote Agent: performs its tasks and sends the resulting job status back to the database.
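The interplay of the three tiers can be sketched as a minimal event loop. The class names, the event tuples and the `nightly_report` job below are our own illustration of the described flow, not Autosys' actual interfaces:

```python
# Hypothetical sketch of an event-driven, three-tier scheduler: a database
# holds events, the event processor interprets them against job definitions,
# and a remote agent runs the job and reports the status back.
from collections import deque

class EventDatabase:
    """Tier 1: stores all events and alarms."""
    def __init__(self):
        self.events = deque()
        self.log = []
    def post(self, event):
        self.events.append(event)
        self.log.append(event)

class RemoteAgent:
    """Tier 3: performs the task and sends the job status back to the database."""
    def run(self, db, job_name, command):
        exit_code = command()  # execute the job's command
        status = "SUCCESS" if exit_code == 0 else "FAILURE"
        db.post(("JOB_TERMINATED", job_name, status))

class EventProcessor:
    """Tier 2: interprets events and, based on job definitions, initiates actions."""
    def __init__(self, db, agent, job_definitions):
        self.db, self.agent, self.jobs = db, agent, job_definitions
    def step(self):
        kind, job_name, *rest = self.db.events.popleft()
        if kind == "STARTJOB":
            self.agent.run(self.db, job_name, self.jobs[job_name])

db = EventDatabase()
proc = EventProcessor(db, RemoteAgent(), {"nightly_report": lambda: 0})
db.post(("STARTJOB", "nightly_report"))
proc.step()  # the agent runs the job and posts the resulting status
```
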

Job Model

Each activity organized and controlled by Autosys is based on the concept of jobs (programs, workflows, file watchers). A job in terms of Autosys is associated with a command or a script and is attached to a set of rules that condition its schedule. Jobs handled by Autosys can be executed on different client machines.

Definition of jobs

Three types of jobs are available: Command, File Watcher and Box. These three entities have a majority of attributes in common. Autosys treats them in the same way via its graphical user interface or its command language for defining jobs.

Job Command This job can be represented by a script, an executable program or a program transferring files. As soon as all starting conditions are met, Autosys executes this job and catches its exit code once the program has terminated. The exit event and the exit code are then sent back to the central database of Autosys.

Job File Watcher A File Watcher job is similar to a Command job. Instead of starting a program or command on a distant machine, Autosys is in charge of checking the presence of a file and several of its attributes (such as its size) at a certain time. This job type allows integrating external events as starting conditions for Autosys jobs. For example, a file that should be transferred to a machine at 3 am can constitute the starting condition of a job or a chain of jobs.

Job Box The Box job represents a logical organization containing several jobs. This arrangement implies no action and serves only for grouping jobs with similar conditions together instead of attaching the same conditions to each job. If the jobs contained in the box possess no further starting conditions, they are executed in parallel.

Dependency Conditions between jobs

With Autosys it is possible to define chains of dependency and workflows for jobs. The execution of a job might depend on the result of a previous job.
The definition of dependency conditions is explicit and allows testing the following items:

- The status of a previous job, determining if a job was successful or in failure.
- The value of an exit code, which can be between 0 and 255; the exit code 0 signifies the success of a job whereas other values signify several distinct failure states.
- The value of a global variable.
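A starting condition of this kind can be modeled as a predicate over the last recorded execution of the referenced jobs. The job names, the variable store and the helper functions below are invented for illustration and are not Autosys syntax:

```python
# Illustrative model of dependency conditions: tests over the last run of
# other jobs, combined with the usual boolean connectives.
last_run = {
    "load_trades": {"status": "SUCCESS", "exit_code": 0},
    "fx_rates":    {"status": "FAILURE", "exit_code": 3},
}
global_vars = {"MARKET_OPEN": "YES"}

def success(job):
    # Status test: was the previous job successful?
    return last_run[job]["status"] == "SUCCESS"

def exit_code(job, code):
    # Exit-code test: a value between 0 and 255, 0 meaning success.
    return last_run[job]["exit_code"] == code

def var_equals(name, value):
    # Global-variable test.
    return global_vars.get(name) == value

# Conditions combine with AND, OR and parentheses; here Python's boolean
# operators stand in for the scheduler's condition language.
ready = (success("load_trades") and var_equals("MARKET_OPEN", "YES")) \
        or exit_code("fx_rates", 0)
```
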

A set of the previously presented conditions can be combined with a simple propositional logic using the logical operators AND, OR and parentheses.

Time and Date conditions

Dissociated from inter-job dependencies, it is also possible to define execution periods for jobs. Two kinds of calendars can be attached to a job:

- a run calendar determining the execution days,
- an exclude calendar excluding dates.

It is possible to use user-defined calendars as well as standard calendars. In addition to the calendar, it is necessary to declare the execution frequency of a job, which determines the concrete execution dates within the calendar.

Events

Autosys is an event-based job scheduler. Each scheduling action is triggered by events listed in a table of a centralized database. Events can be of various natures and trigger almost every action of this scheduling software. Events occur when the starting conditions of a job are met, when the status of a job changes, and when the user launches them.

2.1.7 Dollar Universe™ ($U)

Dollar Universe™ is a job scheduler software package by Orsyp™ that allows industrializing production in a heterogeneous environment (VMS, OS400, UNIX and Windows). Dollar Universe also belongs to the category of advanced schedulers. We will again only present the parts relevant for our work (architecture and job model). For further information please refer to [ORS06].

Architecture

Dollar Universe is based on a client-server architecture allowing the supervision of production on multiple sites as well as multiple platforms. Dollar Universe is purely based on event-driven processing and takes charge of the regular execution of defined workflows in real time by taking over daily scheduling tasks. Within Dollar Universe, two essential notions can be distinguished:

The Company The notion of company means that, in a given computing environment, a single copy of Dollar Universe can manage the operations of several distinct companies independently.

The Management Unit Within the company, Dollar Universe can handle distinct environments where several different operations scenarios will be run in parallel, as might be the case with a company using the same applications on different data. These environments are called management units. Dollar Universe can manage several thousand management units. Dollar Universe uses standard protocols, such as DECNET, APPC and TCP/IP, to provide client-server communications and operations between the different Dollar Universe modules on a network.

Distributed operations via Management Units

The management unit is Dollar Universe's main contribution to the management of distributed operations. A management unit can reside at any node on the network, provided the node is identified in the descriptive information of the management unit. In addition, Dollar Universe qualifies management units through further notions like their type or by dependencies between them. Within this framework, each Dollar Universe function referring to the notion of management unit (expression of conditions in the job descriptions, definition of jobs dependent on a job in a session, etc.) is able to be interpreted by the conditioning and properties of the management unit it belongs to.

Co-operative management of distributed operations

The expression of conditions in the job descriptions can use logical expressions targeting the management units. In this way, it becomes possible to synchronize jobs running on management units residing on different nodes: in this case, without needing to dispatch application pseudo-files, Dollar Universe implements co-operative multi-machine management.
It issues requests toward the nodes of residence of management units of the type in question, to determine if the condition is actually met. As soon as the expected event occurs, the requesting machine will be automatically informed.

Job Model

Elementary job description: UPROC The description of a job, called Uproc, rests essentially on the identification of the procedure, the definition of its technical characteristics, and the conditions required for its execution. Dollar Universe distinguishes three main types of conditions:

- Sequence conditions, representing the dependence of one job on another,

- Conditions of mutual non-simultaneity of jobs,
- Conditions of resource availability.

For each of these conditions, Dollar Universe allows associating criteria such as:

- A user submission account,
- A functional date (date for which the job was run),
- The awaited status,
- The management unit or group of management units for which the job ran.

These conditions can be combined in an expression associating the constraints using the logical operators AND, OR, =, not = and parentheses ().

Description of a job stream: Session From a more macroscopic point of view, Dollar Universe offers the notion of a job stream, referred to as a session, for procedures presenting homogeneous operations constraints (same scheduling conditions, for example). The session allows jobs to be ordered in a tree structure, whereby each job is told which jobs follow it in both normal and abnormal operating circumstances. The sequences so defined within a session do not substitute for the dependencies defined at the individual job level, but rather supplement them. It is therefore possible to define the associated functional conditions for each job and, via the session, superimpose on these the execution sequence required by the operator. In doing so, operations imperatives (time constraints, optimization of resource consumption, etc.) can be integrated without altering the expression of the functional conditions.

Scheduling of Sessions Scheduling cannot be applied to individual jobs but only to sessions (groups of jobs). It rests on the prior definition of the following objects:

- A series of time reference bases created using civil calendars; each management unit may have its own calendar.
- A series of execution frequencies called scheduling rules, either predefined or definable, depending on requirements.

A scheduled session is referred to as a task.
Scheduling Methods Dollar Universe proposes several main scheduling methods, which can be used together if required, for handling recurrent and/or random runs:

1. Associating from one to seven scheduling rules, to obtain the desired final schedule,

2. Defining the job execution time (such as a particular time for a particular day of the week, or up to 150 launches per day, etc.), as well as a waiting period for the conditions to be met; after this interval, the execution will be abandoned or forced,

3. Defining up to 52 exclusion dates.

Operations monitoring Dollar Universe allows tracking and supervision of operations with the aim of facilitating job monitoring, accelerating the identification and diagnosis of incidents, and permitting the necessary recovery procedures. There is a dedicated job monitoring function.

2.1.8 Crontab/Windows Scheduler

Crontab for UNIX™ and Unix-like operating systems and Windows Scheduler for the Windows operating systems are native job schedulers limited to local installations. They can schedule jobs (a command) to be executed with the following schedule characteristics: minute, hour, day of month, month, and day of week. Clearly, this kind of scheduler is very limited, as it only allows specifying an execution date of a job on a local machine and can define neither inter-job dependencies nor resource conditions. Furthermore, it is complicated to schedule a job according to a particular scheduling rule, as no calendars can be taken into account. For example, a job that should be executed on the first working Monday of each month is difficult to schedule with this kind of scheduler.

2.1.9 Limitations of presented schedulers

Among the main limitations of existing batch scheduling systems, and in particular those presented in the previous sections, the following are the most significant:

Absence of theoretical foundations The job schedulers we have presented are not based on optimality criteria but are rather event-driven. When several jobs are going

to run on a machine, a simple batch queue is used. There is no theoretically well-founded algorithm used for scheduling decisions. This is due to the fact that theoretical work on scheduling algorithms is not well adapted to real-world requirements and is often not even well known to the software engineering community. When algorithms are used, they are no more than simple heuristics.

Lack of criticality There is no real mechanism for defining and prioritizing highly critical batches that have hard deadlines to meet. This is an indispensable concept for financial applications.

Complicated Job Model The job models are complex, powerful and general-purpose, but there is a lack of clarity when defining jobs and inter-job dependencies. We want a unified, homogeneous job model interface for declaring jobs. At the same time, a sophisticated job model containing only the necessary concepts improves user-friendliness. Our job model is adapted to the specific requirements of financial applications and contains only necessary and simple concepts, but at the same time it allows defining complex chains and workflows.

Limitations of a commercial software product One of the main advantages of using a commercial software product is guaranteed support by the manufacturer. The main disadvantage in our context is that a commercial software package allows us neither to adapt the job model according to our needs and evolutions nor to modify the scheduling engine, containing the scheduling algorithm, for improvement and testing purposes. It is only possible to personalize the commercial job schedulers to our specific distributed environment to a limited degree. By developing our own job scheduling software, it will be possible to extend the scheduling engine and to add components to the job model. Altogether, our solution will be extensible to fulfill further evolutions of requirements.
2.2 Proposition of a new Job Scheduling Model

In this chapter, we will propose a new job scheduling model that will be the basis of our batch scheduling system.

2.2.1 High-Level General Model

In order to create a high-level general model, we are as generic as possible in defining scheduling terms for our purpose. This approach allows using one single job scheduler that delivers all features. We first define our own scheduling concepts, then we present a high-level general model. From this model, we deduce a concrete database model, which is explained in detail in appendix A.1. This database stores all information available for the environment and is a part of the job information module on the central site.

2.2.2 Basic Concepts

Application A set of applications represents the functional specifications of all jobs. An application is the most general concept and specifies the functional domain of a job. This concept allows determining a purely applicative segmentation between a set of jobs.

Job A job or batch is an atomic concept. This means it is the smallest entity to be scheduled. It can contain a script or, in the basic case, a command to be executed at a certain time. In case it contains no script, the job is of the type file watcher; this means that it verifies the presence of a file at a specified time. A job has specifications in terms of technical definitions, which can be referred to as a job definition.

Groups or Batch Streams Batch streams can be used to group several jobs with similar time constraints and schedules together. The aim of this arrangement is to underline job dependencies as well as to give similar execution conditions to a group of batches. Grouping several jobs together also allows attaching the same schedule to them. We will refer to this concept as a group. A group contains jobs with like scheduling parameters; it is not a means of grouping jobs organizationally.

2.2.3 Physical Properties of a Job

Machine A machine defines the physical place where the job, or more precisely its script, is supposed to run.

Node A node is a cluster on which the job has to be executed. We will also refer to this concept as a virtual machine. In our context, a virtual machine contains several physical machines. This concept allows load balancing when choosing on which concrete physical machine the job will be started. Furthermore, it is possible to increase or decrease the number of physical machines behind a cluster or even to rename them, as the job will only be defined on a virtual machine. One of the tasks of the scheduler is to decide on which machine to run the batch execution by mapping a job to a concrete physical machine.

Users The user is the name under which the job will be running in terms of the operating system. Usually, it corresponds to the user running the script attached to the job. Hence, we assume that the user is defined on each machine where the job is supposed to run. The notion of a user can be taken into account as a supplementary scheduling parameter when deciding which job to start. The number of running jobs of a user can be limited.

Files A batch may be using several files for reading input and writing output and errors. These files are purely technical issues and only define the names of the files which will be used by the job script when executed. They only serve to complete and extend the job definition; they are not mandatory resources that are obligatory for the execution (such files can be found in the concept of resources).
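The mapping of a job's node (virtual machine) onto a concrete physical machine can be sketched as a least-loaded choice within the cluster; the node and host names below are made up for the example:

```python
# Illustrative node-to-machine mapping: a node (virtual machine) groups
# several physical machines, and the scheduler picks one of them.
nodes = {
    "settlement-cluster": ["host-a", "host-b", "host-c"],
}
running_jobs = {"host-a": 4, "host-b": 1, "host-c": 2}

def place(node):
    """Pick the physical machine of the node with the fewest running jobs."""
    machines = nodes[node]
    return min(machines, key=lambda m: running_jobs[m])

target = place("settlement-cluster")  # load-balancing decision
running_jobs[target] += 1             # the chosen machine now runs the job
```

Renaming or resizing the cluster only changes the `nodes` table; job definitions keep referring to the virtual machine.
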

Execution conditions A job contains an estimated runtime as well as a maximal runtime; both parameters are set by the user when declaring the job.

Estimated runtime The estimated runtime will be taken into account by the scheduling algorithm in section 4.3 as the processing time for the job. Our model confines the responsibility for defining exact processing times for a job to the user. Nevertheless, the real runtime can differ from the estimated runtime, and so the maximal runtime defines when a job should be turned into an error state.

Maximal runtime If the job exceeds its maximal runtime, it will turn into the Failure status due to a timeout. This parameter can also be seen as a deadline by which a job has to be executed with success. Both the estimated runtime and the maximal runtime (as the due date) will be taken into account by the scheduling algorithm in section 4.3 when determining the order of job executions.

Priority The priority expresses the criticality of a job. A job with a higher priority will be favored to run. This parameter will also be taken into account by the scheduling algorithm in section 4.3.

Technical Resources A job can use technical resources of the information system other than the machine while being executed. The presence of this kind of resource is not a mandatory condition for the job to be started by the scheduler, but rather a technical issue contributing to the job definition. These resources are accessed by the script or command attached to the job. Similar to the declaration of used files, the definition of resources aims to complete the job definition in a technical manner, although these resources will not be verified by the scheduler. Listing them helps the user to overview the scope of his jobs. Examples of such resources are various: a database, an FTP server, a mail server, etc.
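The execution parameters above can be collected into a single job record; the field names below are our own shorthand for the concepts in the text, not a prescribed schema:

```python
# Illustrative job record gathering the execution conditions of the model:
# estimated runtime (processing time), maximal runtime (deadline), priority
# (criticality) and the purely informational technical resources.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    command: str
    estimated_runtime: int   # minutes; used as processing time by the algorithm
    maximal_runtime: int     # minutes; deadline, exceeding it means Failure
    priority: int            # higher value = more critical, favored to run
    resources: list = field(default_factory=list)  # technical, not verified

def times_out(job, elapsed_minutes):
    """A job exceeding its maximal runtime turns into the Failure status."""
    return elapsed_minutes > job.maximal_runtime

eod = Job("end_of_day", "/batch/eod.sh", estimated_runtime=30,
          maximal_runtime=90, priority=10, resources=["reporting-db"])
```
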

2.2.4 Schedule and Execution Date Calculation

Time constraints are very important in time-critical systems such as financial applications. A schedule allows defining at which time the job has to run. It can be attached to a group as well as to a single batch.

Schedule An element in a batch scheduling system can be attached to a schedule. This element may be a single job or even a whole session of jobs, which is why batch scheduling systems are also called batch processing systems. A schedule is characterized by the following items:

Calendars A calendar defines the working days. The working days differ from one time zone to another and also depend on country-specific holidays.

Rules A scheduling rule is:

- A base cycle, that is, a number of days, weeks or months,
- A base cycle start date,
- The list of days of the week authorized for execution,
- An offset direction from the execution date, whereby, if the date obtained by applying the rule to the calendar is not a workday, this date will be shifted to the first workday preceding or following the targeted date.

Rules are applied to the calendar and define on which days of the calendar the job is to run. They refer to calendar entries and define the frequency of a job. Examples:

- Every working day,
- The first working Tuesday of each month,
- Every 87 days,
- The first Monday of each month,
- Each Thursday (working day); if a Thursday coincides with a holiday, the job will NOT be executed,
- Daily.
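As an illustration of applying a rule to a calendar, the sketch below computes "the first working Monday of each month" with a forward offset direction; the one-entry holiday calendar is invented for the example:

```python
# Illustrative rule evaluation: the calendar supplies the working days, the
# rule selects a candidate date and shifts it forward if it is not a workday.
import datetime

holidays = {datetime.date(2024, 1, 1)}  # assumed country-specific holiday

def is_workday(d):
    return d.weekday() < 5 and d not in holidays

def first_working_monday(year, month):
    """First Monday of the month; if it is not a workday, shift forward
    to the following workday (offset direction: forward)."""
    d = datetime.date(year, month, 1)
    while d.weekday() != 0:          # advance to the first Monday
        d += datetime.timedelta(days=1)
    while not is_workday(d):         # shift if it falls on a holiday
        d += datetime.timedelta(days=1)
    return d
```

For January 2024, the first Monday (January 1) is a holiday in this calendar, so the rule shifts to January 2; this is exactly the kind of rule a plain Crontab entry cannot express.
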

Start Times Start times determine at which specific time of a day a job should run. Examples:

1. At 9:00 pm
2. At 9:15 pm; 9:30 pm; 9:45 pm
3. Between 9:00 pm and 10:00 pm

A calendar, a rule and start times form a schedule.

2.2.5 Batch Execution

A batch can have several executions. We store these executions in order to keep a certain amount of historical data. An execution item contains information about the machine where the job has run, the user, the start and end times, and possible modifications that have been made manually. Whenever an instance of a batch is started, a batch execution item is created. This batch execution item helps to keep track of possible manual interventions and the incidents that have occurred while running the batch. Thus, a batch execution can be seen as an instance of a batch.

Job Status A batch, or more precisely an execution of a batch, always has a status specifying the level of advancement this batch has reached. The following main status types describe our general concepts and the flow of job processing.

Inactive The job is not ready to run; its starting conditions have not been fulfilled yet.

Active The job can run; its starting conditions have been met. It is ready to be put on a batch queue and to be processed.

Starting The job is about to start. This state also describes a job that is sent to a local site in order to be scheduled. Therefore, a job that is in the Starting state can be sent directly into the Failure status in case its maximal execution time has been exceeded before it was able to run on a machine. This can happen when the priority of the job is too low and other jobs have been privileged. Although this situation is not desired, because it means that the job has not been treated at all, we allow it in order to favor jobs with a higher priority.

Running The job is running.

Success The job has terminated with success.

Failure The job has terminated with failure.

Terminated The job has been terminated without success and without being in failure. This status is normally set manually by the user.

On Ice The job has not been terminated but rather stopped from running. This status means that the job does not block others from being executed when they need its success in order to be processed. It is set if the job has already fulfilled all conditions that a successor job might need, and therefore the successor jobs should not be blocked.

On Hold The job has not been terminated but rather stopped from running. In contrast to the preceding status, this status blocks further jobs (those needing success from this job) from being processed.

The last two statuses, On Hold and On Ice, are abnormal execution paths and serve to set execution paths for depending jobs. They can also result from manual intervention. Figure 2.1 shows the transitions between states. The job scheduling module of our architecture (section 3.1) is responsible for changing the status of jobs and for controlling their execution. The user can manually modify job statuses regardless of their current execution progress.

Figure 2.1: States and State Transitions of a Job

Group Executions Grouping several jobs together allows attaching the same schedule to several jobs without defining it for each job. A group contains jobs with like scheduling parameters; it is not a means of grouping jobs organizationally. For example, if there is a number of jobs that run daily at 1:00 am, they can be put in a group assigned a daily start condition in the form of a schedule. However, a variety of account processing jobs with diverse starting

conditions do not form a group. As soon as a group starts running, all the jobs in the group change to the status ACTIVATED, meaning they are eligible to run. Then each job is analyzed for additional starting conditions. All jobs with no additional starting conditions are started, without any implied ordering or prioritizing. Jobs with additional starting conditions remain in the ACTIVATED state until those additional dependencies have been met. The group remains in the RUNNING state as long as there are activated or running jobs in it. If a group is terminated before a job in it was able to start, the status of that job changes directly from ACTIVATED to FAILURE. Note: jobs in a group cannot start unless the group is running. However, once a job starts running, it will continue to run even if the group is later stopped for some reason. This is due to the fact that the executions of jobs of the same group do not necessarily depend on each other.

2.2.6 Conditions and Triggers

Besides the time conditions, other conditions can be defined as being mandatory for the execution of a batch. These conditions might concern various resources that must be fulfilled before a job is executed.

Resources

Physical Resources

File Resource Some jobs may require the presence of a file as a mandatory condition before being executed. This concept allows us to define a pure file-watching job without an executing script.

Machine Resource In order to be processed, each job needs the availability of a machine. Otherwise, a job cannot be started.

Logical Resources

Success of a previous job execution A starting condition for a batch can be the success of another job, and more specifically an exit code that this job returns on successful termination.

Dependencies between Jobs Dependencies between jobs are dependencies in terms of execution order, which means that a job with a higher order has to be processed before its successor is executed.
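The job statuses of section 2.2.5 and their transitions (figure 2.1) can be approximated by a small state machine; the transition set below is our reading of the text, including the direct Starting-to-Failure path for a missed maximal runtime:

```python
# Illustrative state machine for job statuses; the transition table is an
# interpretation of the model, not a verbatim copy of figure 2.1.
TRANSITIONS = {
    "INACTIVE":   {"ACTIVE"},                      # starting conditions met
    "ACTIVE":     {"STARTING", "ON_HOLD", "ON_ICE"},
    "STARTING":   {"RUNNING", "FAILURE"},          # may time out before running
    "RUNNING":    {"SUCCESS", "FAILURE", "TERMINATED"},
    "ON_HOLD":    {"ACTIVE", "TERMINATED"},        # blocks dependent jobs
    "ON_ICE":     {"ACTIVE", "TERMINATED"},        # does not block dependents
    "SUCCESS":    set(),
    "FAILURE":    set(),
    "TERMINATED": set(),
}

def advance(status, new_status):
    """Move a job to new_status, refusing transitions not in the model."""
    if new_status not in TRANSITIONS[status]:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

s = advance("INACTIVE", "ACTIVE")
s = advance(s, "STARTING")
s = advance(s, "FAILURE")  # timed out before any machine could run it
```
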

Triggers Triggers are submitted by the user and enforce the start of a job beyond its schedule.

2.2.7 Verification and Monitoring

Verification A job scheduling model should include the possibility to define actions to be taken before, during and after the execution of a job, whether normal or abnormal. We will refer to these actions as Tests or Instructions. In the case that a job is in an undesired state, e.g. the failure state, instructions can be defined that have to be executed; they help to handle incidents occurring while executing a job. These instructions are static and defined as a set of procedures. These procedures indicate which actions should be taken when a job has a certain status. At the same time, tests and instructions can be defined to be run before or after the execution of a job. The goal of this concept is to reduce the losses produced by erroneous jobs. Tests as well as instructions attached to a job can be accessed via the monitoring module.

Run Commands Tests and Instructions to be executed during job processing on certain states (failure).

Pre Commands Tests and Instructions to be executed before job processing.

Post Commands Tests and Instructions to be executed after the job has terminated.

Monitoring The concept of monitoring allows retrieving the current progress of jobs on different machines as well as their status. A monitoring module also lets the user manually intervene by changing the status of a job, triggering a job start or removing a job. Thus, the user can manually influence scheduling decisions.

2.3 Advantages of proposed job scheduling model

The proposition of our new general job scheduling model allows using significant concepts for job scheduling. In particular, the following advantages have been gained: We have gradually developed the job model according to specific requirements.
Therefore, the set of concepts contains no more than the minimal concepts necessary for defining a scheduled job. A job which has been defined according to these concepts automatically respects the given constraints and job scheduling parameters; thus, it represents a general job and is not linked to a specific commercial job scheduler.

The concepts have been defined from the point of view of their necessity with regard to a financial application. This allows defining a criticality for jobs as well as other significant time constraints. We will be able to easily transform our general job model into a relational database, as we will see in section 3.3. This shows that single concepts can be separated from each other as well as regrouped into more complex concepts. This underlines the improved semantic meaning of the concepts compared to commercial job schedulers. We will be able to transform our general job model into a purely theoretical model (section 4.2) without losing much of its functionality. Finally, we will develop an algorithm (section 4.3) for this theoretical model.

3 Batch Scheduling System

3.1 Overall Architecture

In this section, we present the overall architecture of our batch scheduling system. Our system is composed of four different modules. We describe each single module in sections 3.1.1 to 3.1.4. We have chosen a centralized architecture having remote agents on the client sites (also called local sites), which control the executions of jobs and communicate to the central site the current progress of batch processing. This allows in particular the use of a centralized database. We will justify essential design choices concerning the centralized architecture of our job scheduling system and the resulting advantages. Figure 3.1 shows the modules of the centralized architecture. Their interaction is shown in figure 3.2. We can distinguish four modules on the central site:

- Job Definition Management
- Meta Scheduling
- Job Scheduling
- Job Monitoring

3.1.1 Job Definition Management Module

This module contains a relational database with all job definitions according to the general job model that we have presented in section 2.2. With the help of this module, it is possible to manage the submission (declaration) and the manual modification of jobs.

3.1.2 Meta Scheduling Module

Our job definition allows defining virtual machines for the execution of a job. This enables us to use load balancing in order to equally distribute the jobs on the machines. Additionally, the concept of virtual machines helps us to handle heterogeneity. With a high probability, a system configuration will include machines of varying processing power; thus, it will be necessary to specify the factor attribute value for each real machine. The meta scheduling module is responsible for checking whether the starting conditions for a job are fulfilled by querying the job definition module. It permanently checks triggers for setting a job into the state Active, which means that the job is able to start. Triggers can either be time conditions (the job is within its schedule to run) or success conditions. If the batch is executed on request, the trigger is set manually by the user. The meta scheduler adds all jobs which have the status Active to the batch queue defined for the machine. It also decides to which local site an active job is sent.

3.1.3 Job Scheduling Module

The job scheduling module controls the arrival of new jobs on local sites. It handles the state of a job that is reported by the local site; therefore, it can be seen as a communication module between the central site and the local sites. Each local site comprises a local scheduler handling jobs on its own. Thus, we avoid unnecessary communication overhead between a local site and the central site, as only job statuses are reported.

3.1.4 Monitoring Module

The monitoring module is responsible for managing job executions. It shows their status and executes possibly defined instructions. These have to be executed in case of job failure, or are defined to be executed before the start of a job or after a job has terminated. Summarizing, the monitoring module is in charge of managing job executions as well as indicating run-, pre- and post-commands (see 2.2.7) to the user. In section 4.3, we are going to regard our scheduling problem from a theoretical point of view. We will formalize the scheduling situation that has arisen from the job model.
We will see that the real model has to be simplified in order to be transformed into an appropriate formal job model that allows applying a scheduling algorithm. We will also develop algorithms for all scheduling situations that result from formalizing our problem; in particular, we are going to develop algorithms for local scheduling and for meta scheduling.
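The trigger mechanism of the meta scheduling module (3.1.2) can be sketched as follows. This is a minimal illustration with hypothetical field names; the real module evaluates time and success conditions stored in the job definition database:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    time_due: bool = False          # time condition: the job is within its schedule
    predecessors_ok: bool = True    # success conditions of predecessor jobs
    manual_trigger: bool = False    # set by the user for on-request batches
    state: str = "inactive"

def check_triggers(jobs):
    """Set a job to 'active' when one of its triggers fires (hypothetical logic)."""
    activated = []
    for job in jobs:
        if job.state != "inactive":
            continue
        if (job.time_due and job.predecessors_ok) or job.manual_trigger:
            job.state = "active"
            activated.append(job.name)
    return activated
```

All jobs reported as activated would then be added to the batch queue of their machine.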

3.2 Heterogeneity

We are confronted with the problem of heterogeneity in the following way: our jobs can be declared on machines differing in physical as well as software properties. This issue is resolved by defining remote agents at each local site that interact with the central site. This centralized architecture allows improving the load balancing of jobs and, especially, supporting as many different operating systems as possible, since the remote agent remains a lightweight module and the computational part stays on the central site. Furthermore, the specification of a remote agent in a central architecture is less complex than the design of a distributed scheduling system. This approach also permits applying our scheduling algorithm on a local site without taking heterogeneity into account. The meta-scheduling algorithm 2 in section 4.3 spreads the workload across multiple machines based on each machine's capabilities. In fact, scheduling decisions are not made system-wide but rather locally. A machine can execute several jobs at the same time; therefore, it can be seen as an identical multi-machine environment. Thus, we deal with homogeneity when local scheduling decisions are made. Heterogeneity is a very difficult issue in scheduling theory and even complicated to formalize [Eyr06]. As we have already mentioned in the introduction (section 1.2.4), theoretical models only allow specifying different processing times of the same job on different machines, without taking other issues into account. Our approach toward heterogeneity helps us to design an algorithm that addresses our needs (processing of highly critical jobs) and that is quite efficient at the same time.

3.3 Relational Database Design

We have described abstract concepts concerning a job and its properties, which allow us to define and run jobs in a generic manner.
We have transformed these concepts into a relational database containing all job definitions. The relational database is queried by the meta scheduling module, which regularly checks in the database whether the starting conditions of a batch have been fulfilled. We will explain the main tables as well as their links to each other. In our context, a view is a selection of some tables of a database design; one view contains the tables that belong to a concept (as defined in 2.2). Further information about the relational database design and the complete database schema can be found in appendix A.1. Additionally, we will justify the choice of a relational database in section 3.4.

3.3.1 Job Definition

The view of the database in figure 3.4 contains the tables that are linked together and form a purely technical definition of a job.

Concepts

The following concepts are contained in this view:

A job or batch definition
The corresponding application
Job conditions
Machine definitions

Relations

The table Batch contains all physical definitions of a job and is linked to the table Application, which serves only as an applicative segmentation. The table Conditions contains starting and execution conditions of a job as well as dependencies. A job can run on one or several machines, which are described in the table Machine. The cardinalities between tables are the following:

One application can gather several jobs (1:n)
A job can run on several machines and several jobs can run on one machine (n:m)
A job can possess several conditions (for each condition there is one entry in the table) (1:n)

We will explain in detail the table Batch, which contains the definitions related to a batch, in table 3.5.

3.3.2 Machine Definition

With the help of the table Machine, it is possible to define virtual machines or physical machines. A virtual machine can be composed of several physical machines; all real machines within a virtual machine must run the same type of operating system. In this case, the flag virtual is set to true, and the table entry is linked to entries in the same table describing the physical machines. A load balance factor (an integer number) helps the meta scheduler (3.1.2) to decide to which physical machine a job is added. We describe the fields of the table Machine in detail in table 3.6.
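The role of the load balance factor can be sketched as follows. The ratio rule below is an assumption for illustration only; the thesis specifies merely that an integer factor guides the meta scheduler's choice of physical machine:

```python
def pick_machine(machines):
    """machines: list of (name, factor, running_jobs) tuples describing the
    physical machines behind one virtual machine. A plausible rule: choose
    the machine whose current load relative to its factor is smallest, so
    that more powerful machines (higher factor) receive more jobs."""
    return min(machines, key=lambda m: m[2] / m[1])[0]
```

With `[("fast", 4, 4), ("slow", 1, 2)]`, the fast machine (load ratio 1.0) is preferred over the slow one (load ratio 2.0) despite running more jobs.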

3.3.3 Group Definition

The tables of this view, shown in figure 3.7, give a definition of a group of batches (or job streams). This view shows the concept of gathering jobs with similar conditions into groups.

Cardinalities

The following cardinalities are reflected by the database design:

Several jobs can be put into a group and one job can belong to several groups (n:m).
It is possible to attach conditions to a single job as well as to a stream of jobs (1:n).

The order of jobs in a group can be defined by setting the attribute order in the table Batch_has_Group.

3.3.4 Schedule Definition

Figure 3.8 shows the tables concerning a schedule definition. In our job definition, we can attach a schedule to a job as well as to a group of jobs. We have not found this concept in other job scheduling models, yet it is an essential concept that we need for declaring complex job and group definitions. The possibility to attach a schedule to a job as well as to a stream of jobs has the following advantages:

1. Using job streams with a schedule defines an overall schedule in which the jobs are placed. If a job has no schedule, it inherits the schedule of its group.
2. By attaching a schedule to a single job, it is possible to specify precisely at which time the job should preferably run. If both a group schedule and a job schedule are set, the job schedule is preferred.

The following cardinalities concern a schedule definition:

A batch or a group can only have one schedule (1:1).
A schedule can be composed of several rules (n:m).

3.3.5 Execution

In our relational database model we have a table containing each execution instance of a batch or a group. It contains the current status of a job instance, its execution time as well as user modifications possibly made during execution. This table serves in particular as a history of job executions. It is important to note that an item of a batch execution is always related to a scheduled entry (a job or a group): the tables Batch and Group only contain definitions, whereas the schedule represents the concrete item that is going to be executed. Each time a group or a job is detected to start, an execution item (a job instance) is generated. During the execution this item contains its status and serves for monitoring purposes. This approach decreases overhead in terms of data and allows fast data access, as only integer keys are used to access an instance of a job; all other information about the job is contained in other tables. To this table we can also apply several kinds of common database methods in order to decrease overhead due to storage issues. These methods include the automatic deletion of old entries in the database as well as regular data backups. We also use indices in order to speed up data access. Figure 3.9 shows the table Execution, and table 3.10 describes each field. As we mainly use integer keys to access information, this table is linked to other tables containing the value corresponding to an integer key and thus the corresponding information.

3.4 Advantages of Relational Database Design

We have chosen a relational database model because it can be implemented in a relational database engine, which allows applying the SQL query language.
This is motivated by the following main reasons:

The use of a general purpose database on a central site is the best choice for powerful data analysis and extraction, and it allows storing a very high number of jobs. The database takes a central place within the architecture of our system and is not limited to backup purposes, but is rather used for triggering job executions.

In the case of recovery after a server failure, a standard database engine provides highly sophisticated backup and recovery mechanisms. Data safety issues are handled by the database engine and do not need to be implemented as further modules.

Robustness and efficiency are important benefits of a database engine: although we have to invest some work in writing SQL queries, a relational database engine shows high performance even under heavy workload. Moreover, a

relational database is far from being a bottleneck for the system, as it can efficiently handle more than a thousand queries at the same time.
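A minimal sketch of this central-database approach, using SQLite as a stand-in engine and a heavily simplified subset of the schema (the real tables and fields are shown in figures 3.5 to 3.10):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batch (idbatch INTEGER PRIMARY KEY, batch_name TEXT,
                    criticality INTEGER, max_run_time INTEGER);
CREATE TABLE Execution (idexec INTEGER PRIMARY KEY,
                        Batch_idBatch INTEGER REFERENCES Batch(idbatch),
                        status TEXT);
""")
conn.execute("INSERT INTO Batch VALUES (1, 'eod_report', 5, 120)")
conn.execute("INSERT INTO Execution VALUES (10, 1, 'active')")

# The meta scheduler would poll for active job instances roughly like this;
# only the integer key links the instance to its full definition.
rows = conn.execute("""
    SELECT b.batch_name FROM Execution e
    JOIN Batch b ON b.idbatch = e.Batch_idBatch
    WHERE e.status = 'active'""").fetchall()
```

The point is that triggering, monitoring history and recovery all reduce to standard SQL operations on these tables.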

Figure 3.1: Central Architecture

Figure 3.2: Interaction between modules

Figure 3.3: Interaction in heterogeneous environment

Figure 3.4: Job Definition

Figure 3.5: Table Batch

idbatch: numeric identifier of the batch
Application_name: application which the batch belongs to
batch_name: name of the batch
user: name of the user running the job
criticality: priority of the batch
script: name of the script that is going to be executed
description: description of the function of the batch
std_input_file: name of the input file
std_output_file: name of the output file
std_error_file: name of the error file
owner: physical person responsible for the batch
file_sent: name of the file that is sent (if defined)
recipient: place where the file is sent
run_time: processing time
max_run_time: maximal processing time

Figure 3.6: Table Machine

idmachine: primary id of machine
Machine_idMachine: id of virtual machine (if the entry is a physical machine)
machine_name: name of machine in the network
drp: flag to set if it is a backup machine (disaster recovery plan)
operating_system: operating system
virtual: flag to set if machine is virtual
load_balance: load balance factor

Figure 3.7: Group Definition

Figure 3.8: Schedule Definition

Figure 3.9: Table Execution

Figure 3.10: Fields of Table Execution

Schedule_idSchedule: id of schedule
Schedule_Group_idGroup: id of group (set if the item is a scheduled group)
Schedule_Batch_idBatch: id of batch (set if the item is a scheduled batch)
Status_idStatus: id of current status
Machine_idmachine: id of machine
execution_type: type of execution (historical, present)
begin: start of execution
end: end of execution (if terminated)
user: user running the script (operating system)
modifications: manual intervention

4 Scheduling Algorithms

In this chapter, we will first present some basic notions about job scheduling parameters and problems as well as their algorithms in general. We will not give an exhaustive overview of all scheduling parameters and problems; more details and a general survey can be found, among others, in the introductory chapters of the book [LKA04]. Furthermore, we will present the problem at hand and choose some parameters in order to model our situation. We will have to translate our real-world situation into a formal model in order to develop an algorithm. This formal model is a simplified version of the real-world situation, because some parameters necessarily had to be omitted. We will develop algorithms for all problem situations: global scheduling from the point of view of the central site, meta-scheduling, and scheduling jobs on local sites.

4.1 Basic Notions on Scheduling

A note on scheduling vocabulary

In this section we will distinguish between the terms batch and job, although we did not in the previous sections. The term batch stands for a set of scheduled jobs that form a batch and are processed simultaneously. This separation of terms will ease the use of the vocabulary, as the research community often uses the term batch in this sense, whereas in our general job model the two terms can be used interchangeably. In order to follow the vocabulary of the research community, we have decided not to use the term batch in its previous sense.

Note: The term schedule is a concept in our job model that is related to the time conditions of a job. However, the research community uses it to express an arrangement of jobs with starting times on machines. From here on, the term will be used in this latter sense.

Each scheduling problem and its algorithm can be modeled by a scheduling model.
A scheduling model is defined by the machine environment, the characteristics (or constraints) of the jobs J = {j_1, ..., j_n} to be scheduled, and the optimality criterion considered by the scheduling algorithm. A schedule S for a set of jobs J is an arrangement specifying which processing time units and which machines are allocated to each job.

Job

A job j is the entity that has to be scheduled. In our context, we deal with atomic jobs that cannot be broken down into a series of operations. When all jobs contain only a single operation, we speak of a mono-operation problem; otherwise, we speak of a multi-operation problem. We will thus only regard mono-operation jobs. A solution of a scheduling problem must always satisfy a certain number of constraints or parameters. From the point of view of scheduling theory, which is a simplified model of a real environment, a job j_i can have the following characteristics:

p_i: Processing time of job j_i. The processing time p_i of a job describes the absolute time duration that the job requires to be completed.

w_i: Weight of job j_i. The weight w_i of a job describes its priority. Not all jobs are equal in terms of importance; by defining weights it is possible to give importance to some jobs. This can be modeled by assigning a weight w_i > 0 to each job.

d_i: Due date or deadline of job j_i. The due date d_i of a job is the date by which the job should be completed. In the research community the term due date is usually used to define cost objective functions, whereas deadlines restrict the availability of jobs. In our context, a due date corresponds to the date by which the job has to be finished; completing a job before its due date does not modify our cost function. Therefore, the two terms can be used interchangeably by us.

r_i: Release date of job j_i. In a scheduling environment with release date constraints, a job can be associated with a release date r_i, which means that job j_i is only available for processing from time r_i on.

C_i^S: Completion time of job j_i in S. Given a schedule S, the completion time of a job j_i can be denoted by C_i^S.

L_i: Lateness of job j_i. Given a schedule S where each job has a due date, we can define the lateness of a job as L_i = C_i^S − d_i.
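These characteristics can be captured in a small sketch (illustrative only; the field names mirror the symbols above):

```python
from dataclasses import dataclass

@dataclass
class Job:
    p: int      # processing time p_i
    w: int      # weight (priority) w_i
    d: int      # due date d_i
    r: int = 0  # release date r_i

def completion_time(job, start):
    """C_i^S for a schedule S that starts the job at time 'start' (start >= r_i)."""
    return start + job.p

def lateness(job, start):
    """L_i = C_i^S - d_i; negative if the job finishes early."""
    return completion_time(job, start) - job.d
```

For instance, a job with p = 4, d = 5 and r = 1 started immediately at its release completes at time 5, giving a lateness of 0.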
We are dealing with a non-preemptive scheduling model when each job has to be processed for a period of time without being interrupted. If interruption is allowed, we

have a preemptive model.

Note: All time parameters are time durations counted from a time t = 0 at which the scheduling process starts; we implicitly assume natural numbers for all characteristics.

Example

A job j_i has the following characteristics:

w_i = 3: a weight of three
p_i = 4: a processing time of four
d_i = 5: a due date of five
r_i = 1: a release date of one (which means the job has to be started immediately upon release in order to be on-time)

If we assume that the scheduler starts the job as soon as it has been released, the following holds: C_i^S = 5 and L_i = 0.

Machine Environment

The number m describes the number of machines on which jobs can be scheduled. A lot of research work has been done on scheduling jobs on multiple machines. These approaches have become more and more interesting for grid environments, which have been growing in recent years. One issue in this context is to distribute resources equally among user jobs. Nevertheless, we will not take into account the scheduling problem between several sites, like jobs on grids, as the meta scheduler (3.1.2) already distributes jobs onto physical machines using a load balance factor. In addition, we are not regarding a grid environment with trust issues either. Some interesting work on scheduling jobs on grids can be found in [Eyr06], [DEMT04] and [CCG + 05]. For completeness, we will present all possible machine environments:

Single machine environment

In the one machine environment there is only one machine, which can process at most one job at a time.

Parallel machines environment

In parallel machines environments there are m machines. A job j_i with required processing time p_i can be processed on any of the machines.

According to the characteristics of the machines, further restrictions can be made. In the identical parallel machines environment all machines are identical and a job j requires equal processing time on any machine. In the uniformly related machines environment each machine i has a speed s_i, and therefore a job j processed on machine i takes p_j / s_i time. In the unrelated parallel machines environment machines have different capabilities, and thus their relative processing times for a job are not related.

Shop environment

A shop environment includes m machines that model various sorts of production environments. A job j is split up into operations, where each operation requires a processing time on a specific machine. In the open shop environment the operations of a job can be processed in any order, as long as no two operations of a job are processed at the same time on different machines. In the job shop environment there is a total order defined on the operations of a job: an operation cannot be processed before its predecessor has been completed. A flow shop environment is a special case of a job shop environment where the order of operations is equal for each job: they all have the same processing route. The difference lies in the amount of processing time required on the respective machines. In the scheduling notation, the identical, uniformly related and unrelated machine environments are denoted by P, Q and R; the open, flow and job shop environments are denoted by O, F and J. If the number of machines in the environment is known, this number is included in the specification. Example: P4 means a parallel environment with four identical machines.

Batching Machines

When talking about batching machines, we are dealing with a one machine environment that can handle several jobs at the same time with the help of a batching machine. A batching machine (or batch processing machine) is a machine that can process b jobs at the same time.
The jobs that are processed together form a batch. Corresponding to the batch processing, two kinds of batching machines can be distinguished: serial batching machines and parallel batching machines. A serial batching machine can only process one job of a batch at a time; the processing time of a batch on a serial batching machine is equal to the sum of the processing times of the jobs belonging to the batch. A parallel batching machine of size b can process up to b jobs simultaneously; the processing time of a batch on a parallel batching machine is the maximum of the processing times of all its jobs. In both cases, the completion time of a job is equal to the completion time of its batch.
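The two batch processing-time rules can be stated directly:

```python
def serial_batch_time(processing_times):
    """Serial batching machine: jobs of a batch run one after another,
    so the batch takes the sum of the individual processing times."""
    return sum(processing_times)

def parallel_batch_time(processing_times):
    """Parallel batching machine: jobs run simultaneously, so the batch
    takes as long as its longest job."""
    return max(processing_times)
```

A batch with processing times 2, 3 and 4 thus takes 9 time units on a serial batching machine but only 4 on a parallel one, and every job in it completes at the batch's completion time.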

Optimality Criteria

The goal of scheduling jobs is to obtain a schedule that can be viewed as optimal or as good. In order to evaluate schedules, we can use a number of criteria; there is a variety of so-called objective functions that can be optimized. The notion of optimal differs from application to application and depends strongly on the underlying objective function. It is obvious that for our system of financial applications, respecting priorities and deadlines is important. We take definitions common in the scheduling literature; for more information about objective functions the reader is referred to [TB06]. It is important to note that here we are at the frontier between the notions of criteria and constraints: while a constraint represents a fact that definitely must be respected, optimizing a criterion rather allows a certain degree of freedom. The goal will be to find a compromise between constraints and degrees of freedom. According to the criteria one wants to favor and according to the machine environment (which can restrict certain choices), several objective functions are possible. We will first present some of the basic objective functions; afterwards, in section 4.2, we will explain which ones are interesting for us to optimize with regard to the other modeling parameters we have chosen. Using the completion time C_i^S of job j_i in schedule S, which can be denoted as C_i if the context is clear, several objective functions can be derived.

Average completion time

The average completion time is a basic optimality criterion. It can be denoted as (1/n) Σ_{i=1..n} C_i^S. Note that minimizing the average completion time is equivalent to minimizing Σ_{i=1..n} C_i^S.

Weighted average completion time

The weighted average completion time is the weighted version of the average completion time, taking into account the weights (priorities) of jobs. It can be denoted as (1/n) Σ_{i=1..n} w_i C_i^S.
If the context is clear, it is also possible to denote these criteria simply as Σ C_i^S and Σ w_i C_i^S respectively.

Makespan

The makespan is one of the most basic and oldest optimality criteria. It is defined as the maximal completion time of any job in a schedule S; therefore it can be written as C_max^S = max_{i=1..n} C_i^S. The goal of a good schedule is to minimize the makespan.
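These completion-time criteria are straightforward to compute from the completion times C_i^S of a given schedule:

```python
def makespan(completions):
    """C_max^S = max_i C_i^S over all jobs of the schedule."""
    return max(completions)

def avg_completion(completions, weights=None):
    """(1/n) * sum_i C_i, or the weighted variant (1/n) * sum_i w_i * C_i
    when a weight list is given."""
    if weights is None:
        weights = [1] * len(completions)
    return sum(w * c for w, c in zip(weights, completions)) / len(completions)
```

For completion times 3, 7 and 5 the makespan is 7 and the average completion time is 5; doubling the weight of the second job raises the weighted average accordingly.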

Lateness

If a due date d_i is associated with each job j_i and its lateness is defined as L_i = C_i^S − d_i, two different optimality criteria arise:

Maximum lateness over all jobs: Let L_i be the lateness of a job j_i; then L_max = max_{i=1..n} L_i denotes the maximum lateness of any job in the schedule. It is interesting to minimize the maximum lateness in order to obtain a good schedule.

(Weighted) tardiness: The (weighted) tardiness is defined as Σ_{i=1..n} T_i or Σ_{i=1..n} w_i T_i respectively, where T_i is defined as max{L_i, 0}. This expresses the sum of the lateness over all jobs; the weighted version takes priorities into account.

(Weighted) number of on-time jobs: Let U_i be 1 if C_i ≤ d_i (the job is completed before its due date) and U_i = 0 otherwise; then an objective is to maximize Σ_{i=1..n} U_i or Σ_{i=1..n} w_i U_i respectively. This means that the number of jobs completed before their due dates should be maximized. This problem is also referred to as the (weighted) throughput maximization problem. The inverse versions Σ_{i=1..n} (1 − U_i) and Σ_{i=1..n} w_i (1 − U_i) are also possible. Depending on the choice between the original version (sum of early jobs) and the inverse version (sum of late jobs), the objective function is either to be maximized or minimized.

Notation of scheduling problems

The notation most widely used in the literature was introduced by [GLLK79]. This notation is divided into three fields: α | β | γ.

Field α refers to the machine typology presented above and describes the structure of the problem. It breaks down into two subfields, α = α1 α2, whose values refer to the machine environment of the problem and possibly to the number of available machines.

Field β contains the explicit constraints of the problem.

Field γ contains the criterion or criteria to be optimized.

If properties or parameters are not set in this notation, they are not further specified and can be taken as arbitrary for the scheduling problem.
For a more complete presentation of the different possible criteria, the interested reader may refer to [AHG76].
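The due-date based criteria above can likewise be computed from completion times, due dates and weights:

```python
def max_lateness(completions, due_dates):
    """L_max = max_i (C_i - d_i); can be negative if all jobs are early."""
    return max(c - d for c, d in zip(completions, due_dates))

def weighted_tardiness(completions, due_dates, weights):
    """sum_i w_i * T_i with T_i = max(C_i - d_i, 0): early jobs contribute
    nothing, late jobs contribute their weighted delay."""
    return sum(w * max(c - d, 0)
               for c, d, w in zip(completions, due_dates, weights))
```

With completion times (5, 9), due dates (5, 7) and weights (3, 1), the first job is exactly on-time, so only the second contributes: L_max = 2 and the weighted tardiness is 2.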

Scheduling Algorithms

Two kinds of situations can be distinguished for scheduling algorithms: on-line scheduling and off-line scheduling. In the case of off-line scheduling, the number of jobs is fixed at the moment of the scheduling decisions. In on-line scheduling, the number of jobs to be scheduled is not known, as jobs arrive regularly and have to be treated on-line. An on-line situation means that the whole set of information concerning the jobs is not known at time t = 0, which is the case in the off-line situation.

On-line Scheduling

Many scheduling problems fall under the rubric of on-line scheduling: the scheduler receives jobs that arrive over time and has to schedule them without knowledge of the future or of the processing time of a job. This also means that the scheduler has to decide about starting a job as it arrives. This lack of knowledge often prevents an optimal schedule; thus, research has focused on finding scheduling algorithms that guarantee schedules that are not too far from optimal. The reader is referred to [PTS04] or [Sga96] for an exhaustive survey on on-line scheduling. One issue in on-line scheduling is on-line admission control, which implies deciding whether a job should be run at its arrival or not.

Off-line Scheduling

In the case of off-line scheduling, the number of jobs to be processed and all their characteristics, like processing times, are known from time t = 0. Many scheduling problems have been studied intensively in the literature for this context. [Bru01], [TB06] and [LKA04], among others, give an overview of all kinds of problems, their algorithms as well as their complexity.

Semi On-line Scheduling

In our situation we are confronted with both aspects: on-line as well as off-line scheduling. The fact that jobs arrive regularly and that the scheduler has no knowledge of arriving jobs induces an on-line situation.
At the same time, once a job has been presented to a local site, the current number of jobs in the queue and their processing times are known. When deciding which job to run or how to reschedule the queue, the scheduler knows all these parameters. As we have both situations depending on the context, we will call our situation semi on-line scheduling.
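The off-line aspect can be illustrated as follows: since all jobs currently in a local queue are known, an off-line rule can be applied to them. The sketch below (hypothetical helper; the actual local rescheduling rules are developed in chapter 4) reorders the known queue by earliest due date, a classical rule for single-machine maximum-lateness minimization:

```python
def reschedule_queue(queue):
    """queue: list of (job_name, due_date) pairs currently waiting at a
    local site. All entries are known when the decision is made, so an
    off-line rule such as earliest-due-date-first can be applied."""
    return sorted(queue, key=lambda job: job[1])
```

New arrivals simply trigger a fresh reordering of the enlarged queue, which is the semi on-line behaviour described above.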

4.2 Description of our Scheduling Problem

In this section, we will justify the choices we have made for modeling our scheduling problem. It is important to note that our real-world problem needs some relaxation as well as simplification in order to be solved by an algorithm, and much of the resulting performance with regard to our self-defined optimality criteria will heavily depend on the appropriateness of our model.

4.2.1 Simplifications with regard to the real environment

When we started to design a batch scheduling system, we constructed the specifications based on a real problem within the scheduling of financial applications. In order to formalize this concrete problem into a scheduling model, the following simplifications were made:

1. We decided not to take into account the number of submitted jobs of a user. This means that each user (who runs the job) can submit as many jobs as he wants; the aspect of a job quota per user plays no role in our scheduling decisions. We chose this approach because weights and due dates are more important for our system.

2. The user defines how many jobs are allowed to be processed in parallel on a local site. The user sets the size of a machine; thus he has to estimate the resources his jobs might need.

3. The estimated processing time p_i for each job j_i has to be submitted manually when defining a job. This processing time may not be the real execution time that has to be allocated to job j_i. In other words, it is up to the user to assess his scheduled jobs and to allocate a processing time. Nevertheless, we assume that the estimated processing time of a job is exact when taking it into account for the scheduling decision.

4. The user also has to define a maximal processing time p_imax. If the job has not terminated after the maximal processing time, its status is set to a value defined by the user; the default value is failure.
This maximal processing time will serve as the due date d_i for the scheduling algorithm, because after passing the due date without having terminated, the job is in failure. The last points clearly show the gap between classical job scheduling models and a real application. In a real application, events due to physical or other deficits cannot be foreseen, and therefore it is difficult to schedule exactly as proposed by a theoretical model. We have decided to make the entities we are scheduling user-defined jobs to a strong degree. This means that the user strongly influences the behavior of the scheduler by setting weights and estimated processing and maximal run times

for jobs. On the other side, this user estimation might not reflect the real processing times of jobs if it has been estimated wrongly (too short or too long) or if the job cannot be executed due to a machine failure or another technical failure. In both cases, the constructed schedules will not be optimal. Nevertheless, we chose this approach because we think the user has the best knowledge of resource allocation for his jobs, although unexpected events (technical failures) cannot be foreseen by him.

4.2.2 Meta Scheduling

A lot of work has also been done on batch scheduling on multiple processors or multiple machines (clusters). We decided to introduce a meta scheduling module (see section 3.1.2) that distributes jobs onto physical machines. As soon as a job is ready to run, the meta scheduler decides, according to a load balancing factor, to which local site to send the job. We have the situation of on-line scheduling, as the meta scheduler regularly submits jobs to the local sites, where they arrive regularly. We have chosen this simplified scheduling rule using a meta scheduler for the case that several physical machines are defined for a job. Another approach could be queuing the job on each physical machine; the machine that executes the job then informs all others (via the central scheduler). A similar approach has been considered in [PRS05]. At the same time, it has been shown that this solution only slightly improves performance. With regard to the network overhead that would be caused by adding the job on each physical machine behind a cluster and by interchanging the status of the job, our choice of a simple meta scheduler is justified.

4.2.3 Machine environment

Although the meta scheduler distributes jobs equally to different sites, m jobs can be treated at the same time at a local site. Therefore, we have a multi-machine environment on each local site. This context could be modeled in three ways:

1.
A parallel batching machine of size m
2. m parallel identical machines
3. A single machine

The first approach has the advantage that m jobs are started as a schedule at the same time and finish together: the execution time of such a schedule is equal to the execution time of the slowest job. In other words, the slots are set and freed simultaneously. On the other hand, this approach does not allow much flexibility when distributing jobs to free slots, as one always has to wait for the end of the whole schedule.

In the case of a single machine, only one job can be treated at a time. This constitutes a big limitation in terms of machine capacity, as current machines are much more powerful, and this choice would be unwise in terms of effectiveness. Therefore, we have chosen to model our situation as m parallel identical machines. This was realized by defining run_queues of size m, m being the number of jobs executed in parallel. On the other side, this choice adds much to the complexity of our problem, because m-machine scheduling problems, especially in an on-line situation, are often NP-hard. We have non-preemptive jobs, which cannot be interrupted and restarted while being executed on a machine. This has two reasons:

1. Preemption is not a desired feature for user-defined jobs. The user wants them to be executed without interruption, as they should run in the background as batch jobs. Our scheduling module is not in charge of deciding about preemption of our jobs. Nevertheless, preemption at the level of operating system processes belonging to our jobs is allowed and cannot even be avoided.

2. Preemption is a redundant factor [BK06] which complicates our job scheduling model. Although preemptive scheduling may reduce throughput time in the literature, in practice it is rather unusual for a machine to operate a job immediately after interrupting another without adding undesired set-up times.

4.2.4 Job properties

Based on our application environment, we will now specify which objective functions we want to optimize. We are interested in processing jobs with a high criticality because, as mentioned above, their failure would cause high losses. Thus, we define a weight w_i for each job j_i. In addition, we are interested in treating jobs on-time, as financial applications used by traders are time-critical systems requiring on-time processing in order to avoid losses.
A deadline or due date d_i is the latest time when a job should be finished in order to be on-time. This value is equal to the maximal processing time when compared to t = 0. Our jobs have release dates equal to zero, as they are released at the same time from the point of view of the scheduling algorithm. Summarizing, a job j_i is only characterized by:

- its processing time p_i,
- its maximal processing time p_imax,
- its weight w_i,
- its due date d_i = p_imax.
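The job characterization above can be sketched as a small data type. This is an illustrative sketch only: the thesis fixes the symbols p_i, p_imax, w_i and d_i = p_imax, while the class and field names are ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    p: int      # estimated processing time p_i
    p_max: int  # maximal processing time p_imax
    w: int      # weight (priority) w_i

    @property
    def d(self) -> int:
        # due date d_i: equal to the maximal processing time, measured from t = 0
        return self.p_max

job = Job("j1", p=3, p_max=5, w=4)
print(job.d)  # 5
```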

4.2 Description of our Scheduling Problem

4.2.5 Optimality Criteria

The makespan and the average completion time are not sufficient for our purposes, as they take into account neither priorities nor complex job structures. In the case of parallel batch processing, the makespan favors jobs with a short processing time without regarding priorities. The same applies to the average completion time. The makespan is an interesting criterion from the viewpoint of a single user but does not handle priorities. Furthermore, completion metrics like makespan and average completion time do not make much sense in an on-line situation, because the use of these criteria assumes a time zero against which all completion times are compared.

Lateness is an interesting criterion candidate; nevertheless, the maximum lateness over all jobs does not take priorities into account either. (Weighted) tardiness has the disadvantage that it tolerates a certain gap between the deadline of a job and its real finishing time. This approach weakens deadlines in the sense that it allows jobs to be finished after the due date.

Therefore, the weighted number of on-time jobs is the most interesting criterion for us, as it favors jobs that can be finished before their deadline over late jobs, which can be scheduled arbitrarily without contributing to our profit function. We are interested in maximizing weighted throughput, which can be seen as overall profit gain, as the success of a job signifies a single profit gain. The weight w_i of a job j_i corresponds in our model to a profit. More precisely, we are interested in maximizing

∑_{i=1}^{n} w_i U_i

where U_i is 1 if job j_i finishes on-time and 0 otherwise.

4.2.6 Formulation of our scheduling problem

The optimality criterion in which we are interested is the weighted number of on-time jobs ∑_{i=1}^{n} w_i U_i. We have introduced a notation in section 4.1, where the problem can be stated in three fields α | β | γ.
The first field α contains the machine environment, the second field β contains explicit job constraints, and γ is the optimality criterion. Following this notation, our problem can be stated as:

P_m | d_i | ∑_{i=1}^{n} w_i U_i

The number m signifies the number of slots on the machine, and mentioning d_i means the jobs have distinct, non-equal due dates. We could leave out d_i and thus state the problem as

P_m || ∑_{i=1}^{n} w_i U_i

However, we keep d_i in the problem notation in order to emphasize that we are dealing with non-equal deadlines.

4.2.7 Complexity

Scheduling problems are optimization problems. When addressing a scheduling problem, it is always necessary to estimate its complexity, since this determines the nature of the algorithm to implement. If the problem under consideration belongs to the class P, an exact polynomial algorithm exists that delivers optimal schedules. By contrast, if the problem is NP-hard, two alternatives are possible:

1. Propose an approximation algorithm, which calculates in polynomial time a solution that is as close as possible to the optimal solution.

2. Propose an algorithm which calculates the optimal solution of the problem and whose maximal complexity is exponential. Here, the challenge is to design an algorithm which can solve problems of the largest possible size.

To determine the complexity of scheduling problems, a body of traditional results exists in the literature, showing the links between different single-criterion scheduling problems: trees of polynomial Turing reductions (see for example [Pin95]), where the vertices characterize the problems and an arc between vertices A and B exists if and only if a polynomial Turing reduction from problem A to problem B exists. Such trees exist for types of problems, types of constraints and criteria. With the help of these trees, it is clear that our problem is strongly NP-hard (NP-hard in the strong sense), which means it is NP-hard even if the input is polynomially bounded. It is interesting to mention that the problem of maximizing the weighted number of on-time jobs is already NP-hard for a single machine. With regard to the complexity of our problem, we will be interested in finding an approximation solution that will be presented later.
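The optimality criterion of section 4.2.5 can be evaluated for a concrete schedule as follows. This is an illustrative sketch; the function and parameter names are ours, not part of the thesis.

```python
def weighted_on_time(jobs, completion):
    """Value of the objective sum_i w_i * U_i, where U_i = 1 iff job i finishes
    by its due date d_i. `jobs` is a list of (w_i, d_i) pairs and `completion`
    the corresponding completion times C_i."""
    return sum(w for (w, d), c in zip(jobs, completion) if c <= d)

# jobs with weights 4 and 1 finish on-time, the weight-2 job finishes late
print(weighted_on_time([(4, 10), (2, 5), (1, 8)], [9, 6, 8]))  # 5
```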
It is also important to note that even an approximate algorithm can have different results in practice, as processing times can vary in reality. Therefore, we will evaluate our algorithm not only with regard to its theoretical performance but also with regard to its performance in practice.

4.2.8 Related Work and Variants of the problem

To the best of our knowledge, little work has been done on scheduling jobs with different weights, processing times and deadlines in order to maximize the weighted number

of on-time jobs. On the other hand, the problem of scheduling jobs with equal processing times on one machine as well as on parallel machines has been studied intensively in the literature. Although these variants are not well adapted to our problem situation, we will present them here.

In [BK06] it has been shown that the problem for equal-processing-time jobs can be solved exactly in O(n log n). However, an equal-processing-time model does not correspond to our needs. In [Bap00] it has been derived that on a serial or parallel batching machine, jobs with equal processing times and different release dates can be scheduled polynomially by dynamic programming. Again, neither a batching machine nor equal processing times are appropriate properties for modeling our situation. In [DZ06] a purely on-line situation is handled. In that model, jobs have identical processing times and are associated with a release date and a deadline, neither of which is known until the jobs arrive. The paper deals with on-line admission control in order to decide whether a job should be run at its arrival or not, and an approximation algorithm is proposed. This model does not correspond to our situation either, as we do not deal with equal-processing-time jobs. Furthermore, we do not have a purely on-line situation; our situation is a mix of both.

4.2.9 Summary of concrete problems

We are going to present each aspect of our scheduling situation for which an algorithm has to be formulated. The job scheduling problem on each local site, where the meta scheduler distributes the jobs, can be abstracted in the following way, illustrated in figure 4.1:

1. We are given a set of jobs J = {j_1, ..., j_n}, where each job j_i has an (estimated) processing time p_i, a due date d_i and a weight w_i signifying its priority. The set of jobs increases each time the meta scheduler submits a new job to a local site.

2.
We have several running slots named run_queue with size m. That means that m jobs can be executed at the same time. In other words, we have a multi-machine environment of size m: there are n jobs that can be scheduled on machines M_1, ..., M_m.

3. A serial job queue named wait_queue contains a bounded number n of jobs (the set of jobs J) that wait to be processed. Each time a job finishes, a slot becomes empty, where the next job can be processed. The order of jobs in wait_queue corresponds to the execution order, as the first jobs will be processed as soon as slots are empty. The wait_queue might be reordered when a new job j_{n+1} arrives.

4. A schedule S is a set of triples (j, m, s), where such a triple schedules job j_i on machine (slot) M_m starting at time s and thus ending at time s + p_i. A schedule S is valid if for each m ∈ {1, ..., m} the time intervals assigned to jobs are disjoint. In section 4.3, we will give a more formal definition of validity.

Figure 4.1: Problem Illustration

Let m be the number of available slots on a machine and r be the number of free slots. The following two problems have to be solved:

1. If r > 0 and r < m empty slots are available on the run_queue: Which one of the jobs in J is going to run? How should the waiting queue of jobs be reordered? How can priorities (weights w_i and due dates d_i) be taken into account?

2. If r = m (empty queue): Which of the n jobs are going to run? How can an initial valid schedule be created that prioritizes jobs with a higher weight and maximizes throughput?

Remark: The scheduling problem for an empty queue does not correspond to the one with some empty slots, because we have to find a whole new schedule and the objective function is reset. In this case we are confronted with an off-line situation, as the number of jobs as well as their characteristics are known at time t = 0.

4.3 Developed Scheduling Algorithms

This section contains the scheduling algorithms that we have developed for the different scheduling stages: global scheduling from the global point of view of the central architecture, meta scheduling from the point of view of one job definition, and local scheduling on each local site.

4.3.1 Global Scheduling

The scheduling algorithm from the point of view of the central site is described in algorithm 1. It covers the whole life cycle of a job.

Algorithm 1 Global Scheduling Algorithm
1. If a job j_i has reached its schedule date, its status is changed to active and the meta scheduling module distributes the job j_i with processing time p_i, weight w_i and due date d_i to physical machines (local sites) according to the meta scheduling algorithm (see below).
2. The local site schedules the jobs and reports their status back to the central site.
3. Status transitions take place as defined in figure 2.1.
4. Pre, Run and Post Commands (see section 2.2.7) are shown in the monitoring module according to the status of the job.
5. If a job j_i has passed its maximal runtime p_imax, it is removed from the local site and its status is changed on the central site.

4.3.2 Meta Scheduling

When a job is ready to start, the meta scheduler determines which of the specified machines is best suited to run the job based on available processing power. Let l_p be the load balance factor for a physical machine p, as defined in our relational database design in section 3.3. Our meta scheduling algorithm is listed in algorithm 2.

Algorithm 2 Meta Scheduling Algorithm
1. Determine the percentage of CPU available on each real physical machine in the specified virtual machine. This is accomplished by a procedure contained in the remote agent on a client machine.
2. Multiply it by the load balance factor value l_p (default: 1).
3. Choose the machine with the largest result (that is, the machine with the most relative processing power available).
4. Add the job to the batch queue of that physical machine, letting a local scheduling module decide which job to run.
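Steps 1-3 of the meta scheduling algorithm can be sketched as a one-line selection. This is a sketch under assumed names: the machine identifiers and the dict-based interface are ours, not part of the thesis.

```python
def choose_machine(cpu_available, load_factor):
    """Pick the machine maximizing available CPU percentage times its load
    balance factor l_p (default 1), as in Algorithm 2."""
    return max(cpu_available,
               key=lambda p: cpu_available[p] * load_factor.get(p, 1.0))

# host_b wins: 0.30 * 2.0 = 0.60 beats host_a's 0.40 * 1.0
print(choose_machine({"host_a": 0.40, "host_b": 0.30}, {"host_b": 2.0}))
```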

In addition to load balancing, this feature is a useful way to ensure reliable job processing. For example, if one of the machines is down, load balancing will run the job on another machine.

4.3.3 Local Scheduling based on latest start dates

The scheduling problem at the local site can be formulated as P_m | d_i | ∑_{i=1}^{n} w_i U_i. As this problem is known to be strongly NP-hard, our goal is to have an approximation algorithm. We have developed an iterative algorithm based on the difference between the due date d_i and the processing time p_i of a job j_i. We introduce this difference as the latest start date l_i:

l_i = d_i − p_i

The latest start date l_i signifies the time at which a job has to be started in order to be processed before its deadline. In other words, it is the latest time when a job can be started and still be on-time. We will order jobs non-decreasingly according to their latest start dates l_i.

We developed an algorithm with a complexity of O(n log n). It prioritizes the execution of jobs with a high weight in order to maximize the weighted number of on-time jobs. Our algorithm is based on the principle that jobs which cannot be finished before their due date can be added to the schedule in an arbitrary order. This means that jobs that have no chance to be finished before their due date (so-called tardy jobs) do not contribute to the number of on-time jobs and therefore can be executed at any time. Jobs with a high priority are advantaged in this sense. Our solutions are approximation algorithms for both the off-line situation and the on-line situation.

A schedule can be seen as a function that assigns to a job j_i (with processing time p_i) a starting date s_i and a machine slot m:

σ_S : J → N × M, j_i ↦ (s_i, m)

We only allow natural numbers for starting dates, and M = {1, ..., m} is the set of slots where jobs are executed.
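The ordering by latest start dates described above can be sketched as a sort key. A minimal sketch; the triple layout and job names are our own illustration.

```python
def order_by_latest_start(jobs):
    """Order jobs non-decreasingly by l_i = d_i - p_i; ties are broken by the
    shorter processing time, as the off-line algorithm requires.
    Jobs are (name, p_i, d_i) triples."""
    return sorted(jobs, key=lambda job: (job[2] - job[1], job[1]))

jobs = [("a", 4, 10), ("b", 2, 6), ("c", 2, 8)]
# l_a = 6, l_b = 4, l_c = 6; the tie between a and c goes to c (shorter p)
print([name for name, _, _ in order_by_latest_start(jobs)])  # ['b', 'c', 'a']
```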

A schedule function σ_S is valid if and only if:

for all j_1, j_2 ∈ J with σ_S(j_1) = (s_1, m_1) and σ_S(j_2) = (s_2, m_2): m_1 = m_2 implies [s_1, s_1 + p_1) ∩ [s_2, s_2 + p_2) = ∅

This means that no two jobs are scheduled at the same time on the same machine slot.

Off-line Situation

In order to develop an algorithm for an off-line situation, it is necessary to define this case: no jobs are currently executed and all slots are empty. This situation occurs every time the job scheduling module of the central site sends all detected jobs at once to an empty local site.

Off-line Algorithm

There are n jobs J = {j_1, ..., j_n} to be scheduled. Without loss of generality, let J be ordered according to non-decreasing latest start dates l_i = d_i − p_i, that is, l_1 ≤ ... ≤ l_n. We further order the set of jobs J according to increasing processing times p_i: if two jobs have the same latest start date, the job with the shorter processing time is ordered first.

We introduce S+ as the set of early (on-time) jobs, together with their starting times and the machines on which they will be processed:

S+ = {(j_i, s_i, m) | s_i ≤ l_i}

The set of tardy jobs will be noted as

S- = {(j_i, s_i, m) | s_i > l_i}

We keep track of empty slots with variables E = (e_1, ..., e_m), where e_i signifies the time when slot i becomes empty.

The algorithm checks whether a job j_i is early when assigned to the next empty slot, and schedules it there if this is true. Otherwise, it tries to find an already scheduled job with a lower weight and replaces it. Our algorithm only replaces jobs whose processing time is at least as long as that of j_i, in order not to reschedule all other jobs. If the job j_i cannot be scheduled on-time, it is added to the set of tardy jobs. Those jobs are scheduled in an arbitrary order at the end of the algorithm. Implicitly, we

Algorithm 3 Local Scheduling for an off-line setting

S+ = ∅; S- = ∅; E = (0, ..., 0)
for i = 1, ..., n do
    e_l = min_{k=1,...,m} e_k                          ▷ next empty slot
    if e_l ≤ l_i then                                  ▷ j_i is early when assigned to next empty slot
        s_i = e_l                                      ▷ set starting time of j_i
        S+ = S+ ∪ {(j_i, s_i, l)}                      ▷ add j_i to the set of early jobs
        e_l = e_l + p_i                                ▷ update empty slot
    else
        w* = min{w_k | j_k ∈ S+, s_k ≤ l_i, p_i ≤ p_k} ▷ find a replaceable early job j*
        if w_i > w* then                               ▷ j_i has a bigger weight than j*
            S+ = (S+ \ {(j*, s*, l)}) ∪ {(j_i, s*, l)} ▷ replace j* by j_i on machine l
            e_l = e_l − p* + p_i                       ▷ update empty slot
            S- = S- ∪ {j*}                             ▷ add j* to the set of late jobs
        else
            S- = S- ∪ {j_i}                            ▷ leave S+ unchanged, add j_i to the late jobs
        end if
    end if
end for
Schedule late jobs in S- in any order

will start all jobs if the number n of waiting jobs is equal to or smaller than the number of free slots m. Therefore, we suppose n > m. The algorithm is shown as algorithm 3.

We obtain a schedule S = S+ ∪ S- with early jobs S+ and late jobs S-. The value of the objective function is:

f(S) = f(S+) = ∑_{j_i ∈ S+} w_i

Obviously, the algorithm has the complexity O(n log n) when appropriate data structures are used, and it finds valid schedules.

On-line Situation

We have an on-line situation when jobs are submitted from the central site to local sites while jobs are executed on machine slots of the local site. This situation is more complex than the previous one: jobs are already running, and maximizing an optimality criterion becomes more difficult because there is no common time t = 0 for all jobs. On the one hand, current scheduling decisions cannot be compared to previous ones; on the other hand, current decisions cannot be made with regard to future jobs and are thus predictive. During a scheduling decision, no information about arriving jobs is present. In order to apply an algorithm for maximizing our optimality criterion, we have to define an on-line situation exactly:

1. Jobs are running in all machine slots.
2. Scheduled jobs are waiting to be processed.
3. An existing schedule has been constructed from an off-line setting.

On-line Algorithm

Our goal is to maximize the number of on-time jobs with regard to their priority when new jobs have been submitted to a local queue. Although algorithm 3 has a rather low complexity, we will not run it at each arrival of a new job. Our on-line algorithm has to decide either to re-schedule waiting jobs in order to process an arriving job first, or to schedule it on the next empty slot no matter whether it can be finished before its due date or not.
The on-line algorithm is triggered by the arrival of a new job; if rescheduling is necessary, the algorithm shown in algorithm 3 is used. Rescheduling is only done when an arriving job cannot be finished on-time although it has a higher priority than jobs already scheduled on the waiting queue. We take into account not only the set of early jobs but also the set of tardy jobs when checking whether an arriving job has a higher priority. This is due to the following: we assume that a job that has been scheduled (even if tardy) should be prioritized with regard to an

arriving less critical job. This is based on the fact that in a real scheduling environment the tardy job has a chance to be finished anyway, because real processing can be shorter than in theory. The on-line algorithm is explained in algorithm 4.

Algorithm 4 Local Scheduling for an on-line setting

S+ = ∅; S- = ∅; E = (0, ..., 0)
while true do                                          ▷ waiting for an arriving job
    if a new job j_i arrives then
        Calculate the set of scheduled jobs S+ ∪ S- on the waiting queue
        Calculate the next empty slots E
        e_l = min_{k=1,...,m} e_k                      ▷ next empty slot
        if e_l ≤ l_i then                              ▷ j_i is early when assigned to next empty slot
            s_i = e_l                                  ▷ set starting time of j_i
            S+ = S+ ∪ {(j_i, s_i, l)}                  ▷ add j_i to the set of early jobs
            e_l = e_l + p_i                            ▷ update empty slot
        else
            w* = min{w_k | j_k ∈ S+ ∪ S-}              ▷ find the smallest weight
            if w_i > w* then                           ▷ j_i has a bigger weight
                Reschedule according to algorithm 3, taking E as slot allocation
            else
                S- = S- ∪ {j_i}                        ▷ leave S+ unchanged, add j_i to the late jobs
            end if
        end if
    end if
end while
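The two algorithms above can be sketched in Python. This is a sketch, not the thesis's implementation: the tuple layout and function names are ours, the next empty slot is found by a linear scan (O(n·m) rather than the O(n log n) reachable with a heap), and the slot update on replacement mirrors Algorithm 3's e_l update literally.

```python
def offline_schedule(jobs, m):
    """Off-line Algorithm 3 sketch. `jobs` are (name, p, w, d) tuples,
    pre-sorted by latest start l = d - p (ties: shorter p first)."""
    early = {}                 # name -> [start, slot, p, w]
    late = []                  # tardy jobs, appended in arbitrary order later
    e = [0] * m                # e[k]: time at which slot k becomes empty
    for name, p, w, d in jobs:
        l = d - p
        k = min(range(m), key=e.__getitem__)       # next empty slot
        if e[k] <= l:                              # early on the next empty slot
            early[name] = [e[k], k, p, w]
            e[k] += p
        else:
            # replaceable early jobs: started by l and not shorter than j_i
            cand = [n for n, (s, _, pp, _) in early.items() if s <= l and p <= pp]
            victim = min(cand, key=lambda n: early[n][3]) if cand else None
            if victim is not None and w > early[victim][3]:
                s_v, k_v, p_v, _ = early.pop(victim)
                early[name] = [s_v, k_v, p, w]     # replace victim by j_i
                e[k_v] += p - p_v                  # slot frees up earlier (p <= p_v)
                late.append(victim)
            else:
                late.append(name)
    return early, late

def on_new_job(job, e, waiting_weights):
    """On-line decision of Algorithm 4 for an arriving job (p, w, d): schedule
    it early if the next empty slot allows, reschedule (i.e. rerun Algorithm 3)
    if it outweighs some waiting job, otherwise append it to the tardy set."""
    p, w, d = job
    k = min(range(len(e)), key=e.__getitem__)
    if e[k] <= d - p:
        return "schedule_early"
    if waiting_weights and w > min(waiting_weights):
        return "reschedule"
    return "append_late"

# one slot: "a" (p=5, w=1, d=5) is displaced by the heavier "b" (p=3, w=4, d=5)
early, late = offline_schedule([("a", 5, 1, 5), ("b", 3, 4, 5)], m=1)
print(sorted(early), late)  # ['b'] ['a']
print(on_new_job((2, 5, 10), e=[3, 7], waiting_weights=[1, 4]))  # schedule_early
```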

5 Experimental Analysis

In order to evaluate the quality of our algorithms, we have decided to undertake an experimental analysis on a set of batches belonging to a financial application. The evaluation in a real scheduling environment is a crucial factor for the applicability of our algorithms to a practical environment. By proving the viability of our approach, the bridge between the theoretical work and a real working environment has been built.

Financial applications can vary in nature as well as in terms of functionality. The usual divisions of a financial institution delivering a financial service are the front-office, the middle-office and the back-office. One financial application can cover the range of functionalities of all these divisions.

5.1 Job Model

In order to show the viability of our approach toward our new job model (see section 2.2), we will regard one application exemplarily in this context. This application is called TAUX (interest rates) and contains mainly back-office functionalities. It operates on the financial market of fixed income and derivatives as presented in section 1.2.1. The application uses jobs scheduled by all schedulers we have listed in section 2.1. We have succeeded in transforming all scheduled jobs of this application into our new job model. This allows using only one interface to access information about all jobs. All details concerning the transformation of the jobs can be found in appendix A.2.

5.2 Scheduling algorithms

5.2.1 Prototype engine

We have developed a prototype engine for the meta scheduling as well as the local scheduling algorithms (off-line and on-line) presented in section 4.3. With the help of this prototype, we are able to test the quality of our algorithms on real test data.

Criticality - Weights

In order to make practical use of criticality, we will only use four different levels of criticality. Thus, we will use four different weights, the numbers one to four, where four means the most critical level, although our theoretical model allows arbitrary positive integers.

Simulation

Our algorithms are tested by simulating a job load on a virtual machine for a selected time window. Typical incidents like machine failures, as well as processing times deviating from defined values, do not occur in our simulation. As we are not able to operate on an actual virtual machine with real physical machines, their behavior is simulated as well.

Meta Scheduling as presented in section 3.1.2 needs the calculation of processing power for determining which physical machine is best suited to run a job. For simulation purposes, we need to modify the meta scheduling algorithm slightly in order to distribute jobs equally. When deciding which local site to choose for scheduling a job, the meta scheduling algorithm in our prototype engine calculates the current number of jobs per slot on each physical machine and takes the physical machine with the smallest value. The effect of this modified version of algorithm 2 is the same: the goal of Meta Scheduling is to distribute the job load evenly on physical machines according to their processing power. In the simulation, the number of slots is the only information available on differences between machines concerning processing power.

5.2.2 Selected jobs

We have selected the jobs arising on a regular Monday between 0:00 and 24:00 (CET) on one virtual machine within our heterogeneous environment. This virtual machine comprises two physical machines (named cluster machine 1 and cluster machine 2) with different slot sizes, i.e. the numbers of jobs that can be processed simultaneously.
cluster machine 1: 10 slots
cluster machine 2: 8 slots

The selected jobs have been scheduled with the help of our prototype engine. The jobs of the selected set belong to several applications; among those, we find the application TAUX as presented in section 1.2.1 as well as in appendix A.2. 381 job instances have been detected to be processed during the whole simulation time window of 24 hours. Their total sum of weights is 883, and there are 128 high-critical jobs (weight: 4) whose execution should be advantaged.
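The modified meta scheduling rule used in the simulation (smallest jobs-per-slot ratio) can be sketched as follows. The machine names and dict interface are our own illustration.

```python
def pick_site(jobs_running, slots):
    """Simulation variant of meta scheduling: choose the physical machine with
    the smallest current jobs-per-slot ratio."""
    return min(slots, key=lambda p: jobs_running[p] / slots[p])

# cluster2 wins: 8/8 = 1.0 versus 12/10 = 1.2
print(pick_site({"cluster1": 12, "cluster2": 8}, {"cluster1": 10, "cluster2": 8}))
```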

5.2.3 Results

Table 5.3 shows the results that have been achieved by applying both algorithms. Table 5.2 shows the same simulation without applying meta scheduling, that is, using only one physical machine with 18 slots. Table 5.1 shows the result when no algorithm is applied and jobs are processed in the order they arrive on the machines.

    physical machine                 cluster machine 1
    slots                            18
    jobs                             381
    early jobs                       291
    late high-critical jobs          49
    weighted number of early jobs    710
    weighted number of all jobs      883

    Table 5.1: Results without algorithms

    physical machine                 cluster machine 1
    slots                            18
    jobs                             381
    early jobs                       303
    late high-critical jobs          27
    weighted number of early jobs    786
    weighted number of all jobs      883

    Table 5.2: Results without meta-scheduling

    physical machine                 cluster machine 1   cluster machine 2   sum
    slots                            10                  8                   18
    jobs                             237                 144                 381
    early jobs                       237                 101                 338
    late high-critical jobs          8                   0                   8
    weighted number of early jobs    561                 284                 845
    weighted number of all jobs      561                 322                 883

    Table 5.3: Results using both algorithms

Interpretation of Results

When queuing and executing jobs solely based on their arrival on a machine with a capacity of 18 slots, the results are rather poor (table 5.1): 49 high-critical jobs are lost. By applying only local scheduling on a machine with the same capacity

as in the previous example, the performance is better: the number of high-critical jobs that are lost drops to 27. When the machine capacity is split up into two machines containing 10 resp. 8 slots, and both meta scheduling and local scheduling are applied, the following improvements can be remarked:

- The jobs are equally distributed on the local machines according to their capacities.
- The number of lost jobs has decreased to 43, whereas in the previous test results it was 90 resp. 78.
- The number of high-critical jobs that were lost has decreased to 8 (other test results: 49 resp. 27).
- The weighted number of early jobs has increased to 845 out of a total weighted number of 883 (previous test results: 710 out of 883 resp. 786 out of 883).

We remark that a loss of 8 high-critical jobs is a tolerable number in the real scheduling environment of a financial application and gives a prospective hint for further deploying our developed system.

6 Conclusion and Future Work

In this chapter we summarize the results of our work and give hints for future work and for further extension of our approach to a real scheduling environment.

6.1 Conclusion

The goal of our work was to develop a framework for scheduling jobs with priorities. A heterogeneous platform of financial applications was the motivating example in our context. Some of these jobs can be high-critical and therefore need to be processed first. Furthermore, these jobs can be executed on several sites.

We have proposed a general job model that allows defining jobs with complex structures and a complicated set of execution dependencies. At the same time, our job model is simplified and allows having a unified model for all kinds of batches; we are no longer using a heterogeneous set of commercial batch schedulers, each having its own characteristics. The concepts we have defined for our job model can easily be adapted into a relational database design. This relational database is part of our four-module centralized architecture that detects when jobs are scheduled to run and distributes them to local sites. The centralized solution makes it easy to monitor the jobs and to check their executions.

After formalizing the scheduling situations, we have developed a two-stage scheduling: a meta scheduling engine distributes jobs evenly to local sites according to their capacities, and a local scheduling engine deals with scheduling decisions. The objective function we are trying to optimize is the weighted number of early jobs. This function is adapted to the requirements of financial applications (respect of deadlines and prioritization of high-critical jobs). Our local scheduling algorithm solves a strongly NP-hard problem approximately in polynomial time.
Although an in-house developed batch scheduling solution needs more resources and entails higher development and maintenance costs compared with the use of a commercial product, it offers on the other hand a higher degree of freedom and stability. The scheduling algorithm we are using is known, which is often not the case when using commercial software. Another

advantage that arises when transforming jobs of an existing batch environment is that it forces users to re-define their jobs for an existing application. During this process, jobs that are no longer useful can be detected.

6.2 Future Work

We have tested our job scheduling system on a small subset of jobs (see section 5). However, in order to be deployed on a real system, several checks have to be made and all jobs (15000) have to be transformed into our new general job scheduling model. The process of setting up a new batch scheduling system in a working production environment for financial applications is not trivial. This work can be seen as the start of a study and design of a new job scheduling system integrating our new general job model as well as our scheduling model maximizing the number of on-time jobs. Once we are able to release a sophisticated version of our job scheduling framework, we expect that the theoretical algorithm we have developed can be adapted to a real environment, delivering better scheduling results than pure heuristic approaches. The further development of our architecture, the completion of our prototype engine into a stable release and the deployment of our system are the next steps to be taken.

On the other hand, there are possible extensions and modifications imaginable concerning the core of the scheduling engine:

Optimality Criterion Currently we are maximizing the weighted number of on-time jobs and thus dealing with a single-criterion problem. One could imagine adding one or several further optimality criteria to our situation and extending it to a multi-criteria scheduling decision.

Meta Scheduling Concerning the concept of meta scheduling with the help of virtual machines, it could be interesting to investigate whether the user-defined load balance factor can be replaced by a system variable taking into account the capacity of the underlying machine.
Processing Times At the moment, the estimated processing time as well as the maximal run time of a job are user-defined. One could investigate the possibility of a mechanism that adapts these values after a certain number of test runs:

- an intelligent neural network could learn how the processing times evolve
- a stochastic algorithm could calculate the expected value of the processing times

The last two items aim to withdraw from the user the possibility to influence scheduling decisions. Instead, a more automated solution is targeted. It has to be evaluated which

system is preferred. The parameterization of those possible algorithms for calculating processing times is also still open.


Bibliography

[AHG76] A. H. G. Rinnooy Kan. Machine scheduling problems: classification, complexity and computations. The Hague: Nijhoff, 1976.

[Ass06] Computer Associates. Unicenter AutoSys Job Management. http://www3.ca.com/solutions/product.aspx?id=253, 2006.

[Bak74] K. R. Baker. Introduction to Sequencing and Scheduling. Wiley, 1974.

[Bap00] P. Baptiste. Batching identical jobs. Mathematical Methods of Operations Research, 53:355-367, 2000.

[BFY95] M. Baker, G. Fox, and H. Yau. Cluster computing review. 1995.

[BGH+98] P. Brucker, A. Gladky, H. Hoogeveen, M. Kovalyov, C. Potts, T. Tautenhahn, and S. van de Velde. Scheduling a batching machine. Journal of Scheduling, 1:31-54, 1998.

[BK06] Peter Brucker and Svetlana Kravchenko. Scheduling equal processing time jobs to minimize the weighted number of late jobs. Journal of Mathematical Modelling and Algorithms, 5(2):143-165, June 2006.

[BNGK+02] A. Bar-Noy, S. Guha, Y. Katz, J. (Seffi) Naor, B. Schieber, and H. Shachnai. Throughput maximization of real-time scheduling with batching. In SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pages 742-751, Philadelphia, PA, USA, 2002. Society for Industrial and Applied Mathematics.

[Bru01] Peter Brucker. Scheduling Algorithms. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2001.

[CCG+05] N. Capit, G. Costa, Y. Georgiou, G. Huard, C. Martin, G. Mounié, P. Neyron, and O. Richard. A batch scheduler with high level components. CCGrid, 2:776-783, 2005.

[CDZ04] B. Chen, X. Deng, and W. Zang. On-line scheduling a batch processing system to minimize total weighted job completion time. J. Comb. Optim., 8(1):85-95, 2004.

[CGM05] J. Cuenca, D. Giménez, and J. Martínez. Heuristics for work distribution of a homogeneous parallel dynamic programming scheme on heterogeneous systems. Parallel Comput., 31(7):711-735, 2005.

Bibliography [CP06] Siu-Wing Cheng and Chung Keung Poon, editors. Algorithmic Aspects in Information and Management, Second International Conference, AAIM 2006, Hong Kong, China, June 20-22, 2006, Proceedings, volume 4041 of Lecture Notes in Computer Science. Springer, 2006. [DEMT04] [DZ06] [Eyr06] [FFRS05] [GLLK79] [HSSW96] [JCD06] [KNK + 05] [LKA04] [MKBS05] P. Dutot, L. Eyraud, G. Mounié, and D. Trystram. Bi-criteria algorithm for scheduling jobs on cluster platforms. In SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, pages 125132, New York, NY, USA, 2004. ACM Press. Jihuan Ding and Guochuan Zhang. Online scheduling with hard deadlines on parallel machines. In Cheng and Poon [CP06], pages 3242. L. Eyraud. A pragmatic analysis of scheduling environments on new computing platforms. In Internationbal Journal of High Performance Computing and Applications, 2006. D. G. Feitelson, E. Frachtenberg, L. Rudolph, and U. Schwiegelshohn, editors. Job Scheduling Strategies for Parallel Processing, 11th International Workshop, JSSPP 2005, Cambridge, MA, USA, June 19, 2005, Revised Selected Papers, volume 3834 of Lecture Notes in Computer Science. Springer, 2005. R.L. Graham, E.L. Lawler, J.K. Lenstra, and A.H.G. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. of Discrete Math., 5:287326, 1979. L. Hall, A. Schulz, D. Shmoys, and J. Wein. Scheduling to minimize average completion time: O-line and on-line algorithms. In SODA: ACM-SIAM Symposium on Discrete Algorithms (A Conference on Theoretical and Experimental Analysis of Discrete Algorithms), 1996. W. Jawor, M. Chrobak, and C. Dürr. Competitive analysis of scheduling algorithms for aggregated links. In Proc. of the 7th Latin American Symposium on Theoretical Informatics (LATIN'06), pages 617628, Valdivia, Chile, 2006. G. Khanna, N., T. Kurc, U. Catalyurek, P. Wycko, J. Saltz, and P. Sadayappan. 
A hypergraph partitioning based approach for scheduling of tasks with batch-shared i/o. In CCGRID '05: Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2, pages 792799, Washington, DC, USA, 2005. IEEE Computer Society. Joseph Leung, Laurie Kelly, and James H. Anderson. Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press, Inc., Boca Raton, FL, USA, 2004. S. O. Memik, R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh. A scheduling algorithm for optimization and early planning in high-level synthesis. ACM Trans. Des. Autom. Electron. Syst., 10(1):3357, 2005. 80

Bibliography [ORS06] ORSYP. Dollar universe TM, enterprise workload automation. http://www.orsyp.com/software_dollar_universe, 2006. [Pin95] [PRS05] [PTS04] M. Pinedo. Scheduling - Theory, Algorithms, and Systems. Prentice Hall, Englewood Clis, 1995. T. Phan, K. Ranganathan, and R. Sion. Evolving toward the perfect schedule: Co-scheduling job assignments and data replication in wide-area systems using a genetic algorithm. In Feitelson et al. [FFRS05], pages 173193. K. Pruhs, E. Torng, and J. Sgall. Online scheduling. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press, 2004. [RRM04] F. Ridouard, P. Richard, and P. Martineau. On-line minimization of makespan for single batching machine scheduling problems. 9th International Workshop on Project Management and Scheduling (PMS 2004), pages 287290, 2004. [Sga96] J. Sgall. On-line scheduling. In Online Algorithms, pages 196231, 1996. [SWW95] [TB06] D. B. Shmoys, J. Wein, and D. P. Williamson. Scheduling parallel machines on-line. SIAM J. Comput., 24:13131331, 1995. V. T'kindt and J.C Billaut. Multicriteria Scheduling - Theory, Models and Algorithms. Springer Verlag, Berlin, 2006. [Uni07] Unix TM. Crontab. http://en.wikipedia.org/wiki/cron, 2007. 81


List of Algorithms

1  Global Scheduling Algorithm ......................... 65
2  Meta Scheduling Algorithm .......................... 65
3  Local Scheduling for an off-line setting ............ 68
4  Local Scheduling for an on-line setting ............. 70


List of Figures

2.1   States and State Transitions of a Job ............ 33
3.1   Central Architecture ............................ 44
3.2   Interaction between modules ..................... 45
3.3   Interaction in heterogeneous environment ......... 45
3.4   Job Definition .................................. 46
3.5   Table Batch ..................................... 47
3.6   Table Machine ................................... 47
3.7   Group Definition ................................ 48
3.8   Schedule Definition ............................. 49
3.9   Table Execution ................................. 50
3.10  Fields of Table Execution ....................... 50
4.1   Problem Illustration ............................ 64
A.1   Complete Database Schema ........................ 89


A Appendix

A.1 Relational Database Design

In this appendix we list all tables belonging to the relational database, which is contained in the job definition module of the central site. The architecture of our system is described in section 3.1.

A.1.1 Complete Database Schema

The complete database schema is shown in figure A.1. It contains the following 13 tables: Application, Batch, Batch_has_group, Group, Batch_has_machine, Machine, Execution, Instructions, Schedule, Schedule_has_Rules, Rules, Status.

A.1.2 Application

The table Application contains the names of applications that use batches to implement their functionality. Applications serve as a functional segmentation of batches. An application is identified by a trigram, application_name, and carries a full name, application_label. All columns can be found in table A.1.

Figure A.1: Complete Database Schema

Column Name        Data Type     Key  Comment
application_name   VARCHAR(3)    PK   trigram of application
application_label  VARCHAR(20)        label of application

Index Name  Index Type  Columns
PRIMARY     PRIMARY     application_name

Table A.1: Table Application

A.1.3 Batch

The table Batch contains the technical definition of a batch. Links to other tables specify further properties such as group, schedule, condition and machine definitions. All columns can be found in table A.2.

Column Name       Data Type     Key  Comment
idbatch           INTEGER(6)    PK   numeric identifier of batch
Application_name  VARCHAR(3)         corresponding application
batch_name        VARCHAR(30)        name of batch
user              VARCHAR(20)        name of user running the job
criticality       INT(3)             priority in integers
script            VARCHAR(20)        name of executing script
description       VARCHAR(255)       functional description
std_input_file    VARCHAR(255)       name and path of input file
std_output_file   VARCHAR(255)       name and path of output file
std_error_file    VARCHAR(255)       name and path of error file
owner             VARCHAR(20)        physical person responsible for job
file_sent         VARCHAR(255)       name and path of file that is sent (if defined)
recipient         VARCHAR(20)        recipient of file
run_time          INTEGER(5)         processing time
max_run_time      INTEGER(5)         maximal processing time (due date)

Index Name    Index Type    Columns
PRIMARY       PRIMARY       idbatch
job_fkindex2  Index         Application_name
unique_index  Unique Index  batch_name, description

Table A.2: Table Batch
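The Application and Batch definitions can be sketched as DDL. The following is a hypothetical transcription using SQLite in Python; the thesis schema is engine-agnostic, the sample data and a few omitted columns are invented for illustration:

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Application (
    application_name  VARCHAR(3)  PRIMARY KEY,  -- trigram of application
    application_label VARCHAR(20)               -- label of application
);
CREATE TABLE Batch (
    idbatch          INTEGER PRIMARY KEY,       -- numeric identifier of batch
    Application_name VARCHAR(3) REFERENCES Application(application_name),
    batch_name       VARCHAR(30),
    user             VARCHAR(20),               -- user running the job
    criticality      INT,                       -- priority in integers
    script           VARCHAR(20),               -- name of executing script
    run_time         INTEGER,                   -- processing time
    max_run_time     INTEGER                    -- maximal processing time (due date)
);
""")
# Invented sample rows: a trigram 'TAU' and one hypothetical batch.
conn.execute("INSERT INTO Application VALUES ('TAU', 'TAUX - interest rates')")
conn.execute("INSERT INTO Batch (idbatch, Application_name, batch_name, run_time) "
             "VALUES (1, 'TAU', 'eod_rates_load', 300)")
row = conn.execute(
    "SELECT batch_name FROM Batch WHERE Application_name = 'TAU'").fetchone()
print(row[0])  # eod_rates_load
```

The foreign key from Batch to Application mirrors the job_fkindex2 index of table A.2.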

A.1.4 Batch_has_group

This table models an n:m relation between the tables Batch and Group: one or several batches can form a group, and the same batch can be contained in more than one group. The column order determines the processing order of a batch within a group. All columns are listed in table A.3.

Column Name    Data Type   Key  Comment
Batch_idBatch  INTEGER(6)  PK
Group_idGroup  INTEGER(6)  PK
order          INTEGER(1)       order of batch in group

Index Name                 Index Type  Columns
PRIMARY                    PRIMARY     Batch_idBatch, Group_idGroup
batch_has_groupe_fkindex1  Index       Batch_idBatch
batch_has_groupe_fkindex2  Index       Group_idGroup

Table A.3: Table Batch_has_group

A.1.5 Group

The table Group contains the definition of a group of batches. This includes the functional description as well as the first batch in the group. All columns can be found in table A.4.

Column Name        Data Type     Key  Comment
idgroup            INTEGER(6)    PK   numeric identifier of group
group_name         VARCHAR(30)        name of group
group_description  VARCHAR(255)       functional description of group
batch_header       VARCHAR(20)        first batch of group

Index Name    Index Type    Columns
PRIMARY       PRIMARY       idgroup
unique_index  Unique Index  group_name, group_description

Table A.4: Table Group
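The role of the order column can be shown with a small join over these two tables (a sketch with invented data; Group and order are quoted because they are SQL keywords):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Group" (
    idgroup    INTEGER PRIMARY KEY,   -- numeric identifier of group
    group_name VARCHAR(30)
);
CREATE TABLE Batch_has_group (
    Batch_idBatch INTEGER,
    Group_idGroup INTEGER,
    "order"       INTEGER,            -- order of batch in group
    PRIMARY KEY (Batch_idBatch, Group_idGroup)
);
""")
conn.execute("""INSERT INTO "Group" VALUES (1, 'nightly_rates')""")
# Invented batch ids; the same batch could also appear in other groups (n:m).
conn.executemany("INSERT INTO Batch_has_group VALUES (?, ?, ?)",
                 [(7, 1, 2), (3, 1, 1), (9, 1, 3)])
# Processing order of the batches of group 1.
ordered = [r[0] for r in conn.execute(
    'SELECT Batch_idBatch FROM Batch_has_group '
    'WHERE Group_idGroup = 1 ORDER BY "order"')]
print(ordered)  # [3, 7, 9]
```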

A.1.6 Batch_has_machine

This table models an n:m relation between the tables Batch and Machine: one or several batches can be executed on one machine, and the same batch can be executed on more than one machine. All columns can be found in table A.5.

Column Name        Data Type   Key
Machine_idmachine  INTEGER(2)  PK
Batch_idBatch      INTEGER(6)  PK

Index Name                      Index Type  Columns
PRIMARY                         PRIMARY     Machine_idmachine, Batch_idBatch
job_has_machine_fkindex2        Index       Machine_idmachine
batch_def_has_machine_fkindex2  Index       Batch_idBatch

Table A.5: Table Batch_has_machine

A.1.7 Machine

The table Machine characterizes a physical machine with a load balance factor and defines virtual machines. The concept of a virtual machine with load balancing is used by the meta-scheduling module (section 3.1.2) when distributing jobs on machines. All columns can be found in table A.6.

Column Name        Data Type    Key  Comment
idmachine          INTEGER(2)   PK   numerical identifier of machine
Machine_idMachine  INTEGER(2)        id of virtual machine (if entry is a physical machine)
machine_name       VARCHAR(20)       name of machine in network
drp                BOOL              flag if machine is a backup machine (disaster recovery plan)
operating_system   VARCHAR(20)       name of operating system
virtual            BOOL              flag if machine is virtual
load_balance       INTEGER(2)        load balance factor

Index Name        Index Type  Columns
PRIMARY           PRIMARY     idmachine
Machine_FKIndex1  Index       Machine_idMachine

Table A.6: Table Machine
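How load_balance factors could drive job distribution can be illustrated with a minimal weighted round-robin sketch. This is a hypothetical policy for illustration only; the actual meta-scheduling algorithm is the one described in section 3.1.2:

```python
def distribute(jobs, machines):
    """Assign jobs to machines proportionally to their load_balance factor.
    `machines` maps machine name -> load balance factor (hypothetical sketch)."""
    total = sum(machines.values())
    assignment = {name: [] for name in machines}
    # Running credit per machine; always pick the most underloaded machine.
    credit = {name: 0.0 for name in machines}
    for job in jobs:
        for name, weight in machines.items():
            credit[name] += weight / total
        target = max(credit, key=credit.get)
        credit[target] -= 1.0
        assignment[target].append(job)
    return assignment

# With factors 2:1, machine A receives twice as many jobs as machine B.
result = distribute(list(range(9)), {"A": 2, "B": 1})
print(len(result["A"]), len(result["B"]))  # 6 3
```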

A.1.8 Execution

The table Execution keeps a history of past executions and serves to monitor current ones. An instance of an execution can be a job as well as a group; the corresponding identifier will be set. All columns can be found in table A.7.

Column Name             Data Type    Key  Comment
Schedule_idSchedule     INTEGER(6)   PK   id of schedule
Schedule_Group_idGroup  INTEGER(6)   PK   id of group (set if item is a scheduled group)
Schedule_Batch_idBatch  INTEGER(6)   PK   id of batch (set if item is a scheduled batch)
Status_idStatus         INTEGER(2)        id of status
Machine_idmachine       INTEGER(2)        id of running machine
execution_type          CHAR              type of entry
begin                   DATETIME          start time
end                     DATETIME          end time
user                    VARCHAR(20)       name of user
modifications           VARCHAR(45)       modifications made during execution

Index Name                 Index Type    Columns
PRIMARY                    PRIMARY       Schedule_idSchedule, Schedule_Group_idGroup, Schedule_Batch_idBatch
batch_historique_fkindex5  Index         Status_idStatus
unique_index               Unique Index  execution_type, begin, end
batch_execution_fkindex3   Index         Machine_idmachine
batch_execution_fkindex4   Index         Schedule_idSchedule, Schedule_Batch_idBatch, Schedule_Group_idGroup

Table A.7: Table Execution

A.1.9 Instructions

The table Instructions stores and shows the pre, post and run commands (see section 2.2.7) that are associated with a job and a status. All columns can be found in table A.8.
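A monitoring query over such an Execution table might look as follows. This is a simplified sketch: the status names and sample rows are invented, only a subset of the columns is kept, and begin/end are quoted because they are SQL keywords:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Status (
    idstatus    INTEGER PRIMARY KEY,  -- numerical identifier of status
    Status_name VARCHAR(20)           -- description of status
);
CREATE TABLE Execution (
    Schedule_idSchedule    INTEGER,
    Schedule_Batch_idBatch INTEGER,
    Status_idStatus        INTEGER REFERENCES Status(idstatus),
    Machine_idmachine      INTEGER,
    "begin"                DATETIME,
    "end"                  DATETIME  -- NULL while the execution is still running
);
""")
conn.executemany("INSERT INTO Status VALUES (?, ?)",
                 [(1, "running"), (2, "finished")])
conn.executemany("INSERT INTO Execution VALUES (?, ?, ?, ?, ?, ?)",
                 [(10, 1, 2, 1, "2007-01-05 01:00:00", "2007-01-05 01:12:00"),
                  (11, 2, 1, 1, "2007-01-05 02:00:00", None)])
# Current executions: history rows whose end time is not yet set.
running = conn.execute("""
    SELECT e.Schedule_Batch_idBatch, s.Status_name
    FROM Execution e JOIN Status s ON s.idstatus = e.Status_idStatus
    WHERE e."end" IS NULL""").fetchall()
print(running)  # [(2, 'running')]
```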

Column Name      Data Type     Key  Comment
idinstructions   INTEGER(6)    PK   numeric identifier of instruction
Status_idStatus  INTEGER(2)    PK   identifier of status (when instruction is to be executed)
Batch_idBatch    INTEGER(6)         id of batch
exit_code        VARCHAR(20)        exit code to send (after executing instructions)
instructions     VARCHAR(255)       text of instructions
files            VARCHAR(255)       files contained in instructions

Index Name             Index Type  Columns
PRIMARY                PRIMARY     idinstructions, Status_idStatus
Instructions_FKIndex1  Index       Batch_idBatch
Instructions_FKIndex2  Index       Status_idStatus

Table A.8: Table Instructions

A.1.10 Status

In order to have an integer identifier for referencing a status in other tables, we use a table that translates the string description of a status into a numerical value. The statuses are those explained in section 2.2.5.

Column Name  Data Type    Key  Comment
idstatus     INTEGER(2)   PK   numerical identifier of status
Status_name  VARCHAR(20)       description of status

Index Name  Index Type  Columns
PRIMARY     PRIMARY     idstatus

Table A.9: Table Status

A.1.11 Schedule

The table Schedule allows defining a schedule for both a job and a group; the corresponding identifier will be set. All columns can be found in table A.10.

Column Name       Data Type     Key  Comment
idschedule        INTEGER(6)    PK   numerical identifier of schedule
Batch_idBatch     INTEGER(6)    PK   id of batch (set if a batch is scheduled)
Group_idGroup     INTEGER(6)    PK   id of group (set if a group is scheduled)
days_of_week      VARCHAR(255)       days of week
run_calendar      VARCHAR(45)        running calendar
exclude_calendar  VARCHAR(45)        excluded calendar
start_minutes     VARCHAR(255)       start minutes
run_window        VARCHAR(45)        run window
schedule_type     CHAR(1)            type of schedule
status            CHAR(1)            status of schedule (i: inactive, a: active)
scheduling_dates  DATE               first scheduling date
execution_dates   VARCHAR(45)        execution dates (explicit scheduling)
exclusion_dates   VARCHAR(45)        exclusion dates (explicit scheduling)
mult              TINYINT(1)         multiple launches

Index Name               Index Type    Columns
PRIMARY                  PRIMARY       idschedule, Batch_idBatch, Group_idGroup
plannification_fkindex2  Unique Index  Group_idGroup
plannification_fkindex3  Index         Batch_idBatch

Table A.10: Table Schedule

A.1.12 Rules

This table contains rules that can be defined for a schedule. All columns can be found in table A.11.

Column Name  Data Type    Key  Comment
label        VARCHAR(20)  PK   label of rule
description  VARCHAR(65)       description of rule

Index Name  Index Type  Columns
PRIMARY     PRIMARY     label

Table A.11: Table Rules
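The days_of_week and start_minutes fields suggest a cron-like firing test. The sketch below assumes an invented encoding (comma-separated English day names and comma-separated minutes of the day); the thesis does not fix the field format, so this is illustration only:

```python
from datetime import datetime

def fires_at(schedule, when):
    """Return True if a Schedule row fires at datetime `when`.
    Hypothetical encoding: days_of_week = comma-separated day names,
    start_minutes = comma-separated minutes-of-day."""
    days = {d.strip().lower() for d in schedule["days_of_week"].split(",")}
    minutes = {int(m) for m in schedule["start_minutes"].split(",")}
    return (when.strftime("%A").lower() in days
            and when.hour * 60 + when.minute in minutes)

# Fire Mondays and Fridays at 01:00 (minute 60) and 23:00 (minute 1380).
entry = {"days_of_week": "Monday,Friday", "start_minutes": "60,1380"}
print(fires_at(entry, datetime(2007, 1, 5, 23, 0)))  # 2007-01-05 is a Friday: True
print(fires_at(entry, datetime(2007, 1, 6, 23, 0)))  # Saturday: False
```

Fields such as run_calendar, run_window and exclusion_dates would further restrict the result in the same spirit.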

A.1.13 Schedule_has_Rules

A schedule can be composed of several rules. This table implements the n:m relation between schedules and rules. All columns can be found in table A.12.

Column Name             Data Type    Key
Schedule_Group_idGroup  INTEGER(6)   PK
Schedule_Batch_idBatch  INTEGER(6)   PK
Schedule_idSchedule     INTEGER(6)   PK
Rules_label             VARCHAR(20)  PK

Index Name                   Index Type  Columns
PRIMARY                      PRIMARY     Schedule_Group_idGroup, Schedule_Batch_idBatch, Schedule_idSchedule, Rules_label
Schedule_has_Rules_FKIndex1  Index       Schedule_idSchedule, Schedule_Batch_idBatch, Schedule_Group_idGroup
Schedule_has_Rules_FKIndex2  Index       Rules_label

Table A.12: Table Schedule_has_Rules

A.2 Application: TAUX

In order to show the viability of our approach toward our new job model (see section 2.2), we regard one application as an example in this context: an application called TAUX (interest rates). TAUX is mainly a back-office application, although it also includes some transversal functionalities acting on the front-office. Its scope covers the management of the following financial products:

- Swaps
- Caps and Floors
- Swaptions
- FRAs (Forward Rate Agreements)
- Cashflows

As financial markets are networked together all around the world, the application TAUX also operates worldwide in a heterogeneous environment comprising several hundred machines. Table A.13 shows the job distribution of TAUX across schedulers before the transformation. After unifying all existing jobs into our new job scheduling model, 842 jobs remain, because 137 of the original 979 were either detected to be obsolete or were merged into other jobs.

Type               Number of jobs
Dollar Universe    425
Autosys            437
Crontab            85
Windows Scheduler  32
Total              979

Table A.13: Jobs of TAUX according to schedulers