Scheduling Programming Activities and Johnson's Algorithm



Similar documents
Factors to Describe Job Shop Scheduling Problem

1 st year / / Principles of Industrial Eng. Chapter -3 -/ Dr. May G. Kassir. Chapter Three

Innovative Techniques and Tools to Detect Data Quality Problems

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

The problem with waiting time

Scheduling Shop Scheduling. Tim Nieberg

Graphical Scheduler. Get on time All the time. For Production

Algorithms and Data Structures

Introduction to Parallel Programming and MapReduce

10 Project Management with PERT/CPM

A Better Statistical Method for A/B Testing in Marketing Campaigns

14.1 Rent-or-buy problem

Market Simulators for Conjoint Analysis

A Comparative Study of CPU Scheduling Algorithms

Stack Allocation. Run-Time Data Structures. Static Structures

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Introduction to production scheduling. Industrial Management Group School of Engineering University of Seville

Common Data Structures

Linear Programming. Solving LP Models Using MS Excel, 18

ENHANCING INTELLIGENCE SUCCESS: DATA CHARACTERIZATION Francine Forney, Senior Management Consultant, Fuel Consulting, LLC May 2013

11.1 What is Project Management? Object-Oriented Software Engineering Practical Software Development using UML and Java. What is Project Management?

WorkStream Management Driving Early Project Delivery with Phoenix Project Manager

QUEUES. Primitive Queue operations. enqueue (q, x): inserts item x at the rear of the queue q

Rapid Bottleneck Identification A Better Way to do Load Testing. An Oracle White Paper June 2009

The Relative Worst Order Ratio for On-Line Algorithms

pm4dev, 2015 management for development series Project Schedule Management PROJECT MANAGEMENT FOR DEVELOPMENT ORGANIZATIONS

Object-Oriented Analysis. with the Unified Process. John W. Satzinger Southwest Missouri State University. Robert B. Jackson Brigham Young University

SYSTEMS ANALYSIS AND DESIGN DO NOT COPY

MAX = 5 Current = 0 'This will declare an array with 5 elements. Inserting a Value onto the Stack (Push)

Tolerance Charts. Dr. Pulak M. Pandey.

Application Survey Paper

Symbol Tables. Introduction

Data Structure and Algorithm I Midterm Examination 120 points Time: 9:10am-12:10pm (180 minutes), Friday, November 12, 2010

Load Testing and Monitoring Web Applications in a Windows Environment

FINANCIAL ECONOMICS OPTION PRICING

Analysis and Comparison of CPU Scheduling Algorithms

Using SAS/FSP Software for Ad hoc Reporting By Marc Schlessel SPS Software Services Inc.

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

5.5. Solving linear systems by the elimination method

Priori ty

Curriculum Map. Discipline: Computer Science Course: C++

Manufacturing. Manufacturing challenges of today and how. Navision Axapta solves them- In the current explosive economy, many

Lab Experience 17. Programming Language Translation

Scheduling. Anne Banks Pidduck Adapted from John Musser

Sequential Data Structures

Quantrix & Excel: 3 Key Differences A QUANTRIX WHITE PAPER

CHAPTER 2 Estimating Probabilities

Why is SAS/OR important? For whom is SAS/OR designed?

PowerScheduler Load Process User Guide. PowerSchool Student Information System

A Generalized PERT/CPM Implementation in a Spreadsheet

Rank-Based Non-Parametric Tests

Alternative Job-Shop Scheduling For Proton Therapy

The Distinction between Manufacturing and Multi-Project And the Possible Mix of the Two. By Eli Schragenheim and Daniel P. Walsh

Make Better Decisions with Optimization

Software Management. Dr. Marouane Kessentini Department of Computer Science

GENERALIZED INTEGER PROGRAMMING

Operations Management

QUEUING THEORY. 1. Introduction

Model Efficiency Through Data Compression

CHAPTER 1. Basic Concepts on Planning and Scheduling

Scheduling. Getting Started. Scheduling 79

Data Structures and Data Manipulation

Data Structures. Chapter 8

Supply and Demand in the Market for Money: The Liquidity Preference Framework

Benefits of Using Advanced Scheduling Technology With Primavera P6 Versus Resource-Leveling Only INTRODUCTION

Programming Tricks For Reducing Storage And Work Space Curtis A. Smith, Defense Contract Audit Agency, La Mirada, CA.

Section IV.1: Recursive Algorithms and Recursion Trees

Objectives. Chapter 5: CPU Scheduling. CPU Scheduler. Non-preemptive and preemptive. Dispatcher. Alternating Sequence of CPU And I/O Bursts

An Oracle White Paper February Rapid Bottleneck Identification - A Better Way to do Load Testing

Agile Software Development

College of Engineering and Applied Science University of Wisconsin -- Milwaukee

Adopting Agile Testing

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55%

IST 301. Class Exercise: Simulating Business Processes

Automated Scheduling Methods. Advanced Planning and Scheduling Techniques

Section Five Learning Module D:

Interaction Dialer 3.0. Best Practices

Liquidity premiums and contingent liabilities

Briefing document: How to create a Gantt chart using a spreadsheet

SmartArrays and Java Frequently Asked Questions

ICT Perspectives on Big Data: Well Sorted Materials

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

A Genetic Algorithm Approach for Solving a Flexible Job Shop Scheduling Problem

Pseudo code Tutorial and Exercises Teacher s Version

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

A Method for Cleaning Clinical Trial Analysis Data Sets

The Concept of State in System Theory

APP INVENTOR. Test Review

White Paper UC for Business - Outdial Queuing

Data Structures Using C++ 2E. Chapter 5 Linked Lists

System Copy GT Manual 1.8 Last update: 2015/07/13 Basis Technologies

Beyond the Simple SAS Merge. Vanessa L. Cox, MS 1,2, and Kimberly A. Wildes, DrPH, MA, LPC, NCC 3. Cancer Center, Houston, TX.

Transcription:

Scheduling Programming Activities and Johnson's Algorithm Allan Glaser and Meenal Sinha Octagon Research Solutions, Inc. Abstract Scheduling is important. Much of our daily work requires us to juggle multiple activities in order to maximize productivi ty. In general, we strive to minimize the overall time required to complete a series of related tasks. Although this is a very complex topic, there are certain situations that lend themselves to closed form solutions. Johnson's algorithm is one approach that can often be applied. The algorithm is easily stated and understood, and can be readily implemented in SAS. In addition to directly helping with scheduling, the programming illustrates some interesting SAS capabilities, and the final code is very simple and straightforward. Background and Introduction The problem will be introduced with a small example. Two members of a programming department have the assignment of producing a set of six programs. One person will do the initial programming, and the other person will validate the work. The programming and validation activities are independent, but obviously the validation must follow the programming. The complete set of 6 programs will be delivered as a unit, but the individual programs can be completed in any order. Here are the programs with the expected programming and validation times (in consistent but arbitrary units): program programming time validation time A 15 4 B 22 10 C 3 6 D 12 8 E 9 6 F 12 9 In what order should the work be done to minimize the overall elapsed time? Estimating Programming Effort Effort estimates for programming can vary due to inherent assumptions being made about (a) the scope and/or definition of work, (b) programmers skills and experience, and (c) availability of tools, test data and computing environments. While any set of estimates can be used for planning, the degree of confidence in the plan can be substantially increased by assimilating the expected time concept into the estimation process. Based on PERT/CPM project planning techniques, the expected time blends a variety of estimates into one robust estimate that is more reliable. Expected time (E) is based on optimistic time (O), most likely time (M), and pessimistic time (P) as follows: E = (O + 4M + P) / 6 This expected time (E) for each activity should be used as the best estimate to be plugged into the scheduling algorithm described here.

Problem Details The overall elapsed time between the start of the first programming activity and the completion of the last validation activity is called makespan. The objective of scheduling is to find an optimal order for the programming and validation, and thus minimize the makespan. Determining an optimal schedule requires some reasonable assumptions: The programming can only be done on one program at a time. The programming must be completed before validation can begin. Validation can only be done on one program at a time. The order of completing these programs is not important. After the initial programming, validation may be deferred. The validation order is the same as the programming order. There are no priorities or dependencies among these programs. Johnson's algorithm gives us a straightforward approach to finding an optimal order. There may be multiple orderings that yield the same minimal makespan; Johnson's algorithm will identify just one solution. It is important to understand that although this simple scenario may be valid in many real-world situations, there will be many problems that cannot be sufficiently simplified to fit Johnson's approach. Even with this small example, the optimal order is not obvious. Some people may intuitively schedule the most time-consuming activity first; the B-A-F-D-E-C order will yield a makespan of 82 units of time. Others may prefer the reverse order, i.e. C-E-D-F-A-B, thinking it wise to actually finish some work as early as possible. That will result in a makespan of 83. In this small example, however, the optimal makespan is actually 77, which can be achieved by scheduling in C-B-F-D-E-A order. At first glance, the difference between 82 or 83 and the optimal 77 may seem small and unimportant. But assuming that there are no significant short-term activities that can be used to fill any gaps in the schedule, it reflects a change in overall throughput and productivity of roughly 7%. In most programming shops, that is a very substantial improvement. Johnson's Algorithm A Closer Look In generic terms, our sample problem consists of a set of independent tasks, namely T 1 (program A), T 2 (program B), etc., and two independent processes, P 1 (programming) and P 2 (validation). The process order is fixed P 1 must be completed before P 2 can start. Johnson's algorithm is simple: 1. For each task T 1, T 2,..., T n, determine the P 1 and P 2 times. 2. Establish two queues, Q 1 at the beginning of the schedule and Q 2 at the end of the schedule. 3. Examine all the P 1 and P 2 times, and determine the smallest. 4. If the smallest time is P 1, then insert the corresponding task at the end of Q 1. Otherwise, the smallest time is P 2, and the corresponding task is inserted at the beginning of Q 2. In case of a tie between a P 1 and P 2 time, use the P 1 time. 5. Delete that task from further consideration. 6. Repeat steps 3 5 until all tasks have been assigned. 2

What happens if there is a tie between multiple P 1 or multiple P 2 times? There is no reason to prefer one task to another in this situation, and simply selecting the first task in the list is sufficient. This algorithm is easy to use manually, but there are advantages to implementing it in a SAS program. As tasks are completed, and as process times are revised, a program allows a new schedule to be created with minimal effort. A program also allows experimenting and looking at the results when a critical task is manipulated. And, having a systematic approach with scheduling encourages its use. Programming the Algorithm Multiple programming approaches were initially considered. In all of these, the first step began with getting the raw data into a SAS dataset: TASK PROCESS1 PROCESS2 A 15 4 B 22 10 C 3 6 D 12 8 E 9 6 F 12 9 The next step was to determine the smaller process time for each task, and also note if that time came from the first or second process. Tasks with P 1 time less than P 2 time would be assigned to the head queue Q 1 ; other tasks would be assigned to the tail queue Q 2. This dataset results: TASK PROCESS1 PROCESS2 MINTIME PLACE A 15 4 4 tail B 22 10 10 tail C 3 6 3 head D 12 8 8 tail E 9 6 6 tail F 12 9 9 tail Lastly, the dataset was sorted by MINTIME, yielding: TASK PROCESS1 PROCESS2 MINTIME PLACE C 3 6 3 head A 15 4 4 tail E 9 6 6 tail D 12 8 8 tail F 12 9 9 tail B 22 10 10 tail The first programming approach created two stacks. These were implemented as arrays and correspond to Q 1 and Q 2. As each observation was read, the project was pushed to the appropriate stack. After the last project was assigned the stacks were combined Q 2 was inverted and appended to Q 1. Although this worked, it required a significant amount of code. 3

The second programming approach used a linked list to represent the schedule. This was also implemented with an array, and included pointers to preceeding and subsequent list entries. This worked, but like the stack approach, required a significant amount of code. Occam's razor reminds us that the simplest approach may be the best, and that was true here. The final approach was quite simple. Two character variables are defined, corresponding to Q 1 and Q 2. As each observation is read from the sorted dataset, if PLACE has the value "head", then the task is concatenated to the end of Q 1. Conversely, if PLACE has the value "tail", then the task is concatenated to the front of Q 2. After the last task has been assigned, HEAD and TAIL are concatenated, thus creating the final complete list in variable QUEUE. In this example, the value of QUEUE is "C B F D E A. Some code changes are needed, of course, to handle longer task names. Replicating this list in an array or in separate observations in another dataset is trivial. Here is the core code: data queue ; attrib queue head tail length=$200 ; retain head tail '' ; set raw end=lastrec ; if place = 'head' then head = trim(head) ' ' task ; else tail = trim(task) ' ' tail ; if lastrec then do ; queue = trim(head) ' ' tail ; put queue= ; end ; run ; Efficiency Program efficiency is always a consideration. Scheduling programming shop activities will normally involve a small amount of data, and applying Johnson's algorithm should require minimal computer resources. Nonetheless, it may be instructive to look at some details. In each of the three above approaches, the data are stepped through one observation at a time and the data must be sorted. The time to do the former depends only upon the volume of data, and thus can be done in O(n) time. The sorting, however, requires O(n*log(n)) time, and that term dominates. Thus, actually using Johnson's algorithm and determining an optimal schedule is a function of n*log(n). Calculation of Makespan The calculation of makespan is interesting. At first glance, it is simply the sum of the times for the first process, plus any second process times that extend beyond the end of the first process for the last task. Looking at the optimal solution for our sample problem, we begin by noting that the first process can continue without interruption, yielding: TASK PROCESS1 PROCESS2 PROCESS 1 START TIME PROCESS 1 STOP TIME C 3 6 0 3 B 22 10 3 25 F 12 9 25 37 D 12 8 37 49 E 9 6 49 58 A 15 4 58 73 4

Since the first process is in continuous use for 73 units of time, and the second process for the last task requires 4 units of time, the makespan must be at least 73 + 4 = 77. In this example, the second process must frequently pause while it waits for a task to be completed by the first process. If the converse were true, then the makespan would be minimally the sum of the second process times, plus the first task's time for the first process. The calculation of makespan in realistic situations is beyond the scope of this paper, but it frequently helps to display everything in a graphical or Gantt chart form. PROC TIMEPLOT is a very simple way to do this: task begin stop_1 finish min max 0 84 *-------------------------------------------* C 0 3 9 BX--F B 3 25 35 B----------X----F F 25 37 46 B-----X----F D 37 49 57 B-----X---F E 49 58 64 B----X--F A 58 73 77 B------X-F *-------------------------------------------* Discussion What happens if there are three or more processes? In a SAS programming shop, these processes may be writing specifications, programming, and validation. This situation does not have a straightforward solution. In practice, solutions take the same general approach as the two-process situation, i.e. as tasks move through multiple processes, priority is given to tasks with the greatest tendency to move from short times to long times. There are numerous algorithms and approaches for doing this, and there is evidence that the solutions are near-optimal, but it is not clear how to quantify any differences. Furthermore, these algorithms may allow the tasks to progress through the different processes in different orders. Another potential complication is interference. Interference is defined as having conflicting demands on a single process, e.g. a single programmer being instructed to simultaneously work on multiple high-priority tasks, but able to work on only a single task at a time. Yet another complication is having multiple resources for each process. In these situations, which may include contemporary programming techniques such as pair programming, it is frequently helpful to simply consider the elapsed time that multiple resources will each concurrently spend on a process. In other words, if multiple programmers are working on the same task at the same time, simply consider the elapsed time, and not the total person-time. Note on SAS/OR for Operations Research The SAS/OR package does provide sophisticated scheduling capabilities for very large manufacturing, construction and logistics enterprises. SAS/OR software s project management capabilities gives the flexibility to plan, manage and track project and resource schedules through a single integrated system. The software is adept at handling complicated situations involving multiple project record keeping, resource priorities, complex project and resource calendars, substitutable resources with skill pools, multiple and nonstandard precedence relationships, and activity deadlines. 5

Implementing SAS/OR for Clinical Programming departments will be overkill. The simpler scheduling algorithm and program suggested in this paper are easier to implement and use in most cases. Conclusion Our typical day-to-day work does not completely conform to Johnson's assumptions. We generally must deal with assigned priorities, dependence among tasks, and processes which can concurrently handle multiple tasks. Nevertheless, Johnson's algorithm can be helpful and can frequently add insight to help planning, and it is essential that we understand the importance of minimizing makespan. Implementing Johnson's algorithm in SAS is deceptively simple. Initial approaches that were complex were abandoned in favor of a robust and easy technique. Many real world programming scheduling scenarios can be simplified to use the resulting program, and that can yield valuable insight to help programmers and managers schedule work to help optimize productivity. Acknowledgments The authors thank coworkers who share an interest in scheduling. Their constructive comments and enlightened discussion are appreciated. References Garg N, Jain S, & Swamy C. A Randomized Algorithm for Flow Shop Scheduling. Proceedings of FSTTCS, 1999, 213-218. This gives valuable insight for complex scheduling situations, but does not offer complete practical solutions. Hong TP, Chuang TN. Fuzzy Palmer Scheduling for Flow Shops with More Than Two Machines. Journal of Information Science and Engineering, 1999, 397-406. This addresses Palmer's work more rigorously. Johnson SM. Optimal Two- and Three-Stage Production Schedules with Setup Times Included. Naval Research Logistics Quarterly, 1954, March: 61-68. This is the first paper that solved the 2- process problem. It also addresses some issues with the 3-process problem. Palmer DS. Sequencing Jobs through a Multi-Stage Process in the Minimum Total Time A Quick Method of Obtaining a Near Optimum. Operational Research Quarterly, 1965, March: 101-107. This paper establishes the rule for prioritizing tasks that have increasing times for later processes. Sasieni, Yaspan, & Friedman. Operations Research, 250-269. Wiley; 1959. This is a detailed exploration of Johnson's original paper. SAS/OR. http://www.sas.com/technologies/analytics/optimization/or/#section=1. This gives an overview of the SAS/OR product and capabilities. 6

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Allan Glaser is a manager and Meenal Sinha is a Senior Associate in the Clinical Programming department at Octagon Research Solutions, Inc., and can be contacted at aglaser and msinha, both at @octagonresearch.com. Comments regarding this work would be appreciated. 7