SCHEDULER USER MANUAL

WP2

Document Filename: KWF-WP2-D2-UIBK-v1.0-.doc
Work package: WP2
Partner(s): UIBK
Lead Partner: UIBK
Document classification: PUBLIC

Abstract: This document is the user manual of the K-WfGrid Scheduler, a service responsible for scheduling scientific workflows based on a Petri Net representation. The manual describes the steps required to install and configure the service, including the selection among the available scheduling algorithms: two basic algorithms and three advanced algorithms based on performance data. It also describes, from the user's perspective, the available service interfaces: a Web Service interface, a Java API, and a Java servlet interface.

PUBLIC 1 / 16
Delivery Slip

              Name               Partner    Date         Signature
From          Marek Wieczorek    UIBK       20/09/2006
Verified by   Piotr Nowakowski   CYFRONET   08/10/2006
Approved by   Steffen Unger      FIRST      08/10/2006

Document Log

Version   Date         Summary of changes                             Author
0.1       30/11/2005   First version                                  Marek Wieczorek
0.2       05/01/2006   Corrections after an internal review.          Marek Wieczorek
0.3       29/08/2006   Manual for the final version of the product.   Marek Wieczorek
0.4       20/09/2006   Corrections after an internal review.          Marek Wieczorek
1.0       08/10/2006   QA check                                       Piotr Nowakowski
CONTENTS

COPYRIGHT NOTICE
1. INTRODUCTION
    1.1. ABBREVIATIONS AND ACRONYMS
    1.2. REFERENCES AND SOURCE CODE
2. PRODUCT USAGE
    2.1. RUNNING THE PRODUCT
        2.1.1. Operating Requirements
            2.1.1.1. Local hardware requirements
            2.1.1.2. Local software requirements
            2.1.1.3. Grid infrastructure requirements
        2.1.2. Step-by-Step User Setup
    2.2. BASIC OPERATION
    2.3. ADVANCED FEATURES
    2.4. KNOWN PROBLEMS
3. INTERFACE REFERENCE GUIDE
    3.1. USING THE SCHEDULER AS A WEB SERVICE
    3.2. USING THE SCHEDULER AS A JAVA LIBRARY
    3.3. SCHEDULER JAVA API USER GUIDE
    3.4. EXCEPTION HANDLING
4. SCHEDULER CONFIGURATION SERVLET
5. TROUBLESHOOTING Q&A
6. CONTACT INFORMATION AND CREDITS
7. THE EDG LICENSE AGREEMENT
COPYRIGHT NOTICE

Copyright (c) 2005 by UIBK. All rights reserved.

Use of this product is subject to the terms and licenses stated in the EDG license agreement. Please refer to Section 7 for details.

This research is partly funded by the European Commission Project K-WfGrid.
1. INTRODUCTION

The K-WfGrid project provides a complete application execution and composition environment for the Grid. The application model adopted in the environment is the scientific workflow based on the Petri Net notation, a well-established representation for describing the behavior of distributed systems. The workflow description language used in K-WfGrid is called GWorkflowDL. Workflows can be described on different abstraction levels (represented by different colors), of which only the lowest, green level allows for execution.

Web Service technology is used in K-WfGrid to describe activity interfaces. On the lowest abstraction level, every workflow activity (every workflow transition, in Petri Net terms) is described by a web service interface, as a web service operation. The Grid Workflow Execution Service (GWES) is responsible for processing and executing the workflows. GWES can process only workflows which are on the lowest abstraction level. Other K-WfGrid services are responsible for the transformation between the different levels of abstraction: the Workflow Composition Tool (WCT), the Automatic Application Builder (AAB), and the Scheduler. Dependencies between the components of the K-WfGrid environment are depicted in Figure 1.

Figure 1 K-WfGrid environment

The Scheduler is a component of the Grid Application Control layer of K-WfGrid. It determines which candidate web service operations (hereafter also referred to as "service instances") are selected for use in the current workflow execution. The goal is to consider non-functional properties of the web services in order to choose between functionally equivalent instances of the same services, and to make the workflow ready for execution. In K-WfGrid terminology, the Scheduler converts so-called blue transitions (which are assigned to alternative service candidates) to green transitions (assigned to single service instances).
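The blue-to-green conversion can be illustrated with a minimal Java sketch. It models a blue transition simply as a list of candidate operations and resolves it to one instance. The class name, the method names and the candidate URLs are hypothetical and are not part of the Scheduler API; the two selection rules only mirror the basic easy and random algorithms listed in Section 2.1.2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: these types are NOT part of the K-WfGrid Scheduler API.
final class BlueToGreenSketch {

    // "easy" rule: always pick the first candidate of a blue transition.
    static String selectEasy(List<String> candidates) {
        requireCandidates(candidates);
        return candidates.get(0);
    }

    // "random" rule: pick any candidate with equal probability.
    static String selectRandom(List<String> candidates, Random rng) {
        requireCandidates(candidates);
        return candidates.get(rng.nextInt(candidates.size()));
    }

    // A blue transition without candidates cannot be turned green.
    private static void requireCandidates(List<String> candidates) {
        if (candidates == null || candidates.isEmpty()) {
            throw new IllegalArgumentException("a blue transition needs at least one candidate");
        }
    }

    public static void main(String[] args) {
        // Hypothetical candidate operations of functionally equivalent services.
        List<String> blue = Arrays.asList(
                "http://grid01.example.org/services/Solver#run",
                "http://grid02.example.org/services/Solver#run");
        System.out.println("easy   -> " + selectEasy(blue));
        System.out.println("random -> " + selectRandom(blue, new Random()));
    }
}
```

The advanced algorithms differ from this sketch only in how the candidate is ranked: instead of position or chance, they rely on monitoring data or performance predictions.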
All decisions made by the Scheduler attempt to optimize execution performance by distinguishing between semantically equivalent choices. The main features of the Scheduler are as follows:

- dynamic workflow scheduling,
- the use of dynamic performance data,
- the use of knowledge-based performance predictions.

The developer manual for the Scheduler is provided in a separate document (see References).

1.1. ABBREVIATIONS AND ACRONYMS

AAB           Automatic Application Builder
GWES          Grid Workflow Execution Service
GWorkflowDL   Grid Workflow Description Language
KAA           Knowledge Assimilation Agent
WCT           Workflow Composition Tool

1.2. REFERENCES AND SOURCE CODE

Further online information about the Scheduler is available at the following links:

- Scheduler Java Docs: http://www.dps.uibk.ac.at/~marek/kwfgrid/scheduler/docs/
- Scheduler CVS (password protected): http://cvs.ui.sav.sk/cgi-bin/cvsweb.cgi/kwfgrid/scheduler/
- Scheduler Developer Manual: KWF-WP2-D2-UIBK-v1.0-.doc
2. PRODUCT USAGE

The Scheduler is implemented as a software package which can be accessed both as a web service and as a Java library. The user specifies the workflows to be scheduled, and the Scheduler creates workflow execution schedules for a real Grid environment. The user can customize the Scheduler by specifying the scheduling algorithm to be used. Detailed instructions on how to use the Scheduler are given in this section.

2.1. RUNNING THE PRODUCT

2.1.1. Operating Requirements

2.1.1.1. Local hardware requirements

The Scheduler can be executed on any machine with a Java Virtual Machine installed (version 1.5 or higher).

2.1.1.2. Local software requirements

The Scheduler requires a Java Virtual Machine (version 1.5 or higher) installed on the local machine. It depends on the Castor data binding framework (version 0.9.7), the Jaxen XPath engine (version 1.1), JDOM (version 1.0) and Apache Axis (version 1.3).

2.1.1.3. Grid infrastructure requirements

Performance-driven scheduling requires the Monitoring Service (WP3) to be installed on each Grid site. Similarly, prediction-based scheduling requires the KAA service (WP5) to be available. The Monitoring Service instances deployed on individual Grid sites are located using the information stored in GOM. The URL of the KAA service must be specified by the user, either in the properties file (scheduler.properties) or via the Scheduler Configuration Servlet.

2.1.2. Step-by-Step User Setup

1. Get the sources from CVS:

    export CVSROOT=:pserver:<username>@cvs.ui.sav.sk:/home/cvs
    cvs login
    cvs co kwfgrid/scheduler

2. Customize the scheduler.properties file:

    cd <scheduler_root>/src/java/resources
    <edit> scheduler.properties

- Specify the default scheduling algorithm:

    scheduling_algorithm = [easy | random | performancedriven | predictiondriven | predictionandperformancedriven]

  easy - basic algorithm, always selecting the first service instance candidate,
  random - basic algorithm, selecting service instance candidates at random,
  performancedriven - scheduling algorithm based on dynamic performance guidance (supported by the Monitoring Service, WP3),
  predictiondriven - scheduling algorithm based on performance prediction (supported by the KAA service, WP5),
  predictionandperformancedriven - scheduling algorithm based on performance prediction and dynamic performance guidance (supported by the Monitoring Service, WP3, and the KAA service, WP5).

  The default scheduling algorithm can be changed via the Scheduler Configuration Servlet. It can also be specified separately for each scheduled workflow, by setting the workflow property scheduler.refinement.method to one of the values listed above (easy, random, performancedriven, predictiondriven, predictionandperformancedriven).

- Enable or disable the groupid consistency mode. In this mode, the Scheduler tries to fulfill the common requirement of scheduling operations of the same web service to the same physical resource.

    group_id_consistency = [true | false]

- Specify the URL of the KAA service:

    kaa_service = <kaa_service_url>

3. Compile and deploy the Scheduler:

    export CATALINA_HOME=<tomcat_home>
    export SCHEDULER_HOME=<scheduler_root>
    cd $SCHEDULER_HOME
    ant deploy

The Scheduler is now compiled and deployed as an Axis web service.

4. Configure the Scheduler Configuration Servlet:

The Scheduler deployed as a web service can be configured and monitored through the Scheduler Configuration Servlet. The servlet must be registered in the servlet container by adding the following entries to web.xml:

    cd $CATALINA_HOME/webapps/axis/WEB-INF
    vim web.xml

    <servlet>
      <servlet-name>schedulerconfigurationservlet</servlet-name>
      <display-name>Scheduler Configuration Servlet</display-name>
      <servlet-class>
        net.kwfgrid.scheduler.servlet.SchedulerConfigurationServlet
      </servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>schedulerconfigurationservlet</servlet-name>
      <url-pattern>/scheduler/schedulerstatusservlet</url-pattern>
    </servlet-mapping>

2.2. BASIC OPERATION

The Scheduler makes workflows ready for execution by selecting the service instances to be used in the execution.
According to the terminology used in K-WfGrid, the Scheduler converts between the following two abstraction levels of workflows:

- Workflow of Service Candidates (Blue): consists of lists of Web Service candidates which match the Web Service classes.
- Workflow of Service Instances (Green): consists of concrete instances of Web Service operations (selected from the lists of candidates).

Blue workflows are processed by the Scheduler using one of the implemented scheduling algorithms, and green workflows are created accordingly. This functionality is used by the GWES, which is responsible for coordinating the whole workflow processing and which executes the green workflows created by the Scheduler.

The operation of the Scheduler can be monitored using the Scheduler Configuration Servlet. Figure 4 shows a part of a CTM workflow converted from blue to green by the Scheduler. Relevant messages concerning the operation of the Scheduler are logged into the system (see Figure 2).

Figure 2 K-WfGrid portal: logged Scheduler messages

2.3. ADVANCED FEATURES

To meet a common user requirement, the Scheduler can work in a special groupid consistency mode. In this mode, the Scheduler tries to schedule the operations of the same web service to the same resource. The Scheduler checks whether the workflow transition currently under consideration is assigned to instances of a web service which have already been considered during the given workflow scheduling process. If this is the case, and if the instancegroupid property is set to the same value both for the current transition and for the relevant transition scheduled before, the Scheduler tries to assign to the current transition an instance of the same physical service deployment as the one selected before.

For example, consider two transitions A and B of the same workflow, both of which have the instancegroupid property set to the same value (e.g., a). Transition A is scheduled earlier and is assigned to an operation of a service s deployed on resource grid01. When scheduling transition B, the Scheduler will then also try to assign it to an operation of the service s deployed on the resource grid01. If such an assignment is not possible, the Scheduler falls back to normal scheduling and ignores the groupid consistency.

2.4. KNOWN PROBLEMS

The performance-based and the prediction-and-performance-based algorithms depend on the Monitoring Service, and the prediction-based and the prediction-and-performance-based algorithms depend on the KAA service. If any of the required services is not available or is incorrectly configured, the corresponding algorithms may not work properly.
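The groupid consistency mode of Section 2.3 can be sketched in a few lines of Java. This is only a minimal model under stated assumptions: the class `GroupIdConsistencySketch`, its `Candidate` type and the `schedule` method are hypothetical and do not reflect the Scheduler's internal classes, and the "normal" scheduling used as a fallback is reduced to picking the first candidate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical types, not the Scheduler's real classes.
final class GroupIdConsistencySketch {

    // One candidate: an operation of a service deployed on a resource.
    static final class Candidate {
        final String service;
        final String resource;
        final String operation;

        Candidate(String service, String resource, String operation) {
            this.service = service;
            this.resource = resource;
            this.operation = operation;
        }
    }

    // Remembers, per (group id, service), the resource chosen earlier in this workflow.
    private final Map<String, String> chosenResource = new HashMap<>();

    Candidate schedule(String instanceGroupId, List<Candidate> candidates) {
        if (instanceGroupId != null) {
            for (Candidate c : candidates) {
                String earlier = chosenResource.get(instanceGroupId + "|" + c.service);
                if (c.resource.equals(earlier)) {
                    return c; // same physical deployment as the one selected before
                }
            }
        }
        // No consistent assignment possible: fall back to normal scheduling
        // (the first candidate stands in for whichever algorithm is configured).
        Candidate picked = candidates.get(0);
        if (instanceGroupId != null) {
            chosenResource.putIfAbsent(instanceGroupId + "|" + picked.service, picked.resource);
        }
        return picked;
    }

    public static void main(String[] args) {
        GroupIdConsistencySketch scheduler = new GroupIdConsistencySketch();
        // Transition A: service s is offered on grid01 and grid02; grid01 is chosen.
        Candidate a = scheduler.schedule("a", Arrays.asList(
                new Candidate("s", "grid01", "opA"), new Candidate("s", "grid02", "opA")));
        // Transition B lists grid02 first, but shares group id "a" with transition A.
        Candidate b = scheduler.schedule("a", Arrays.asList(
                new Candidate("s", "grid02", "opB"), new Candidate("s", "grid01", "opB")));
        System.out.println(a.resource + " " + b.resource); // prints "grid01 grid01"
    }
}
```

In the example, transition B would normally be assigned to grid02, its first candidate, but the shared group id steers it to grid01, as in the A and B example of Section 2.3.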
3. INTERFACE REFERENCE GUIDE

The Scheduler can be accessed in two different ways: either as a web service, or as a Java library. A simple client application is provided together with the Scheduler software package, which allows the user to schedule workflows in either of the two ways. This section briefly describes how to use the example client. It also describes the Java API of the Scheduler and introduces the Scheduler Configuration Servlet.

3.1. USING THE SCHEDULER AS A WEB SERVICE

The Scheduler can be accessed as a standard Axis web service. The script run-scheduler.sh, provided with the source code of the Scheduler, can be used to schedule workflows with a Scheduler service deployed in a web service container:

    cd <scheduler_root>/scripts
    ./run-scheduler.sh \
        <scheduler_root>/src/test/resources/<workflow_xml> \
        <scheduler_service_url>

The program reads the XML workflow specification from the input file, schedules the workflow using the service deployed at <scheduler_service_url>, and prints the XML of the scheduled workflow to the standard output. The source code of the example client is available in class net.kwfgrid.scheduler.client.SchedulerClient.

3.2. USING THE SCHEDULER AS A JAVA LIBRARY

The script run-scheduler.sh can also be used to run the Scheduler as a standard Java library. In this case the syntax is as follows:

    cd <scheduler_root>/scripts
    ./run-scheduler.sh \
        <scheduler_root>/src/test/resources/<workflow_xml> -nows

3.3. SCHEDULER JAVA API USER GUIDE

The Scheduler provides a Java API which can be accessed directly on the source code level. All Java classes are grouped in package net.kwfgrid.scheduler and its subpackages.

1. Change the default URL of the KAA service (optional):

The default URL of the KAA service specified in the scheduler.properties file can be redefined on the source code level.

    import net.kwfgrid.scheduler.kaainterface.KAABasedScheduling;

    KAABasedScheduling.setKAA_URI(<kaa_service_url>);

2. Change the default scheduling algorithm (optional):

The default scheduling algorithm defined in the scheduler.properties file can be redefined on the source code level. One of the following calls can be used in the code:

    import net.kwfgrid.scheduler.Scheduler;
    import net.kwfgrid.scheduler.SchedulingMethod;

easy algorithm:

    Scheduler.setDefaultSchedulingMethodName(
        SchedulingMethod.EASY_SCHEDULING_PROPERTY_NAME );

random algorithm:

    Scheduler.setDefaultSchedulingMethodName(
        SchedulingMethod.RANDOM_SCHEDULING_PROPERTY_NAME );

performance-driven algorithm:

    Scheduler.setDefaultSchedulingMethodName(
        SchedulingMethod.PERFORMANCE_DRIVEN_SCHEDULING_PROPERTY_NAME );

prediction-based algorithm:

    Scheduler.setDefaultSchedulingMethodName(
        SchedulingMethod.PREDICTION_DRIVEN_SCHEDULING_PROPERTY_NAME );

prediction-and-performance-based algorithm:

    Scheduler.setDefaultSchedulingMethodName(
        SchedulingMethod.PREDICTION_AND_PERFORMANCE_DRIVEN_SCHEDULING_PROPERTY_NAME );

3. Initialize an instance of the Scheduler:

    Scheduler scheduler = new Scheduler();

4. Schedule workflows:

    import net.kwfgrid.gworkflowdl.structure.Workflow;

    Workflow[] scheduledworkflowxmls =
        scheduler.schedule(new Workflow[]{<workflow_1>, ..., <workflow_n>});

3.4. EXCEPTION HANDLING

An exception can occur during scheduling in several cases. In particular:

- the workflow being scheduled may be malformed (e.g., a URL can be incorrect),
- the Monitoring Service may not be available,
- the KAA service may not be available.

Once detected, all exceptions are handled within the Scheduler. In most cases an exception prevents the Scheduler from receiving the data required by an advanced scheduling algorithm (performance-driven, prediction-driven or prediction-and-performance-driven). When handling such an exception, the Scheduler applies the basic easy algorithm instead of the advanced algorithm.
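The fallback behaviour described above can be pictured with the following minimal Java sketch. It is not the Scheduler's actual exception handling code: the class `FallbackSketch`, the `SelectionRule` interface and the candidate strings are hypothetical, and the failing advanced rule merely simulates an unavailable Monitoring Service.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical types, not the Scheduler's real classes.
final class FallbackSketch {

    // A selection rule that may fail, e.g. when a remote service is unreachable.
    interface SelectionRule {
        String select(List<String> candidates) throws Exception;
    }

    // Basic "easy" rule: needs no external service.
    static String easy(List<String> candidates) {
        return candidates.get(0);
    }

    // Try the advanced rule; on any exception fall back to the easy rule.
    static String selectWithFallback(SelectionRule advanced, List<String> candidates) {
        try {
            return advanced.select(candidates);
        } catch (Exception e) {
            System.err.println("advanced scheduling failed (" + e.getMessage() + "), using easy");
            return easy(candidates);
        }
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("grid01/solve", "grid02/solve");
        // Stand-in for a performance-driven rule whose data source is down.
        SelectionRule performanceDriven = c -> {
            throw new IOException("Monitoring Service not available");
        };
        System.out.println(selectWithFallback(performanceDriven, candidates)); // prints "grid01/solve"
    }
}
```

A consequence of this design is that a scheduling request still returns a green workflow when a supporting service is down, so a silent switch to the easy algorithm is best detected through the logged Scheduler messages.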
4. SCHEDULER CONFIGURATION SERVLET

The Scheduler Configuration Servlet (see Figures 3 and 4) is designed to customize Scheduler services deployed on the Grid, and to monitor the scheduling operations performed by these services. The servlet is integrated with the K-WfGrid portal and has two views, which show the status of the Scheduler on the general level (main view, see Figure 3) and on the level of an individual workflow (workflow view, see Figure 4).

In the main view (see Figure 3), the user can configure the Scheduler service by changing the default scheduling algorithm or the default URL of the KAA service (used in the prediction-based and in the prediction-and-performance-based algorithms). The user can also see the history of the scheduling operations performed by the service.

Figure 3 Scheduler Configuration Servlet: main view

By clicking on any workflow scheduled by the service, the user enters the workflow view, which shows the details of the given scheduling operation (see Figure 4). The view shows the transitions of the original blue workflow and the transitions of the green workflow created by the Scheduler. It also shows the details of the scheduling, including the algorithm used to schedule the whole workflow and the scheduling decision for each processed transition.
Figure 4 Scheduler Configuration Servlet: workflow view
5. TROUBLESHOOTING Q&A

Q: I can schedule workflows with both of the basic scheduling algorithms (easy and random), but I cannot do it with some or all of the advanced algorithms (performance-based, prediction-based, prediction-and-performance-based).

A: The basic algorithms do not depend on any other K-WfGrid service, so they should always work when the Scheduler is configured properly. The advanced algorithms depend on other services (the Monitoring Service and the KAA service), so they may not work when one of these services is not available or not configured properly. If you cannot use the prediction-based algorithm or the prediction-and-performance-based algorithm, check whether the URL of the KAA service is properly configured. If you have problems with the performance-based algorithm or with the prediction-and-performance-based algorithm, the Monitoring Service may not be available on some Grid sites, and you should use another algorithm.
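As a first diagnostic step, the settings in scheduler.properties can be checked before deployment. The following Java sketch is a hypothetical helper that is not shipped with the Scheduler. It assumes only the property keys scheduling_algorithm and kaa_service and the five algorithm names from Section 2.1.2, and it verifies the syntax of the KAA URL, not whether the service is actually reachable. The example URL is invented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only: a hypothetical pre-flight check, not part of the Scheduler.
final class SchedulerConfigCheck {

    static final List<String> ALGORITHMS = Arrays.asList(
            "easy", "random", "performancedriven",
            "predictiondriven", "predictionandperformancedriven");

    // Which external services an algorithm depends on, per Section 2.1.2.
    static List<String> requiredServices(String algorithm) {
        List<String> services = new ArrayList<>();
        if (algorithm.contains("performance")) {
            services.add("Monitoring Service");
        }
        if (algorithm.startsWith("prediction")) {
            services.add("KAA service");
        }
        return services;
    }

    // Returns human-readable problems; an empty list means none were found.
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();
        String algorithm = props.getProperty("scheduling_algorithm", "").trim();
        if (!ALGORITHMS.contains(algorithm)) {
            problems.add("unknown scheduling_algorithm: '" + algorithm + "'");
            return problems;
        }
        if (requiredServices(algorithm).contains("KAA service")) {
            String kaa = props.getProperty("kaa_service", "").trim();
            try {
                URI uri = new URI(kaa);
                if (!uri.isAbsolute() || uri.getHost() == null) {
                    problems.add("kaa_service is not an absolute URL: '" + kaa + "'");
                }
            } catch (URISyntaxException e) {
                problems.add("kaa_service is malformed: " + e.getMessage());
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Inline stand-in for the contents of scheduler.properties.
        props.load(new StringReader(
                "scheduling_algorithm = predictionandperformancedriven\n"
                + "kaa_service = http://kaa.example.org:8080/kaa\n"));
        System.out.println("requires: " + requiredServices(props.getProperty("scheduling_algorithm")));
        System.out.println("problems: " + check(props));
    }
}
```

Availability of the Monitoring Service on the individual Grid sites cannot be verified this way and has to be checked on the Grid infrastructure itself.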
6. CONTACT INFORMATION AND CREDITS

Contact person: Marek Wieczorek (email: marek.wieczorek@uibk.ac.at)
7. THE EDG LICENSE AGREEMENT

Copyright (c) 2004 K-WfGrid. All rights reserved.

This software includes voluntary contributions made to K-WfGrid. For more information on K-WfGrid, please see http://www.kwfgrid.net.

Installation, use, reproduction, display, modification and redistribution of this software, with or without modification, in source and binary forms, are permitted. Any exercise of rights under this license by you or your sub-licensees is subject to the following conditions:

1. Redistributions of this software, with or without modification, must reproduce the above copyright notice and the above license statement as well as this list of conditions, in the software, the user documentation and any other materials provided with the software.

2. The user documentation, if any, included with a redistribution, must include the following notice: "This product includes software developed by K-WfGrid (www.kwfgrid.net)." Alternatively, if that is where third-party acknowledgments normally appear, this acknowledgment must be reproduced in the software itself.

3. The names K-WfGrid and Knowledge Workflow Grid may not be used to endorse or promote software, or products derived therefrom, except with prior written permission by steffen.unger@first.fraunhofer.de.

4. You are under no obligation to provide anyone with any bug fixes, patches, upgrades or other modifications, enhancements or derivatives of the features, functionality or performance of this software that you may develop. However, if you publish or distribute your modifications, enhancements or derivative works without contemporaneously requiring users to enter into a separate written license agreement, then you are deemed to have granted participants in K-WfGrid a worldwide, non-exclusive, royalty-free, perpetual license to install, use, reproduce, display, modify, redistribute and sub-license your modifications, enhancements or derivative works, whether in binary or source code form, under the license conditions stated in this list of conditions.

5. DISCLAIMER

THIS SOFTWARE IS PROVIDED BY K-WfGrid AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, OF SATISFACTORY QUALITY, AND FITNESS FOR A PARTICULAR PURPOSE OR USE ARE DISCLAIMED. K-WfGrid AND CONTRIBUTORS MAKE NO REPRESENTATION THAT THE SOFTWARE, MODIFICATIONS, ENHANCEMENTS OR DERIVATIVE WORKS THEREOF, WILL NOT INFRINGE ANY PATENT, COPYRIGHT, TRADE SECRET OR OTHER PROPRIETARY RIGHT.

6. LIMITATION OF LIABILITY

K-WfGrid AND CONTRIBUTORS SHALL HAVE NO LIABILITY TO LICENSEE OR OTHER PERSONS FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL, EXEMPLARY, OR PUNITIVE DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF USE, DATA OR PROFITS, OR BUSINESS INTERRUPTION, HOWEVER CAUSED AND ON ANY THEORY OF CONTRACT, WARRANTY, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR OTHERWISE, ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.