SURVEY ON THE ALGORITHMS FOR WORKFLOW PLANNING AND EXECUTION

Kirandeep Kaur, Research Scholar, Department of CSE, Bhai Maha Singh College of Engineering, Sri Muktsar Sahib, kiransidhu206@yahoo.in
Khushdeep Kaur, Assistant Professor, Department of CSE, Bhai Maha Singh College of Engineering, Sri Muktsar Sahib

Abstract: This research work presents a study of various types of workflows, focusing on the scientific workflows used for processing in the cloud. It reviews practical applications of such workflows in cloud computing and discusses the latest tools used for designing workflows and executing them in cloud environments.

Keywords: Cloud computing, workflow, scientific workflow management system (SWfMS), planning and execution.

1. Introduction
Cloud computing is a paradigm in which resources can be provisioned on demand over the Internet. In cloud computing, workflows are concerned with the automation of procedures in which jobs are passed between participants according to a defined set of rules. This simplifies the execution and management of applications and helps to manage processes efficiently so as to satisfy the requirements of modern enterprises and users. A scientific workflow management system (SWfMS) supports the management of workflow execution; well-known SWfMSs include Pegasus, Kepler, Taverna, and DAGMan. The two major tasks in workflow process management are planning and coordination of execution. Planning is the process of organizing the activities in a workflow to balance performance and process management, and it is carried out by planners implementing planning algorithms.

1.1 Types of cloud workflows
The major categories of workflows are scientific workflows and business workflows.
Scientific workflows: These workflows are used for a number of functions such as data analysis, image processing, and simulation. They may consist of a few tasks or of very many; large workflows must be divided across individual computers in order to complete. The Montage workflow is an example of a scientific workflow.
Business workflows: A business workflow interrelates business processes and the items they operate on.

1.2 Cloud workflow components
A workflow component is described by three parameters (a small illustrative sketch follows this list):
Input: the material and information that are required.
Transformation: the rules and algorithms, which are carried out by machines.
Output: the material and information that are produced and used as input for the next steps.
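To make the input/transformation/output model concrete, the following is a minimal sketch in Python; the WorkflowStep class and the two example steps are invented for illustration and do not come from any of the surveyed systems.

    # Minimal sketch of a workflow component: input -> transformation -> output.
    # All names here are illustrative, not from any surveyed system.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class WorkflowStep:
        name: str
        transform: Callable[[Any], Any]  # the transformation rule/algorithm

        def run(self, inputs: Any) -> Any:
            return self.transform(inputs)

    # Example: a two-step pipeline; each step's output feeds the next step's input.
    steps = [
        WorkflowStep("parse", lambda text: text.split()),
        WorkflowStep("count", lambda words: len(words)),
    ]
    data = "a small example payload"
    for step in steps:
        data = step.run(data)
    print(data)  # -> 4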
1.3 Basic characteristics of cloud workflows
Transparency: Cloud workflows provide a mechanism for task scheduling, self-configuration, and load balancing that is not visible to the user [12].
Multi-tenant architecture: Cloud workflows possess the multi-tenancy feature of cloud computing: a number of tenants can design, deploy, and run their workflows simultaneously [12].
Scalability: The size of the services is based on computing demands. Workflow management systems can attain self-configuration of computing resources by expanding and shrinking the set of running nodes according to the operating conditions, controlling cost and improving performance [12].
Real-time monitoring: Cloud workflows provide tools for fault control, load balancing, and node-scale control by tracking the running status of transaction processes in the cloud [12].

1.4 Workflow planners and workflow schedulers
A workflow planner has the global view of the whole workflow, including all tasks and all dependencies between them, and the planning algorithms it uses (for example, HEFT) can map any task to any resource. A workflow scheduler, by contrast, matches only the free tasks released by the workflow engine to the available resources (for example, Condor VMs) and executes them there. The algorithms used by workflow schedulers are therefore also known as local scheduling algorithms; examples are MIN-MIN and MAX-MIN. A sketch of MIN-MIN is given below.
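As an illustration of a local scheduling heuristic, the following is a minimal sketch of MIN-MIN under simplifying assumptions: a static matrix of runtime estimates and no data-transfer costs. It is a sketch of the general heuristic, not the implementation used by any particular workflow scheduler.

    # Minimal MIN-MIN sketch: repeatedly schedule the (task, resource) pair
    # with the smallest estimated completion time among the remaining tasks.
    # Assumes static runtime estimates and ignores data-transfer costs.
    def min_min(tasks, resources, runtime):
        """runtime[(task, resource)] = estimated execution time."""
        ready_at = {r: 0.0 for r in resources}  # when each resource is free
        schedule, unscheduled = [], set(tasks)
        while unscheduled:
            task, res, finish = min(
                ((t, r, ready_at[r] + runtime[(t, r)])
                 for t in unscheduled for r in resources),
                key=lambda choice: choice[2],
            )
            schedule.append((task, res, finish))
            ready_at[res] = finish
            unscheduled.remove(task)
        return schedule

    # Hypothetical runtime estimates for three tasks on two VMs.
    runtime = {("t1", "vm1"): 4, ("t1", "vm2"): 6,
               ("t2", "vm1"): 2, ("t2", "vm2"): 5,
               ("t3", "vm1"): 7, ("t3", "vm2"): 3}
    print(min_min(["t1", "t2", "t3"], ["vm1", "vm2"], runtime))
    # -> [('t2', 'vm1', 2.0), ('t3', 'vm2', 3.0), ('t1', 'vm1', 6.0)]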
2. Scientific workflow management systems in cloud computing
These are software systems developed to support scientific research by monitoring a defined sequence of tasks arranged as a workflow and by handling the vast amounts of data involved, in order to improve results. These systems help to reuse and integrate domain-specific functions and tools across environments, and they automate activities such as data access, integration and transformation, data analysis, error handling, and optimization of workflow execution.

2.1 Existing scientific workflow management systems for planning and execution of workflows
The specification and execution of workflows are managed by workflow systems, which are responsible for coordinating the services involved. The following systems are surveyed here.

Pegasus: Pegasus performs planning for the execution of workflows in many areas of science [7]. It maps a resource-independent, user-provided workflow description onto the available resources, which lets scientists build workflows in an abstract manner without bothering about the details of the underlying cyberinfrastructure middleware [9]. The abstract workflow is represented as a directed acyclic graph (DAG) whose nodes represent computational tasks [6] (see the sketch below). Pegasus automatically manages the data generated during workflow execution by staging it out to user-specified locations, registering it in data catalogues, and capturing its provenance information. Pegasus dynamically discovers the accessible resources and their characteristics and queries for the location of data, which may be replicated in the environment [9].

Kepler: Kepler is a workflow system that helps scientists plan, design, and execute scientific workflows [4]. It provides support for web-service-based workflows, a graphical user interface, and an execution engine for editing and managing scientific workflows. Kepler uses MoML (Modeling Markup Language, in XML), which enables the description of a large number of workflow structures, embedded workflows among others. To use Kepler, users have to install it on their own machine, so from the user's point of view it is a local tool [8]. Kepler follows an actor-oriented design approach to compose and execute workflows: the computational components are called actors, and they are linked together to form a workflow.

Triana: Triana is an open-source workflow management system and problem-solving environment. It provides a graphical user interface and a large number of data-analysis tools. Job entities are known as tasks, and users can create groups of tasks without using any programming [8].
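The following is a minimal sketch of the DAG model mentioned for Pegasus above (and used by DAGMan below): tasks are nodes, dependencies are edges, and a valid execution order is a topological order. The task names and edges are invented for illustration.

    # Minimal sketch of an abstract workflow as a DAG, with a topological
    # execution order. Tasks and dependencies are invented for illustration.
    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    # deps[task] = the set of tasks it depends on (its parents in the DAG)
    deps = {
        "stage_in":  set(),
        "process_a": {"stage_in"},
        "process_b": {"stage_in"},
        "merge":     {"process_a", "process_b"},
        "stage_out": {"merge"},
    }
    order = list(TopologicalSorter(deps).static_order())
    print(order)
    # e.g. ['stage_in', 'process_a', 'process_b', 'merge', 'stage_out']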
Taverna: Taverna provides a graphical user interface to compose and plan workflow services [8]. It is an open-source, domain-independent workflow management system consisting of a number of tools to plan and execute workflows, with support for web services, R services, Java services, and others. Like Kepler, Taverna runs on the user's machine. The Taverna Workbench enables various experimental techniques to be automated through the integration of services, including WSDL-based single-operation web services, into workflows [5].

DAGMan: DAGMan stands for Directed Acyclic Graph Manager. It is the workflow engine underneath the Pegasus workflow management system. Its input basically enumerates the jobs in the workflow and the dependencies between them. Users can define pre- and post-scripts for each job, which are executed before and after the job's execution, respectively. File dependencies between jobs must be managed by the user, as DAGMan manages only execution dependencies [8]. A sketch of the pre/post-script pattern is given below.
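As a rough illustration of the pre/post-script pattern described for DAGMan, here is a minimal, hypothetical sketch; the commands are invented, and this mimics only the pattern, not DAGMan's actual submit-file syntax or behaviour.

    # Minimal sketch of pre/post scripts around a job, in the style DAGMan
    # allows. Commands are invented; this is not DAGMan's actual mechanism.
    import subprocess

    def run_job_with_scripts(pre, job, post):
        """Run the PRE script, the job, then the POST script; stop on failure."""
        for label, cmd in (("PRE", pre), ("JOB", job), ("POST", post)):
            if subprocess.run(cmd, shell=True).returncode != 0:
                raise RuntimeError(f"{label} step failed: {cmd!r}")

    # Example: stage inputs in before the job, clean up and register afterwards.
    run_job_with_scripts(
        pre="echo staging input files",
        job="echo running the actual computation",
        post="echo cleaning up and registering outputs",
    )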
2.2 Challenges faced by workflow management systems
A number of challenges confront workflow management systems when they are applied over new execution environments or deal with big data. They are summarized in the table below and discussed in the paragraphs that follow.

S. No. | Name of challenge                           | Description
1      | Data scale and computation complexity       | Huge numbers of distributed data objects of complex types, sizes, and forms must be handled; computation needs to distribute the data over different computational nodes.
2      | Resource provisioning                       | Functionality, resource allocation, and network bandwidth must be provided to scientific workflows; once a workflow is placed for execution, its provisioned resources are fixed.
3      | Collaboration in heterogeneous environments | Increasingly collaborative projects raise challenges in heterogeneous environments; heterogeneous resource performance also affects execution.

Data scale and computation complexity: Workflow execution requires huge numbers of distributed data objects, which can be of complex types, different sizes, or other forms. Nowadays scientific experiments, networks, satellites, and sensors face a data-deluge problem, and the data needs to be processed faster than the computational resources allow. Data scale and management are beyond the capability of traditional workflows, as these rely on traditional infrastructures for the scheduling and computing of data resources. In addition to data scale, computation complexity is also a big problem for workflows. To reduce these problems, the computation needs to distribute the data over different computational nodes [10].

Resource provisioning: Scientific workflows require functionality, allocated resources, and network bandwidth. Grid environments are not able to provide a workflow with smooth, dynamic resource allocation: once a workflow has been placed for execution, the resources provisioned to it are fixed, which may constrain the scientific problems that can be handled by the workflow [10].

Collaboration in heterogeneous environments: Collaboration means interaction between the workflow management system and the execution environments, for example for resource access and load balancing. As scientific projects become more and more collaborative in nature, a number of challenges arise in handling this collaboration in heterogeneous environments. Workflow execution is also affected by the heterogeneous performance of computing resources, due to variations in the design of physical machines [10].

2.3 Merits of scientific workflow management systems
1. These systems help to define, implement, and manage workflows [12].
2. They help to reduce the cost of operating transaction processes and improve the quality of service [12].
3. They manage the provisioning of dynamic resources in cloud environments and control resource provisioning for workflow execution [11].
4. Cloud workflow management systems assist in modeling, in integrating computing services, and in scheduling service processes.

3. Literature survey
In [Yong Zhao et al., 2014], the authors present their experience in integrating the Swift scientific workflow management system with the OpenNebula cloud platform, which supports workflow specification and submission, on-demand virtual cluster provisioning, high-throughput task scheduling and execution, and efficient and scalable resource management in the cloud. The authors set up a series of experiments to demonstrate the capability of the integration and use a MODIS image-processing workflow as a showcase of the implementation.
In [Suraj Pandey et al., 2013], the authors present a review of workflows, workflow engines and their interaction with cloud computing, existing solutions for workflows and their limitations with respect to scalability, and the key benefits that cloud services offer workflow applications compared to traditional environments.
In [Huang Hua et al., 2013], the authors present the technology of workflows and cloud computing, the concept and features of workflows, and possible future trends of workflows.
In [Jianwu Wang et al., 2011], the authors integrate Hadoop with Kepler to provide an easy-to-use architecture that helps users compose and execute MapReduce applications in Kepler scientific workflows. This facilitates scientists in applying MapReduce to their domain-specific problems and connecting it with other tasks in a workflow through the Kepler graphical user interface. The researchers validate the feasibility of their approach via a word-count use case.
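For readers unfamiliar with the word-count use case mentioned above, the following is a minimal local sketch of the MapReduce pattern in plain Python; it only mimics the map and reduce phases and is unrelated to the actual Kepler + Hadoop implementation.

    # Minimal local sketch of the MapReduce word-count pattern.
    # This mimics the pattern only; it is not Hadoop or Kepler code.
    from collections import Counter
    from itertools import chain

    def map_phase(line):
        # Emit a (word, 1) pair for each word in one input line.
        return [(word, 1) for word in line.split()]

    def reduce_phase(pairs):
        # Sum the emitted counts for each word.
        counts = Counter()
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    lines = ["to be or not to be", "to plan is to execute"]
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    print(reduce_phase(pairs))
    # -> {'to': 4, 'be': 2, 'or': 1, 'not': 1, 'plan': 1, 'is': 1, 'execute': 1}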
In [Hector Fernandez et al., 2011], the authors propose a chemistry-inspired workflow management system to address the degree of parallelism, scalability, elasticity, and distribution of clouds in a cloud-federation platform. To implement this workflow management system, the authors compare the results of the Taverna Workbench, Kepler, and the HOCL language for both centralized and decentralized environments.
In [Weiwei Chen et al., 2011], the authors use workflow planning and execution logs gathered from Pegasus and Condor to analyze overheads for a set of workflow runs on cloud and grid platforms.

4. Conclusion
In summary, the large volume of data involved implies a need for planning before executing over such enormous databases. The planning phase is the most critical part of running projects with huge execution tasks, especially when resources are limited in terms of memory, bandwidth, and so on. The problem becomes more complex when there is a large variation in job sizes. Planning therefore ensures that the resources needed for execution are well allocated, so that later problems related to congestion and slow response time are mitigated. Planning algorithms essentially check the feasibility of a schedule before execution; they build on predicted optimal conditions before the final execution occurs. The main systems discussed are Pegasus, Kepler, Taverna, Triana, and DAGMan. Their advantages and disadvantages were also elaborated, and it was found that Pegasus [6][7][9] is the most extensively used planning system and DAGMan [8] the most efficient among those considered here. In the end, there are ample opportunities to improve these algorithms, as most of them work on only one or two parameters.

5. Future work
Previous algorithms have done limited work on building a solution that accounts for inter-node bandwidth, task characteristics (memory and data processing), and inter-node traffic and congestion while planning the distribution of workload to virtual machines, that is, whether traffic between virtual machines is congested or flows smoothly. This has not been considered so far, which is critical because there may be precedence or sequencing between jobs that depend on each other. Hence there is ample scope to increase the reliability of the algorithms and improve their flow. There is a need to work on these parameters to produce an improved algorithm along the lines of the LATE planning algorithm.

6. References
[1] Marc Bux, Ulf Leser, "DynamicCloudSim: Simulating Heterogeneity in Computational Clouds," 2013.
[2] Weiwei Chen, Ewa Deelman, "WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments," 2013.
[3] Anju Bala, Inderveer Chana, "A Survey of Various Workflow Scheduling Algorithms in Cloud Environments," 2011.
[4] Jianwu Wang, Daniel Crawl, Ilkay Altintas, "Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems," 2009.
[5] Suraj Pandey, Dileban Karunamoorthy, Rajkumar Buyya, "Workflow Engine for Clouds."
[6] Jens-S. Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, G. Bruce Berriman, "Experiences Using Cloud Computing for a Scientific Workflow Application," 2011.
[7] Christina Hoffa, Gaurang Mehta, Timothy Freeman et al., "On the Use of Cloud Computing for Scientific Workflows."
[8] Z. Farkas, P. Kacsuk, "P-GRADE Portal: A Generic Workflow System to Support User Communities," 2010.
[9] Alexandru Costan, Corina Stratan, Eliana Tirsa, Mugurel Ionut Andreica, Valentin Cristea, "Towards a Grid Platform for Scientific Workflows Management."
[10] Yong Zhao et al., "Migrating Scientific Workflow Management Systems from the Grid to the Cloud," 2014.
[11] Suraj Pandey et al., "Workflow Engine for Clouds," 2013.
[12] Huang Hua et al., "Survey of Cloud Workflow," 2013.
[13] Sara Migliorini, Mauro Gambini, Marcello La Rosa, Arthur H. M. ter Hofstede, "Pattern-Based Evaluation of Scientific Workflow Management Systems," 2011.
[14] Hector Fernandez, Cedric Tedeschi, Thierry Priol, "A Chemistry-Inspired Workflow Management System for Scientific Applications in Clouds," 2011.
[15] Weiwei Chen, Ewa Deelman, "Workflow Overheads Analysis and Optimizations," 2011.