Comparison of Resource Scheduling in Centralized, Decentralized and Hybrid Grid Environments

N. Malarvizhi 1 and V. Rhymend Uthariaraj 2
1 Professor, Department of IT, Jawahar Engineering College, Chennai
2 Professor & Director, Ramanujan Computing Centre, Anna University, Chennai
1 nmv_94@yahoo.com 2 rhymend@annauniv.edu

Abstract - In this paper, the performance of the resource scheduling algorithms in the centralized, decentralized and hybrid grid architecture models is analysed. The benefits and limitations of the different architecture models are discussed, and a comparison regarding issues such as the effects of the system load and system size is given. Furthermore, simulations are carried out in order to investigate the performance of the models using various performance metrics, such as the average Total Time to Release (TTR) and the percentages of job completion and job failure. The overheads involved in the afore-mentioned architecture models are discussed, and the percentage of improvement of the resource scheduling algorithm in one architecture model over the other architecture models is analysed.

Keywords - Grid computing, resource scheduling, cluster, total time, checkpoint

I. INTRODUCTION

Grid computing is a form of distributed computing that involves the coordination and sharing of computers, applications (jobs), storage, and network resources across dynamic and geographically dispersed organizations, as cited in [1]. Running a customer's job in a grid environment involves resource discovery, resource selection and job execution, as discussed in [2]. In this paper, the grid system is considered as a collection of heterogeneous clusters (resources), referred to as a multi-cluster grid. Each cluster interconnects two or more servers to create a single, unified resource. Resource scheduling is an important aspect of a multi-cluster grid environment, where the consumers and resources are distributed geographically across multiple administrative domains.
The motivation here is to improve the performance of the grid system by reducing the total time a job spends in the system, by increasing the throughput and by improving the resource utilization.

II. RELATED WORK

The author in [3] defines grid scheduling as the process of making scheduling decisions involving resources over multiple administrative domains. A number of approaches for resource management architectures have been proposed; the prominent ones are centralized, hierarchical and decentralized, as mentioned in [4]. Grid resource management is an important component of the grid system, as cited in [5]. In the grid environment, the schedulers are responsible for ensuring the performance of both the user and the resource. The structure of the schedulers, such as the centralized, decentralized and hybrid architecture models, depends upon the number of resources managed and the domain in which the resources are located. These architectures differ according to the scheduler, the number of scheduling components, their autonomy and the job submission system. In this paper, the performance of the resource scheduling and load balancing algorithms in such architectures is analysed and compared based on different system parameters. The performance improvement of one algorithm over the other algorithms is analysed under different simulation scenarios, such as varying the number of jobs submitted and the number of clusters available.
III. COMPARISON OF DIVERSE GRID ARCHITECTURES

In the centralized scheduler model, a variety of factors need to be considered for the effective scheduling of resources, such as the total time the user application spends in the grid system, the number of jobs completed (throughput), and the percentage of utilization of a resource. Since the environment is dynamic, the selection of resources based on any one criterion, such as resource processing power, job transfer delay or resource queue waiting time, does not produce efficient schedules. Selection by a single criterion offers only a limited number of resources that satisfy the requirements of the job. To improve the performance of both the user and the resource, it is important to consider all the criteria together when selecting a resource for executing a job, rather than considering them separately: the amount of time needed to transfer the job to different resources (based on network bandwidth), how long the job will take to execute on different resources (based on processing power), and the time at which the job starts its execution (based on resource queue length). The Multi Criteria Resource Scheduling (MCRS) algorithm proposed in [6] considers the transfer time (data access cost), the queue waiting time and the processing time of the resource, and selects the resource that gives the minimum TTR for executing the job. To enhance the performance of the centralized resource scheduling algorithm further, a fault tolerance and recovery strategy is incorporated: the MCRS approach is extended with a checkpoint-based fault tolerance and recovery technique, named Multi Criteria Resource Scheduling with Checkpoint Set (MCRS_CS), proposed in [7]. The centralized scheduler model, however, does not ensure scalability and promptness of Quality of Service (QoS), which degrades the performance delivered to the consumers.
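The MCRS selection rule described above can be sketched as follows. This is a minimal illustration, not the published implementation: the `Resource` fields, the example clusters and the unit conventions (MB/s for bandwidth, MIPS for processing power, MI for job length) are assumptions chosen to match the simulation parameters reported later in Table 1.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    bandwidth: float    # MB/s between the user and the resource (assumed unit)
    mips: float         # processing power of the resource
    queue_wait: float   # estimated queue waiting time in seconds

def estimate_ttr(job_size_mb: float, job_length_mi: float, r: Resource) -> float:
    """TTR estimate = transfer time + queue waiting time + processing time."""
    transfer = job_size_mb / r.bandwidth
    processing = job_length_mi / r.mips
    return transfer + r.queue_wait + processing

def select_resource(job_size_mb: float, job_length_mi: float, resources):
    """MCRS-style selection: pick the resource with the minimum estimated TTR."""
    return min(resources, key=lambda r: estimate_ttr(job_size_mb, job_length_mi, r))

# Hypothetical clusters with values inside the ranges of Table 1.
resources = [
    Resource("clusterA", bandwidth=100.0, mips=277.0, queue_wait=5.0),
    Resource("clusterB", bandwidth=500.0, mips=577.0, queue_wait=20.0),
]
best = select_resource(50.0, 100000.0, resources)
```

Note that clusterB wins here despite its longer queue, because its higher processing power dominates the combined TTR, which is exactly the effect of weighing all three criteria together rather than any single one.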
All resource requests and consumer feedback in the centralized architecture model are processed through a single scheduler. There is a possibility of heavy traffic at the scheduler, and it becomes heavily loaded. The presence of a single scheduler may also lead to the problem of a single point of failure. This problem can be overcome by the decentralized and hybrid scheduler models, in which there is no single central component responsible for scheduling. In the decentralized model, the schedulers interact among themselves in order to decide which resources should be allocated to the jobs being executed. The Decentralized Remote Execution (DRE) algorithm proposed in [8] considers multiple criteria of a cluster. To reduce the overhead of collecting cluster state information, the algorithm performs the state information exchange through mutual information feedback. The job migration decision is carried out by evaluating the performance benefit for the jobs in terms of minimizing the total time spent by each job. In the hybrid model, the scheduler at the top of the hierarchy interacts with the local schedulers in order to decide about the schedules. The Hybrid Remote Execution (HRE) algorithm proposed in [9] deals with a two-layered resource scheduling approach. When a user submits a job to the local cluster, the scheduler in the cluster determines whether the job can be dispatched to the nodes in the cluster itself, or transferred to the grid scheduler for remote execution. The decision is based on various job and resource characteristics.

IV. PERFORMANCE EVALUATION AND DISCUSSION

In this section, the performance of the resource scheduling algorithm in the centralized, decentralized and hybrid grid architecture models is compared, and the performance metrics on varying the system load and system size are analysed.
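The two decentralized decisions described above, the DRE migration choice and the HRE local-versus-remote dispatch, can be sketched as simple decision rules. Both functions are illustrative assumptions, not the published algorithms: the relative-gain form of the migration test and the deadline-based dispatch test are plausible readings of the descriptions in [8] and [9], and the threshold value 0.1 is taken from the simulation parameters in Table 1.

```python
MIGRATION_THRESHOLD = 0.1  # migration threshold from the simulation parameters

def should_migrate(local_ttr: float, best_neighbour_ttr: float) -> bool:
    """DRE-style decision (sketch): migrate a job only when the relative TTR
    gain at the best neighbouring cluster exceeds the migration threshold."""
    gain = (local_ttr - best_neighbour_ttr) / local_ttr
    return gain > MIGRATION_THRESHOLD

def dispatch(job_length_mi: float, deadline_s: float,
             local_mips: float, local_queue_wait_s: float) -> str:
    """HRE-style two-layer decision (sketch): run the job locally if the
    local cluster can finish it within the deadline, otherwise hand it to
    the grid scheduler for remote execution."""
    local_finish = local_queue_wait_s + job_length_mi / local_mips
    return "local" if local_finish <= deadline_s else "grid"
```

For example, a 30000 MI job with a 100 s deadline on a 577 MIPS cluster with a 10 s queue finishes locally in about 62 s, so it is dispatched locally; a 100000 MI job under the same conditions would miss the deadline and is forwarded to the grid scheduler.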
The percentage of improvement of the resource scheduling algorithm in one architecture model over the algorithms in the other architecture models under different scenarios is presented. The simulation setup and parameters are described below.
TABLE 1 SIMULATION PARAMETERS

Parameter                        Value
Number of clusters               10-100
Number of servers per cluster    2
Number of nodes per server       4-16
Number of processors per node    1-8
Processing power per processor   277-577 MIPS
Job length                       100000-1000000 MI
Number of jobs                   100-9000
Distance coefficient             1.3
Migration threshold              0.1
Periodic information exchange    20 seconds
Number of random neighbours      20% of Nbor
Deadline                         10-100 seconds
Bandwidth                        100 MB-1 GB

A. Performance Metrics

The metrics used for comparing the performance of the resource scheduling algorithm in the diverse grid architectures, such as centralized, decentralized and hybrid, are given below:

Average TTR (ATTR): the ratio of the total TTR of the submitted jobs to the number of jobs completed.

Percentage of job completion: the ratio of the number of jobs completed to the number of jobs submitted.

Percentage of job failure: the ratio of the number of jobs failed to the number of jobs submitted.

B. Effect of System Load

In this section, the performance of the resource scheduling algorithms in the diverse architectures, such as centralized, decentralized and hybrid, is analysed under different system loads, i.e., how different numbers of jobs affect the described performance metrics. Figures 1 to 3 show the performance comparison of the algorithms when varying the number of jobs from 1000 to 9000, with 40 clusters in the system. The performance comparison is done in terms of the average TTR. According to the simulations, there is not much variation in performance between the centralized, decentralized and hybrid resource scheduling algorithms when the number of jobs is less than 1000. This is because, for a small number of jobs, the number of migrations is low, since the majority of the jobs can be executed in the originating resource itself without exceeding their deadlines. The performance improvement can be realized only when the number of jobs is larger.
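The three metric definitions above can be expressed directly in code. This is a minimal sketch; the function name and argument names are illustrative, not taken from the simulator.

```python
def scheduling_metrics(total_ttr: float, jobs_submitted: int,
                       jobs_completed: int, jobs_failed: int):
    """Compute the three metrics of Section IV-A (sketch):
    ATTR, percentage of job completion, percentage of job failure."""
    attr = total_ttr / jobs_completed                      # average TTR
    completion_pct = 100.0 * jobs_completed / jobs_submitted
    failure_pct = 100.0 * jobs_failed / jobs_submitted
    return attr, completion_pct, failure_pct

# Hypothetical run: 5 jobs submitted, 3 completed with 600 s total TTR, 2 failed.
attr, completed, failed = scheduling_metrics(600.0, 5, 3, 2)
```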
When the number of jobs is large, the concept of migration considerably improves the performance parameters. Figure 1 exhibits the results of the simulation. From Figure 1, it is observed that the DRE and HRE algorithms exhibit a smaller average TTR than the MCRS and MCRS_CS algorithms. When the number of jobs is 1000, the performance difference of the hybrid algorithm over the other algorithms is small. As the number of jobs increases further, the performance difference between MCRS and HRE, and also between MCRS_CS and HRE, increases as well. This is because the increase in the number of jobs affects the centralized algorithm considerably, even with the provision of checkpoints; hence, the time required to schedule a job from the central scheduler increases. A comparison of the performance of DRE and HRE reveals that HRE gives better results than DRE, because in DRE the selection of the resource for job execution is done within a limited range of neighbouring resources.

Figure 1 Effect of system load in ATTR

It is observed that the average TTR of the hybrid algorithm is less than that of the centralized and decentralized resource scheduling algorithms for all values of the system load.
The HRE algorithm has an average improvement of 52.16%, 40.73% and 15.57% over the MCRS, MCRS_CS and DRE algorithms respectively. The performance of the centralized, decentralized and hybrid resource scheduling algorithms was also compared in terms of the percentage of jobs completed. When the number of jobs increases, the percentage of jobs completed increases for all the algorithms except the centralized ones. This is because, in the centralized environment, there is a single job submission system and all the jobs are submitted through a single scheduler; hence, the percentage of job completion decreases when the system load increases, in both the MCRS and MCRS_CS algorithms. From Figure 2, it is observed that only a small performance improvement of the hybrid algorithm over the centralized ones is seen when the load is 1000. As the number of jobs increases further, the performance difference between the algorithms increases as well. This is because the increase in the schedule time reduces the number of jobs completed and, thereby, the percentage of jobs completed within the specified deadline in the centralized algorithm. When comparing MCRS and MCRS_CS, MCRS_CS is 11.86% better than MCRS. The performance of the hybrid algorithm is higher than that of the decentralized algorithm. This is because, in the decentralized algorithm, the selection of the resource for a job is done within a limited range, comprising its own cluster and the neighbouring clusters. Hence, the number of jobs completed is comparatively smaller in the decentralized algorithm than in the hybrid algorithm, which reduces the percentage of jobs completed in the decentralized algorithm compared with the hybrid algorithm. From Figure 2 it is observed that the HRE algorithm gives a higher percentage of jobs completed for all values of the system load.
It has an average improvement of 42.82%, 36.24% and 7.87% over the MCRS, MCRS_CS and DRE algorithms respectively. The performance of the centralized, decentralized and hybrid resource scheduling algorithms was also compared in terms of the percentage of jobs failed. The analysis reveals that, when the number of jobs increases, the percentage of jobs failed increases rapidly only in the centralized algorithm. This is because a single scheduler entity in the centralized environment offers only a limited number of resources, which reduces the number of jobs completed within the specified deadline.

Figure 2 Effect of system load in job completion

Figure 3 Effect of system load in job failure

From Figure 3 it is observed that the HRE algorithm produces a lower percentage of jobs failed for all values of the system load. It has an average improvement of 30.53%, 27.19% and 8.68% over the MCRS, MCRS_CS and DRE resource scheduling algorithms respectively.

C. Effect of System Size

In this section, the performance of the resource scheduling algorithms in the diverse architectures, such as centralized, decentralized and hybrid, is analysed to test the scalability of the grid system by varying the grid size, i.e., how different system sizes affect the performance metrics.
Figures 4 to 6 show the performance comparison of the algorithms when varying the number of clusters from 10 to 100, with 1000 jobs submitted to the system. The performance of the centralized, decentralized and hybrid algorithms in terms of the average TTR is considered first. Figure 4 exhibits the results of the simulation. From Figure 4, it is observed that, for all the system sizes tested, the decentralized and hybrid algorithms exhibit a smaller average TTR than the centralized algorithm. As the number of clusters increases, the average TTR decreases for all the algorithms, for the fixed load of 1000 jobs. There is a huge difference in the average TTR between the centralized and the other algorithms when the number of clusters is small. This is because, in the centralized algorithm, the schedule time is too high for the given number of jobs, due to the single central scheduler entity. The performance of the hybrid algorithm is slightly better than that of the decentralized algorithm, as the decentralized schedulers do not have global knowledge of the resources in the grid system.

Figure 4 Effect of system size in ATTR

At the system load of 1000 jobs, the HRE algorithm has an average improvement of 33.94% and 18.72% over the MCRS and DRE algorithms respectively. The performance of the centralized, decentralized and hybrid resource scheduling algorithms is also compared in terms of the percentage of jobs completed, with a varying number of clusters. Figure 5 shows the results of the simulation. The hybrid algorithm achieves the best performance in the varying grid size scenario, with the decentralized algorithm close behind. There is only a minimal performance improvement of the hybrid algorithm over the decentralized algorithm, because in the decentralized algorithm the selection of the resource is done within a limited range. Hence, the number of jobs completed in the decentralized algorithm is comparatively smaller than that of the hybrid algorithm.
This reduces the percentage of job completion in the decentralized algorithm compared with the hybrid algorithm.

Figure 5 Effect of system size in job completion

At the system load of 1000 jobs, the HRE algorithm has an average improvement of 8.69% and 6.27% over the MCRS and DRE algorithms respectively. The performance of the centralized, decentralized and hybrid resource scheduling algorithms is also compared in terms of the percentage of job failure, by varying the number of clusters. When the number of clusters increases, the percentage of jobs failed decreases for all the algorithms. This is because, when the number of clusters increases, the algorithms find more resources that satisfy the requirements of the jobs. The performance difference between the hybrid and centralized algorithms increases with the number of clusters. This is because, in a centralized environment, even though the number of clusters increases, comparatively fewer jobs are completed within the specified deadline, owing to the single scheduler entity's inability to manage a larger number of resources. Hence, the number of jobs failed increases, which in turn increases the percentage of job failure. Figure 6 shows the results of the simulation. The performance of the hybrid algorithm is marginally better when compared with the decentralized algorithm, but considerably better when compared with the centralized algorithm.
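The averaged improvement percentages quoted throughout this section can be computed as sketched below. The helper names are hypothetical; the direction flag accounts for the fact that lower values are better for ATTR and job failure, while higher values are better for job completion.

```python
def improvement_pct(baseline: float, candidate: float,
                    lower_is_better: bool = True) -> float:
    """Relative improvement of `candidate` over `baseline`, in percent."""
    if lower_is_better:
        return 100.0 * (baseline - candidate) / baseline
    return 100.0 * (candidate - baseline) / baseline

def average_improvement(baseline_series, candidate_series,
                        lower_is_better: bool = True) -> float:
    """Average the per-point improvements across the tested loads or sizes."""
    vals = [improvement_pct(b, c, lower_is_better)
            for b, c in zip(baseline_series, candidate_series)]
    return sum(vals) / len(vals)
```

For instance, halving the ATTR at one load (200 s down to 100 s) is a 50% improvement, while raising job completion from 50% to 60% of submitted jobs is a 20% relative improvement.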
Figure 6 Effect of system size in job failure

At the system load of 1000 jobs, the HRE algorithm has an average improvement of 23% and 17% over the MCRS and DRE algorithms respectively.

D. Comparative Analysis

Grid environments cannot rely on having a single component permanently present. Thus, the centralized approach is not suitable for a large environment. Furthermore, it does not scale well as more resources enter the grid system. The decentralized and hybrid approaches seem more practical, since their reliability is greater than that of the centralized approach and their scalability is warranted. Since there is no migration concept in the centralized environment, there is no migration overhead in that environment. The migration overhead is lower in the decentralized environment, since job migration is done within a limited number of neighbours. For all the applications, the resource utilization in the centralized environment is higher, since most of the resources are utilized for executing the jobs submitted to the environment. The performance of the decentralized and hybrid resource scheduling algorithms is comparatively higher than that of the centralized algorithm. This is because, in the decentralized and hybrid environments, the jobs are migrated to remote resources for execution when necessary, which improves the performance of the system.

V. CONCLUSION

This paper analyses the performance of the centralized, decentralized and hybrid resource scheduling algorithms for scheduling the jobs submitted in a multi-cluster grid environment. The performance of the algorithms depends highly on the parameters, the machine configurations and the workload. When choosing an appropriate scheduling model depending on the requirements and restrictions, there are always benefits and limitations to be taken into consideration.
From the results obtained, it is observed that, for scalability reasons, the centralized approach is not feasible for a large-scale grid environment. Since their scalability is warranted, the decentralized and hybrid algorithms are better than the centralized algorithm. The conclusion from the experiments is that the hybrid resource scheduling algorithm produces a lower ATTR, a higher percentage of job completion and a lower percentage of job failure than the centralized and decentralized resource scheduling algorithms. When comparing the performance of the centralized and decentralized resource scheduling algorithms, the decentralized algorithm provides better results than the centralized algorithm. Unlike centralized scheduling, both decentralized and hybrid scheduling allow jobs to be scheduled by multiple schedulers. In decentralized scheduling, every scheduler can communicate with a limited number of other schedulers in the system and schedule a job to them depending upon the requirements. In hybrid scheduling, there is a central scheduler and multiple lower-level sub-schedulers. The central scheduler is responsible for controlling the job execution and assigning the jobs to the lower-level schedulers. Each lower-level scheduler is responsible for scheduling the jobs onto the resources owned by an organization. The central scheduler can communicate with all the lower-level schedulers in the system. Thus, compared to centralized and decentralized scheduling, hybrid scheduling is more scalable.

References

[1] Foster, I., Kesselman, C. and Tuecke, S. The anatomy of the grid: Enabling scalable virtual organizations, International Journal of High Performance Computing Applications, Vol. 15, No. 3, pp. 200-222, 2001.
[2] Li, M. and Baker, M. The Grid: Core Technologies, John Wiley & Sons, 2005.
[3] Schopf, J.M. Ten actions when grid scheduling: The user as a grid scheduler, Chapter 2 in Grid Resource Management, Kluwer Academic Publishers, pp. 15-23, 2004.
[4] Buyya, R. Economic-based distributed resource management and scheduling for grid computing, Ph.D. Thesis, Monash University, Australia, 2002.
[5] Zhang, Q. and Li, Z. Design of grid resource management system based on information service, Journal of Computers, Vol. 5, No. 5, pp. 687-694, 2010.
[6] Malarvizhi, N. and Rhymend Uthariaraj, V. A minimum time to release job scheduling algorithm in computational grid environment, Proceedings of the IEEE International Conference on Networked Computing, Seoul, Korea, pp. 13-18, 2009.
[7] Malarvizhi, N. and Rhymend Uthariaraj, V. Fault tolerant scheduling strategy for computational grid environment, International Journal of Engineering Science and Technology, Vol. 2, No. 9, pp. 4361-4372, 2010.
[8] Malarvizhi, N., Gokulnath, K. and Rhymend Uthariaraj, V. Load distribution through optimal neighbour selection in decentralized grid environment, European Journal of Scientific Research, Vol. 50, No. 4, pp. 575-585, 2011.
[9] Malarvizhi, N. and Rhymend Uthariaraj, V. Hierarchical load balancing approach in computational grid environment, International Journal of Recent Trends in Engineering, Vol. 3, No. 1, pp. 19-24, 2010.