CHAPTER 4
GRID SCHEDULER WITH DEVIATION BASED RESOURCE SCHEDULING

4.1 OUTLINE

In this chapter, the significance of the policy problem and its relationship with grid scheduling is explained in detail with the help of an example. To realize controlled grid resource sharing, the grid environment must be equipped with resource usage policies and SLAs, and these must be integrated with grid meta-schedulers.

4.2 DEVIATION BASED RESOURCE SCHEDULING

The grid is highly dynamic with respect to user application requirements, and it is accessed by multiple users simultaneously. The existing grid infrastructure is rigid in nature and cannot satisfy all user application requirements. In this situation, the grid scheduler fails to locate potential resources because the required execution environment is not available. Virtualization technologies integrated with the existing grid infrastructure address this issue by creating virtual resources on the potential resources and dynamically deploying the required execution environment. Virtualization integrated into the grid can meet customized user application requirements through on-demand provisioning of resources. To incorporate virtualization into the grid environment, the grid scheduler needs appropriate mechanisms for dynamic virtual machine creation and deletion. The existing grid schedulers do not have mechanisms for the dynamic creation of virtual grid resources in
remote physical resources to provide the application execution environment. Conventional grid schedulers fail to address the following scenarios: (i) the application requires a number of CPUs that can be satisfied by a single cluster; (ii) the application requires a number of CPUs that cannot be met by a single cluster; (iii) the application requires a completely different software environment that no cluster in the grid can provide; and (iv) the application requires a number of CPUs within a deadline. Hence, the grid scheduling mechanism has to consider both physical and virtual resources during the allocation of jobs.

The proposed research work designs and implements an intelligent grid scheduling mechanism that prioritizes job requests for various scheduling scenarios and allocates resources optimally and efficiently. Integrating virtualization technologies into the grid infrastructure allows a user's application to be scheduled on potential physical resources even if they do not possess the required application environment, since that environment can be provisioned using virtual resources.

The performance of the negotiation algorithm mainly depends on the average negotiation time. The average negotiation time per SLA mainly depends on the number of nodes selected for negotiation, which in turn depends on the order of the available resources. The main drawback of existing scheduling algorithms is that resources are ordered against the algorithms' own scheduling metrics (such as rank, budget and deadline) rather than against the job request, which increases the number of negotiations. Hence, it is necessary to devise a resource ordering algorithm that
calculates the amount of deviation of the resource parameters (including both positive and negative deviation) against the parameters specified in the job request. Here, the resource parameters are the parameters specified through resource usage policies in the PMS. To calculate the deviation, DRS has to identify the appropriate usage policy, gather the resource parameters specified in that policy and compare them against the request. After calculating the deviation values, the scheduler orders the resources by their deviation values.

The Pearson correlation coefficient was tried first to identify the similarity between the available and the requested parameters, but it did not yield a useful solution. Next, the percentage deviation coefficient was applied, which successfully computes the deviation in both directions (i.e. bipolar). The insight behind computing the deviation coefficient is to order and select resources by their capability to fulfil the current request, which reduces the number of negotiations needed. With resource backup support, a change of a resource at some provider during the calculation of the deviation coefficient does not affect the negotiation if that resource does not participate in the negotiation. Even if it does participate and cannot provide the commitment, the negotiation module switches the negotiation process to the next potential resource provider in the meta-scheduler's ordered host list. The proposed DRS algorithm calculates the percentage deviation D_ij of the i-th available resource for each j-th parameter specified in the request, using the equation specified in Figure 4.1.
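The equation of Figure 4.1 is not reproduced in the text, so the Python sketch below assumes a form that agrees with the numeric entries of Table 4.2: D_ij = (A_ij - R_j) / min(A_ij, R_j) x 100, where A_ij is the value offered by resource i and R_j the requested value (for example, A3: (40 - 50)/40 x 100 = -25 and A2: (100 - 50)/50 x 100 = 100). Categorical parameters (OS, Software) are assumed to score 0 on an exact match and -100 otherwise. The function name `percentage_deviation`, the dictionary layout and the parameter subset are hypothetical.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# Hypothetical split of the Table 4.1 parameters (Speed omitted for brevity).
NUMERIC = ("NCPU", "RAM", "SS", "BW")
CATEGORICAL = ("OS", "Software")


def percentage_deviation(request: Dict[str, Value],
                         resource: Dict[str, Value]) -> Dict[str, float]:
    """Percentage deviation D_ij of one resource i for each requested parameter j.

    Assumed form, chosen to agree with Table 4.2 (Figure 4.1 is not reproduced):
      numeric parameters:     (A - R) / min(A, R) * 100
      categorical parameters: 0 on an exact match, -100 otherwise
    """
    deviations: Dict[str, float] = {}
    for param in NUMERIC:
        a, r = float(resource[param]), float(request[param])
        if a <= 0 or r <= 0:
            raise ValueError(f"{param} must be positive, got A={a}, R={r}")
        deviations[param] = (a - r) / min(a, r) * 100.0
    for param in CATEGORICAL:
        deviations[param] = 0.0 if resource[param] == request[param] else -100.0
    return deviations


# Request R and hosts A3, A5 from Table 4.1 (restricted to the parameters above).
request = {"NCPU": 50, "RAM": 2, "SS": 40, "BW": 10,
           "OS": "RHEL4", "Software": "MATLAB-6.0"}
a3 = dict(request, NCPU=40)
a5 = dict(request, NCPU=10, OS="RHEL5", Software="MATLAB-5.5")

print(percentage_deviation(request, a3))  # NCPU -25.0, every other parameter 0.0
print(percentage_deviation(request, a5))  # NCPU -400.0, OS and Software -100.0
```

Under this assumed form, a shortfall weighs more than an equal surplus: 40 CPUs against a request of 50 gives -25, whereas 60 CPUs would give +20.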
After calculating D_ij for every available resource, the percentage deviations are scaled to the bipolar range between -1 and +1: each positive D_ij is divided by the maximum deviation value, and each negative D_ij by the magnitude of the minimum deviation value, of the corresponding parameter in the D_ij set (refer to Figure 4.1).
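A minimal Python sketch of this scaling step follows. It assumes the scaling is done per parameter (column-wise), which is what the values of Table 4.3 imply: A3's NCPU deviation of -25 becomes -25/400 = -0.0625, while A8's RAM deviation of -100 becomes -1 because -100 is the minimum of the RAM column. The function name `scale_deviations` is hypothetical.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # resource -> parameter -> D_ij


def scale_deviations(dij: Table) -> Table:
    """Scale each parameter's D_ij values into the bipolar range [-1, +1].

    Positive values are divided by the column maximum, negative values by the
    magnitude of the column minimum, and zero stays zero (per-parameter
    scaling, inferred from Table 4.3).
    """
    params: List[str] = list(next(iter(dij.values())).keys())
    scaled: Table = {name: {} for name in dij}
    for p in params:
        column = [row[p] for row in dij.values()]
        col_max, col_min = max(column), min(column)
        for name, row in dij.items():
            v = row[p]
            if v > 0:
                scaled[name][p] = v / col_max        # col_max > 0 here
            elif v < 0:
                scaled[name][p] = v / abs(col_min)   # col_min < 0 here
            else:
                scaled[name][p] = 0.0
    return scaled


# NCPU and RAM columns of Table 4.2 for a subset of the hosts.
table_4_2_subset = {
    "A1": {"NCPU": 0, "RAM": 0},
    "A2": {"NCPU": 100, "RAM": 100},
    "A3": {"NCPU": -25, "RAM": 0},
    "A5": {"NCPU": -400, "RAM": 0},
    "A8": {"NCPU": 0, "RAM": -100},
}
scaled = scale_deviations(table_4_2_subset)
print(scaled["A3"]["NCPU"])  # -0.0625
print(scaled["A8"]["RAM"])   # -1.0
```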
Figure 4.1 Percentage deviation co-efficient and deviation values

After calculating the deviation D, the DRS policy selects resources following the lollipop sequence (with modified ordering) of the deviation values. It first selects a resource with zero deviation (exact match); if none is found, it moves to the worst-case plug-in match (0+t) and travels along to the best-case plug-in match (+1), then shifts to the best-case subsume match (0-t) and finally ends with the worst-case subsume match (-1) (refer to Figure 4.2). Here, t denotes a small offset of about 0.001. The significance of lollipop sequence based resource selection is that it identifies the best-fit resource for a job request. If a resource is available in the exact-match region or in the plug-in region for a request, the job should be scheduled on that resource to avoid the complex negotiation process. If more than one matching resource is available for the request, the scheduler selects the first resource it encounters while walking over the points from A to C (refer to Figure 4.2). If no match is found in the region A to C, SLA negotiation starts in the region D to E.
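The walk described above can be sketched in Python as follows. The sketch assumes the region rule described for the example in this section (exact: all deviations zero; plug-in: none negative; subsume: at least one negative), orders resources within a region on a single weighted parameter as the example does with NCPU, and uses a hypothetical `negotiate` callback in place of the SLA negotiation module. The points A to E of Figure 4.2 are only approximated by this ordering.

```python
from typing import Callable, Dict, List, Optional

Scaled = Dict[str, Dict[str, float]]  # resource -> parameter -> scaled deviation


def region(devs: Dict[str, float]) -> str:
    """Exact if every deviation is zero, plug-in if none is negative, else subsume."""
    if all(v == 0 for v in devs.values()):
        return "exact"
    if all(v >= 0 for v in devs.values()):
        return "plug-in"
    return "subsume"


def lollipop_order(scaled: Scaled, key: str) -> List[str]:
    """Exact (0), then plug-in (0+t up to +1), then subsume (0-t down to -1).

    Within a region, resources are ordered on the weighted parameter `key`;
    Python's sort is stable, so ties keep their input order.
    """
    names = list(scaled)
    exact = [n for n in names if region(scaled[n]) == "exact"]
    plug_in = sorted((n for n in names if region(scaled[n]) == "plug-in"),
                     key=lambda n: scaled[n][key])    # 0+t first, +1 last
    subsume = sorted((n for n in names if region(scaled[n]) == "subsume"),
                     key=lambda n: -scaled[n][key])   # 0-t first, -1 last
    return exact + plug_in + subsume


def select(scaled: Scaled, key: str,
           negotiate: Callable[[str], bool]) -> Optional[str]:
    """Take the first A-to-C match directly; otherwise negotiate along D to E."""
    for name in lollipop_order(scaled, key):
        if region(scaled[name]) != "subsume":
            return name          # exact or plug-in: no SLA negotiation needed
        if negotiate(name):      # hypothetical SLA negotiation hook
            return name          # provider committed to the SLA
    return None                  # no provider in the ordered list committed


# Usage: only subsume-region hosts, and the first one cannot commit.
hosts = {
    "A3": {"NCPU": -0.0625, "OS": 0.0},
    "A5": {"NCPU": -1.0, "OS": -1.0},
    "A7": {"NCPU": 0.0, "OS": -1.0},
}
print(lollipop_order(hosts, "NCPU"))                # ['A7', 'A3', 'A5']
print(select(hosts, "NCPU", lambda n: n != "A7"))   # A3
```

When the negotiation for A7 fails, the sketch moves on to the next host in the ordered list, mirroring the fallback to the next potential provider described earlier in this section.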
Figure 4.2 Lollipop sequence based resource selection

The DRS resource selection procedure is illustrated here with an example. Table 4.1 contains the sample request to be scheduled along with the available resource parameters: R denotes the request posted by the user with its specified parameters, and A1 to A8 denote the resources available at that time with their capabilities. The percentage deviation values computed using the equation in Figure 4.1 are shown in Table 4.2. Finally, the deviation values for all the resources, also computed using the equations in Figure 4.1, are given in Table 4.3. From Table 4.3, resource A1 (in red) is in the exact region, A2 (in yellow) is in the plug-in region, and all the other resources are in the subsume region (in light blue). While ordering, weightage can be given to any parameter. Here, the proposed approach gives more weightage to the number of CPUs (NCPU), so in the subsume region the resources are ordered by their NCPU value. Hence, the ordered resources are A1, A2, A7, A8, A3, A4, A5 and A6. Note that DRS is more accurate than averaging the deviation values, because the averaging method may select resources that do not satisfy all the required parameters. In DRS, only the resources whose deviation values are greater than or equal to zero in all the parameters (here CPU-count, RAM and CPU%) fall in the plug-in region. If a resource obtains a deviation value less than zero in any one of the parameters, it automatically falls in the subsume region.

Table 4.1 Sample request and available resource parameters
Hardware Requirements (HR): NCPU, RAM, SS, Speed; Software Requirements (SR): OS, Software; QoS Requirements (QR): BW

Host  NCPU  RAM  SS  Speed (GHz)  OS     Software    BW (Mbps)
R       50    2  40          3.3  RHEL4  MATLAB-6.0         10
A1      50    2  40          3.3  RHEL4  MATLAB-6.0         10
A2     100    4  80          3.3  RHEL4  MATLAB-6.0         20
A3      40    2  40          3.3  RHEL4  MATLAB-6.0         10
A4      40    2  40          3.3  RHEL4  MATLAB-6.0         10
A5      10    2  40          3.3  RHEL5  MATLAB-5.5         10
A6      10    2  40          3.3  RHEL4  MATLAB-6.0         10
A7      50    2  40          3.3  RHEL7  MATLAB-4.0         10
A8      50    1  20          2.2  RHEL4  MATLAB-6.0         10

Table 4.2 Percentage deviation co-efficient for the available hosts (D_ij)

Host  NCPU   RAM    SS  Speed   OS  Software   BW
A1       0     0     0      0    0         0    0
A2     100   100   100      0    0         0  100
A3     -25     0     0      0    0         0    0
A4     -25     0     0      0    0         0    0
A5    -400     0     0      0 -100      -100    0
A6    -400     0     0      0    0         0    0
A7       0     0     0      0 -100      -100    0
A8       0  -100  -100      0    0         0    0

Table 4.3 Deviation values for all the available hosts (D)

Host  NCPU   RAM    SS  Speed   OS  Software   BW
A1       0     0     0      0    0         0    0
A2       1     1     1      0    0         0    1
A3 -0.0625     0     0      0    0         0    0
A4 -0.0625     0     0      0    0         0    0
A5      -1     0     0      0   -1        -1    0
A6      -1     0     0      0    0         0    0
A7       0     0     0      0   -1        -1    0
A8       0    -1    -1      0    0         0    0

In short, this chapter gives a detailed narration of deviation based resource scheduling and its significance.
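The ordering in the example can be checked with a short Python sketch that transcribes Table 4.3, applies the region rule described above and orders on NCPU as the weighted parameter. Tie-breaking by input order (via Python's stable sort) is an assumption, as is the averaging baseline, taken here to be a plain mean of a host's deviation values; host H is hypothetical and only illustrates how averaging can hide a mismatch.

```python
from typing import Dict, List

PARAMS = ("NCPU", "RAM", "SS", "Speed", "OS", "Software", "BW")

# Scaled deviation values D, transcribed from Table 4.3.
ROWS = {
    "A1": (0, 0, 0, 0, 0, 0, 0),
    "A2": (1, 1, 1, 0, 0, 0, 1),
    "A3": (-0.0625, 0, 0, 0, 0, 0, 0),
    "A4": (-0.0625, 0, 0, 0, 0, 0, 0),
    "A5": (-1, 0, 0, 0, -1, -1, 0),
    "A6": (-1, 0, 0, 0, 0, 0, 0),
    "A7": (0, 0, 0, 0, -1, -1, 0),
    "A8": (0, -1, -1, 0, 0, 0, 0),
}
TABLE_4_3: Dict[str, Dict[str, float]] = {
    name: dict(zip(PARAMS, values)) for name, values in ROWS.items()
}
REGION_RANK = {"exact": 0, "plug-in": 1, "subsume": 2}


def region(devs: Dict[str, float]) -> str:
    """Exact if all deviations are zero, plug-in if none is negative, else subsume."""
    if all(v == 0 for v in devs.values()):
        return "exact"
    if all(v >= 0 for v in devs.values()):
        return "plug-in"
    return "subsume"


def drs_order(table: Dict[str, Dict[str, float]], key: str = "NCPU") -> List[str]:
    """Order by region, then by the weighted parameter (stable for ties)."""
    def sort_key(name: str):
        reg = region(table[name])
        d = table[name][key]
        # Plug-in: ascending toward +1; subsume: descending toward -1.
        return (REGION_RANK[reg], d if reg == "plug-in" else -d)
    return sorted(table, key=sort_key)


print(drs_order(TABLE_4_3))
# ['A1', 'A2', 'A7', 'A8', 'A3', 'A4', 'A5', 'A6']

# Hypothetical host H (not in Table 4.1): surplus CPUs and RAM, wrong OS and software.
H = dict(zip(PARAMS, (1, 1, 0, 0, -1, -1, 0)))
print(sum(H.values()) / len(H))  # 0.0, the same mean deviation as the exact match A1
print(region(H))                 # subsume, so DRS does not treat H as a match
```

Averaging lets H's surplus in NCPU and RAM cancel its OS and software mismatch, whereas the region rule places H in the subsume region because at least one parameter is negative.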