CDBMS Physical Layer issue: Load Balancing

Shweta Mongia
CSE, School of Engineering, G D Goenka University, Sohna
Shweta.mongia@gdgoenka.ac.in

Shipra Kataria
CSE, School of Engineering, G D Goenka University, Sohna
Shipra.kataria@gdgoenka.ac.in

Abstract: Cloud Database Management System (CDBMS) is one of the latest emerging fields in the IT world. Cloud computing has many benefits, but it also has many barriers. The structure of the cloud involves a great deal of complexity and heterogeneity, and hence load balancing has become one of its biggest challenges. In this paper we review the terms related to load balancing algorithms and explain why load balancing is important. We simulate the Round Robin and Throttled load balancing algorithms under three broker policies (Closest Data Center, Optimize Response Time, and Reconfigure Dynamically with LB) on the Cloud Analyst simulator, and conclude that for the cloud, dynamic algorithms outperform static algorithms.

Keywords: CDBMS Layers, Physical Layer, Load Balancing, Round Robin.

I INTRODUCTION

Cloud providers offer various types of services to their users, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [1]. Database as a Service, based on the pay-as-you-go pricing model, is one of the main examples of cloud Software as a Service. A Cloud Database Management System consists of five layers [2], as shown in Figure 1.

[Figure 1. Cloud Database Management Layers [6]: External Layer, Conceptual Middleware Layer, Conceptual Layer, Physical Middleware Layer, Physical Layer]

II LOAD BALANCING ISSUE AT PHYSICAL LAYER

Because of the complexity and heterogeneity of the cloud, load balancing has become the core issue at the Physical Layer of a Cloud Database Management System. The goal of load balancing is to maximize the performance of the cloud system by transferring tasks from busy processors to processors that are less busy, or even idle.
It should balance the load so that no system is either underutilized or overloaded with user tasks [3]. At the Physical Layer we are concerned with data centers, each of which consists of a number of hosts, i.e. physical nodes. Each host in turn consists of a number of virtual machines [4]. As shown in Figure 2, user tasks, or cloudlets, are executed on the virtual machines.

[Figure 2. Relationship between Data Center, Host and Virtual Machine: Data Centers (Amazon, Yahoo, Google, etc.), Number of Hosts (Physical Machines), Number of Virtual Machines]

Data centers are operated by service providers such as Amazon, Yahoo and Google. As shown in Figure 3, a data center is characterized by its number of hosts. Hosts are the physical nodes, characterized by specifications such as CPU (processor) speed, RAM, storage capacity, cost in terms of the bandwidth allocated, architecture, operating system, VMM (virtual machine monitor) and time zone. The time zone matters because resources are scattered all over the world, so it determines from where a resource is selected over the cloud.

Virtual machines are defined by characteristics such as the processing element, whose capacity is given in millions of instructions per second (MIPS), the RAM assigned to each virtual machine, and priority. Lastly, cloudlets are the user tasks, characterized by cloudlet length (the length of the task in millions of instructions), cloudlet file size (input file size), cloudlet output size (output file size), and the number of processing elements (PEs), i.e. how many cores should be allocated to the user task.

[Fig 3. Data Center, Host, Virtual Machine Characteristics: Datacenter (No. of Hosts); Host (Processor speed, RAM, Storage capacity, bandwidth, Operating system, VMM, time zone); Virtual Machine (VM Size: RAM, Bandwidth, Processing elements, VMM, priority); Cloudlet (Cloudlet length, cloudlet file size, cloudlet output size, processing elements number)]

[Fig 4. Different Scheduling Levels: Cloud Infrastructure Service (Registry); Data Center; Host or Set of Hosts; Virtual Machine or Set of VMs; Number of tasks run (Cloudlet); Scheduling Policies: VM Allocation, VM Scheduling, Cloudlet Scheduling]

This arrangement is also known as market-oriented cloud architecture: virtual machines are deployed on physical machines, and user applications run on the virtual machines. At the top are the users or brokers. The broker decides to which provider a user's application is given. Load balancing strategies are therefore needed to work effectively in a cloud environment. As shown in Figure 4, two levels of scheduling have to be performed:

1st level scheduling: how to schedule user requests, or cloudlets, over the available virtual machines, i.e. where to execute user jobs. This is also known as task-level scheduling or cloudlet scheduling.

2nd level scheduling: how to deploy the virtual machines over the available physical machines. This is also known as resource-level scheduling.
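The two scheduling levels described above can be illustrated with a minimal Python sketch. It is not taken from the paper or from any simulator: the class names, the first-fit VM placement and the cyclic cloudlet assignment are all assumptions chosen only to show where each decision is made.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Host:
    host_id: int
    ram_mb: int   # free RAM on the physical machine
    mips: int     # free processing capacity
    vm_ids: list = field(default_factory=list)


@dataclass
class VM:
    vm_id: int
    ram_mb: int
    mips: int


@dataclass
class Cloudlet:
    cloudlet_id: int
    length_mi: int  # task length in millions of instructions


def allocate_vms(vms, hosts):
    """2nd level (resource level): place each VM on the first host that fits.

    First-fit is an assumed policy, used here only for illustration.
    """
    placement = {}
    for vm in vms:
        for host in hosts:
            if host.ram_mb >= vm.ram_mb and host.mips >= vm.mips:
                host.ram_mb -= vm.ram_mb
                host.mips -= vm.mips
                host.vm_ids.append(vm.vm_id)
                placement[vm.vm_id] = host.host_id
                break
    return placement


def schedule_cloudlets(cloudlets, vms):
    """1st level (task level): hand cloudlets to VMs in cyclic order.

    Assumes at least one VM exists; cyclic order is an assumed policy.
    """
    vm_cycle = cycle(vms)
    return {c.cloudlet_id: next(vm_cycle).vm_id for c in cloudlets}


hosts = [Host(0, ram_mb=2048, mips=2000), Host(1, ram_mb=4096, mips=4000)]
vms = [VM(0, 1024, 1000), VM(1, 2048, 1000), VM(2, 1024, 1000)]
cloudlets = [Cloudlet(i, length_mi=1000) for i in range(5)]

vm_to_host = allocate_vms(vms, hosts)
cloudlet_to_vm = schedule_cloudlets(cloudlets, vms)
print(vm_to_host)
print(cloudlet_to_vm)
```

Keeping the two functions separate mirrors the text: the resource-level decision (VM to host) is made independently of the task-level decision (cloudlet to VM), so either policy could be swapped without touching the other.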
III CURRENT STATE OF WORK

According to the authors of [3][5][8][9], load balancing algorithms fall into two categories, as shown in Figure 5.

Static Algorithms: These are suitable for stable environments. They distribute load based on prior knowledge of the nodes' capabilities and attributes, such as memory, processing speed and most recent communication performance, but they are not flexible enough to adapt to dynamic situations. The authors describe four main static algorithms:

CLBDM: This algorithm is an improvement over Round Robin, but its downside is a single point of failure.

Ant Colony: This algorithm follows a decentralized approach. It generates more network overhead, but its advantage is that it collects information faster.

Enhanced MapReduce Algorithm: This algorithm first maps tasks and then reduces the results. It is based on three methods: part, comp and group. A request is partitioned into parts using map tasks. Each part is processed using a hash key; the parts are then compared and formed into groups using reduce tasks.

VM Mapping: The only disadvantage of this algorithm is that it contains a single point of failure, namely the central scheduling controller. The central scheduling controller calculates

which resource can take the task and then assigns the task to it. It also includes a resource monitor, which reports resource availability to the central scheduler.

Dynamic Algorithms: These algorithms assign tasks, and may reassign them, based on the nodes' capabilities and attributes and on the run-time situation. They are more complex but also more efficient.

INS: It removes duplication and redundancy. It is complicated in nature and requires certain parameters of the nodes, such as distance and time.

ESWLC: Based on the existing WLC algorithm.

DDFTP: Dual Direction downloading algorithm from FTP servers.

Load Balancing Min-Min (LBMM): Based on OLB. LBMM improves on OLB by adding three layers. At the first layer, the request manager receives the task and assigns it to a service manager. At the second layer, the service manager divides the task into subtasks and assigns them to service nodes, which are responsible for executing them.

[Figure 5. Load Balancing Algorithms: Static Algorithm (Round Robin, CLBDM, Ant Colony, VM Mapping); Dynamic Algorithm (ESWLC, INS, DDFTP, LBMM)]

M. Zbakh et al. [6] described load balancing as improving the utilization of resources such as CPU, network and storage so as to improve overall system performance, through optimal resource utilization, maximum throughput, minimum response time and avoidance of overload. Traditional load balancing algorithms for Physical Layer systems do not provide the expected performance because of the complex, dynamic, distributed structure of the cloud; hence load balancing designed for the cloud is needed to improve system performance. They classified load balancing based on two parameters:

System Load: how to distribute the load among the systems. Three approaches are valid here: centralized, distributed and mixed.

System Topology: how to distribute the load based on the state of the system. Three approaches are applicable in this category: static, dynamic and adaptive.

The authors also gave examples of real load balancing systems:

DNS (Domain Name System): maps names to IP addresses, ensures high availability and fault tolerance, and maps traffic to the closest server.

ZXTM LB: a system for traffic management.

Amazon Load Balancing Systems.

G. Soni et al. [7] proposed a load balancing algorithm, Central Load Balancer (CLB), which balances the load based on the state of the virtual machines and the hardware configuration in the data center. The authors explained the working of the Central Load Balancer as follows. Every request from the user bases arrives at the Data Center Controller. The Data Center Controller queries the Central Load Balancer, which maintains a table containing the ID, state and priority of each VM. The CLB looks into the table for the VM with the highest priority. The priority of a VM is calculated from its processing speed in MIPS and its memory resources with the following equation:

$$Pr(i) = t \cdot Tc(i) + s \cdot Tm(i), \qquad 1 \le i \le n, \qquad t + s = 1$$

where
Pr = priority of the virtual machine node
Tc = processing speed
Tm = memory resource
t = the CPU weight

s = the weight of memory

After that, the CLB checks the state of the VM. If the VM is available, the CLB returns that VM's ID to the Data Center Controller. If the VM is busy, the CLB moves on to the VM with the next highest priority. Finally, the Data Center Controller allocates the user request to the VM ID provided by the Central Load Balancer. The authors implemented and simulated the proposed algorithm on the CloudAnalyst simulator and tested its efficiency in two cases: in the first, the load is kept constant and the number of VMs is increased; in the second, the number of VMs is kept constant and the load is increased steadily. They concluded that in both cases the Central Load Balancer efficiently shares the load of user requests among the virtual machines.

IV SIMULATION ANALYSIS

Every datacenter must register with the Cloud Infrastructure Service (CIS). The CIS entity is a registry of the resources that are available on the cloud. A datacenter is characterized by its hosts. Each host must have hardware characteristics such as number of processing elements, RAM and bandwidth. The hosts are virtualized into VMs, which in turn have their own characteristics. The broker is an intermediary between the service provider and the user; it submits the user's task list to the datacenter. This framework works on the following policies [4]:

VM allocation policy: used by the datacenter (service provider) to allocate VMs.

VM scheduler policy: used by the host (which holds the physical resources or hardware), as processing is done on VMs.

Cloudlet scheduler policy: used by the VM, as cloudlets are processed on the virtual machine.

Each policy is either time shared or space shared.
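As an aside on the Central Load Balancer of Soni et al. [7] summarized at the end of Section III, its selection rule can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the names `VmEntry`, `priority` and `select_vm` are hypothetical, the default weights t = s = 0.5 are arbitrary, MIPS and memory are combined without normalization, and returning `None` when every VM is busy is an assumption, since the source does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class VmEntry:
    """One row of the (assumed) CLB table: VM id, resources and state."""
    vm_id: int
    mips: float      # Tc(i): processing speed
    memory: float    # Tm(i): memory resource
    available: bool  # True = AVAILABLE, False = BUSY


def priority(entry, t=0.5, s=0.5):
    """Pr(i) = t*Tc(i) + s*Tm(i), with t + s = 1."""
    if abs(t + s - 1.0) > 1e-9:
        raise ValueError("weights t and s must sum to 1")
    return t * entry.mips + s * entry.memory


def select_vm(table, t=0.5, s=0.5):
    """Walk the table from highest to lowest priority and return the id
    of the first available VM, or None if all VMs are busy (assumption)."""
    ranked = sorted(table, key=lambda e: priority(e, t, s), reverse=True)
    for entry in ranked:
        if entry.available:
            return entry.vm_id
    return None


table = [
    VmEntry(0, mips=2000, memory=1024, available=False),  # highest priority, busy
    VmEntry(1, mips=1000, memory=512, available=True),
    VmEntry(2, mips=500, memory=256, available=True),
]
chosen = select_vm(table)
print(chosen)
```

In this example VM 0 has the highest priority but is busy, so the request falls through to VM 1, the next highest priority VM that is available.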
Space Shared & Time Shared [4]

Space-shared provisioning for VMs as well as for cloudlets: space is shared by virtual machines and tasks. At any one time, only one virtual machine and one task are assigned to a particular processing node or core.

Space-shared provisioning for VMs and time-shared provisioning for cloudlets: the space-shared policy is used to allocate VMs and the time-shared policy is used for cloudlets. Hence, during the VM lifetime, all tasks assigned to it context switch dynamically until their completion. The tasks are divided by time, so two tasks can run at the same time.

Time-shared provisioning for VMs and space-shared provisioning for cloudlets: on each core, time slots are divided among the VMs. While a VM is running, its tasks run one after another: the first task completes, then the second.

Time-shared provisioning for VMs as well as for cloudlets: time is divided among the VMs, and on each core the tasks are also divided with respect to time.

There are many simulators available for the cloud. Simulation provides a realistic environment in which to test a system. We studied two simulators in detail and simulated load balancing algorithms on Cloud Analyst, although CloudSim provides a more realistic cloud environment. In a future paper we will implement these load balancing algorithms, along with others, on CloudSim.

Criteria for comparing load balancing algorithms [5][8]:

Throughput: the total number of tasks executed within a fixed span of time, not counting the creation and destruction time of virtual machines. It should be high to improve system performance.

Overhead: the amount of messages and communication involved in the system. It should be minimized so that work is done efficiently.

Fault tolerance: whether the whole system continues to work in the event of a failure.

Response time: the time taken by the system to respond to a particular task. It should be minimized.

Resource utilization: how much of the resources is in use at a particular time. It should be optimized.

Performance: used to check the efficiency of the system. If throughput is high, response time and overhead are low, and resource utilization is optimized, then system performance is effective.

We simulated the Round Robin and Throttled algorithms [3][8] on the Cloud Analyst simulator. Round Robin is a static load balancing algorithm. It does not take the previous state of the nodes into account while distributing user tasks. It assigns user jobs to nodes in a fixed cyclic order regardless of their current load, so some nodes become heavily loaded while others remain lightly loaded. The Throttled algorithm, on the other hand, works in a dynamic fashion. We compared the two algorithms under three broker policies: Closest Datacenter, Optimize Response Time and Reconfigure Dynamically with LB.

Figure 6. Cloud Analyst Simulation Environment

Figure 7. Cloud Analyst Simulation Environment

Figure 8 shows the Cloud Analyst simulation configuration. Tables 1 and 2 show the overall response time in ms for the three broker policies.

Simulation Environment:
Total Number of Datacenters: 12
Number of User Bases: 12
Number of Regions: 6
Number of VMs per Datacenter: 5
Total Virtual Machine Cost ($): 0.50
Total Data Transfer Cost ($): 0.77
Grand Total ($): 1.27

As shown in Figures 6 and 7, two datacenters reside in each region.

Figure 8. Simulation Configuration

Service Broker                     Overall Response Time (ms)
                                   Average    Minimum    Maximum
Closest Data Center                50.11      37.60      62.38
Optimize Response Time             50.08      37.13      64.13
Reconfigure Dynamically with LB    59.11      38.51      38448.25

Table 1. Overall Response Time of Round Robin Algorithm in (ms) for three Service Broker Policies.

Service Broker                     Overall Response Time (ms)
                                   Average    Minimum    Maximum
Closest Data Center                291.21     38.36      620.11
Optimize Response Time             291.36     40.36      635.11
Reconfigure Dynamically with LB    291.59     38.73      809.01

Table 2. Overall Response Time of Throttled Algorithm in (ms) for three Service Broker Policies.

V RESULT ANALYSIS

We simulated the Round Robin and Throttled load balancing algorithms under three service broker policies: Closest Data Center, Optimize Response Time and Reconfigure Dynamically with LB. For the same setup, i.e. the same number of datacenters and user bases, the overall response time of Round Robin is lower than that of the Throttled algorithm. Under the Reconfigure Dynamically with LB policy, however, the maximum response time of Round Robin (38448.25 ms) far exceeds that of the Throttled algorithm (809.01 ms), because the Throttled algorithm is dynamic by nature whereas Round Robin has difficulty working in a dynamic environment.

VI CONCLUSION

In this paper, the Round Robin and Throttled algorithms were simulated with the Cloud Analyst simulator. The simulation leads to the conclusion that static algorithms give better response times in a static environment but very poor results in a dynamic environment. As the cloud is dynamic in nature, dynamic algorithms are more suitable for it, although they involve more overhead in terms of communication, network messages, etc. There are many load balancing algorithms available for the cloud. In future we will simulate these algorithms with the CloudSim simulator, as CloudSim provides more functionality for the cloud.

REFERENCES

[1] P. Mell and T. Grance, The NIST Definition of Cloud Computing, Version 15, 10-7-09.
[2] S. Mongia, M.N. Doja, B. Alam, M. Alam, 5 layered Architecture of Cloud Database Management System, AASRI DCS2013 Conference.
[3] S S Mohrana, R. D. Ramesh, D. Powar, Analysis of Load Balancers in cloud computing, IASET, International Journal of Computer Science and Engineering, ISSN 2278-9960, Vol. 2, Issue 2, May 2013, pp 101-108.
[4] R. Buyya, R. Ranjan and R. N. Calheiros, Modelling and simulation of scalable cloud computing environment and the CloudSim Toolkit: Challenges and opportunities, IEEE International Conference on High Performance Computing & Simulation, 2009, HPCS'09, pp 1-11.
[5] S. Begum, C.S.R Prashanth, Review of load balancing in cloud computing, International Journal of Computer Science Issues, Vol 10, Issue 1, No. 2, January 2013, ISSN 1694-0814, pp 343-352.
[6] A. Khiyaita, M. Zbakh, H. El Bakkali, D. El Kettani, Load Balancing Cloud Computing: State of Art, IEEE Network Security and System, ISBN 978-1-4673-1050-5, April 2012, pp 106-109.
[7] G. Soni, M. Kalra, A Novel Approach for Load Balancing in cloud Data Center, IEEE 2014 International Conference Advance Computing Conference, ISBN 978-1-4799-2571-1, Feb 2014, pp 807-812.
[8] N Jain Kansal, I. Chana, Existing Load balancing techniques in cloud computing: a systematic review, Journal of Information Systems and Communication, ISSN 0976-8750, Volume 3, Issue 1, 2012, pp 87-91.
[9] Nikita, Shaveta, Gaurav, Comparative analysis of load balancing Algorithm in Cloud Computing, International Journal of Advanced Research in Computer Engineering and Technology, ISSN 2278-1323, Volume 1, Issue 3, May 2012.