A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

Torki Altameem
Dept. of Computer Science, RCC, King Saud University, P.O. Box: 28095, 11437 Riyadh, Saudi Arabia

Abstract

The very large infrastructure and the increasing demand for services of cloud computing systems lead to the need for an effective fault tolerant allocation technique. In this paper, we address the problem of allocating user applications to the virtual machines of cloud computing systems so that failures can be avoided in the presence of faults. We employ job replication as an effective mechanism to achieve an efficient and fault-tolerant cloud. Most of the existing replication-based algorithms use a fixed number of replicas for each application, which consumes more cloud resources. We propose an algorithm that adaptively determines the number of replicas according to the fault rate of the cloud virtual machines. The proposed algorithm has been evaluated through simulation and has shown better performance in terms of turnaround time and throughput.

Keywords: Fault tolerant, Replication, Cloud Computing, Fault rate

I. INTRODUCTION

Cloud computing is an internet based computing solution which is considered the next step in the technology evolution of distributed computing. It provides a comprehensive solution for delivering IT as a service and it facilitates scalable and cooperative sharing of resources among different organizations. The cloud enables on-demand access to applications from anywhere in the world, without considering their implementation details [1]. Resources include storage, processors, platforms, or application services. The flexibility of cloud computing is a function of allocating resources on demand [2], [3].

In clouds, resources of different physical machines can be grouped into Virtual Machines (VMs). These VMs can be started and stopped on demand to meet service requests. This provides maximum flexibility to configure various partitions of resources for the different specific requirements of user requests [4]. VMs can fail to do their work due to their heterogeneity and their usage for long periods of time. Failure of VMs has a great impact on scalability, performance, profit and consumer trust.

Fault tolerance is an approach in which a cloud computing system continues to work successfully even if there is a fault [5]. Although cloud computing has been widely adopted by the industry, fault tolerance is still a main research challenge to be fully addressed [6]. Because of the very large infrastructure of the cloud and the increasing demand for services, an effective fault tolerant allocation technique for cloud computing is required [2].

The main mechanisms used in implementing fault tolerance in cloud computing include checkpointing and replication. Checkpointing is the ability to save the state of a running application to stable storage. In case of any fault, this saved state can be used to resume execution of the application from the point in the computation where the checkpoint was last registered, instead of restarting the application from its very beginning [7]. In this paper, job and application will be used interchangeably.

Replication is based on the assumption that the probability of a single VM failure is much higher than that of a simultaneous failure of multiple VMs. It avoids job recomputation by starting several copies of the same job on different VMs. With redundant copies of a job, the cloud can continue to provide a service in spite of the failure of some VMs carrying out job copies, without affecting the performance [8].

Cloud applications must have dynamic fault-tolerant services that detect faults and resolve them. These services enable applications to carry on their computations in case of failure without terminating the applications. These services must also satisfy the minimum quality of service (QoS) requirements of applications, such as the deadline to complete the applications, the number of computing VMs, the type of the platform and so on.

In this paper, the main contribution is to develop a replication based algorithm for allocating applications of users to the VMs of the cloud computing system. The algorithm selects VMs according to the finishing time of applications rather than the response time. The algorithm generates the number of replicas dynamically, which means that the number of replicas will not be fixed for all applications.

This paper is organized as follows: Section 2 briefly explains related work on tolerating faults in cloud computing systems. Section 3 elaborates the proposed algorithm. Section 4 presents results and discusses the performance of the proposed algorithm. Section 5 presents our conclusion.

www.ijcset.net

Figure 1. The components of the cloud computing system (customers' jobs, broker, replication manager, VM monitoring server, and VMs 1 to k).

II. RELATED WORK

In [3], K. Ganga and S. Karthik discussed fault tolerance techniques and classified them as proactive and reactive techniques. Proactive techniques predict the failure and replace the suspected resources with other working resources. Reactive techniques, such as checkpointing, replication and resubmission, try to reduce the effect of failures when they occur. They focused on applying job replication to scientific workflow systems.

Alhosban et al. [1] provide a technique to dynamically evaluate the performance of services based on their previous history and user requirements. Their technique uses a prediction and planning approach and consists of two phases. In the first phase, the fault likelihood of the service is assessed. In the second phase, they build a recovery plan to execute in case of fault(s), and they calculate the overall system reliability based on the fault occurrence likelihoods assessed for all the services.

Das and Khilar [2] have proposed a reactive model that integrates fault tolerance with cloud virtualization. Their model depends on using the success rate of the computing nodes and virtualization, and it includes support for load balancing. They have used the replication mechanism to achieve fault tolerance.

Jhawar, Piuri and Santambrogio [9] have presented an approach toward transparently delivering fault tolerance to the applications deployed in virtual machine instances. They have presented an approach for realizing generic fault tolerance mechanisms as independent modules, validating the fault tolerance properties of each mechanism, and matching the user's requirements with the available fault tolerance modules to obtain a comprehensive solution with the desired properties. They have also designed a framework that allows the service provider to integrate its system with the existing cloud infrastructure and provides the basis to generically deliver fault tolerance as a service.

A review of the literature reveals that previous works are mainly based on using the response time as the main criterion when selecting VMs that can execute a cloud application. There is no work done in the area of cloud computing that considers the finishing time of applications on the cloud VMs. In this paper, an algorithm that depends on the finishing time of jobs or applications on cloud VMs is presented and evaluated. Our algorithm also uses the replication mechanism to tolerate faults of VMs if they occur. Replication provides an efficient way to guarantee the completion of jobs according to the QoS required by the user.

In cloud computing, most of the existing replication based algorithms use a fixed number of replicas. This fixed number of replicas can lead to the use of extra VMs in executing user applications. These extra VMs may be needed by other waiting applications, so the cloud will lose the monetary benefit of these VMs. There is therefore a need for a way to provide a dynamic number of replicas in order to preserve the cloud resources.

III. THE PROPOSED ALGORITHM

The main purpose of the proposed algorithm is to improve the performance of the cloud by minimizing both the time spent by the application in the cloud and the effect of failure if it occurs. The algorithm depends on selecting the VMs that have the earliest finishing time for user applications. It also depends on using the replication mechanism to generate multiple copies of the same application to be executed on multiple VMs simultaneously.

The components of the cloud computing system used in our paper are shown in Figure 1. These components include the broker, the VM monitoring server, the replication manager and the cloud VMs. Consumers or users submit their applications or jobs along with their QoS requirements to the cloud through the cloud portal. The jobs are inserted in the broker queue. The broker receives a job from the broker queue along with its required QoS. Then, it asks the VM Monitoring Server for a list of suitable VMs for executing the job. The server replies with a list of VMs that can perform the application, along with their expected finish times for the user's application. The broker sorts this list according to the finish time of the application on each VM. The first VM in the sorted list, which is the VM with the earliest finish time, is selected as the main VM to execute the job. The finish time of a VM j for a job i is defined as:

$$FT(i, j) = ST(i, j) + t(i, j) \tag{1}$$

where t(i, j) is the execution time of job i on VM j and ST(i, j) is the time at which job i will start execution on VM j. The value of ST(i, j) is the summation of the execution times of all the jobs assigned to VM j and executed or to be executed before job i, and it can be defined by:

$$ST(i, j) = \sum_{q \in Q_j(i)} t(q, j) \tag{2}$$

where Q_j(i) denotes the set of jobs assigned to VM j that are executed or to be executed before job i.

Since the selected main VM may fail, the system will choose some other VMs from the list on which copies of the job will be executed. Replicating a job can help in dealing with failure: when one VM fails, the adverse effect on the performance of the application it runs can be reduced if the replicas complete without failure. The broker now asks the replication manager to determine the number of replicas. The number of replicas should not be high, in order to avoid overloading the cloud. The number of replicas is based on the failure rate of the main VM and the QoS requirements.

The failure rate of a VM is a representation of its failure history. Assume f is the number of times a VM failed to complete jobs and n is the total number of jobs assigned to the VM. The failure rate of a VM is defined by:

$$fr = \frac{f}{n}, \quad \text{where } 0 \le fr \le 1 \tag{3}$$

When providing the QoS requirements, the customer determines whether the application needs replication or not. This is achieved through a factor called rep, whose value is given by the customer to the cloud service provider. If rep = 0, the customer does not need the application to be replicated. If the customer needs the application to be replicated, rep is set to 1. If rep = 1, the Replication Manager uses the failure rate of the main VM to determine the number of replicas for the customer's application. It checks the value of fr of the main VM. If fr is less than k, it adds one extra replica for the application. If fr is greater than or equal to k, it adds two extra replicas for the application. The value of k is determined by the cloud service provider according to the abundance of VMs that can execute the application without affecting the allocation of other applications. Figure 2 shows the main steps of the proposed algorithm.

For each application (job) submitted to the cloud
Begin
    Receive a job with QoS requirements from the portal;
    Request a list of suitable VMs for the job from the VM monitoring server;
    Receive a list of suitable VMs from the VM monitoring server;
    Compute FT(i, j) for each VM j;
    Sort the list in ascending order according to FT(i, j);
    Determine the main VM as the first one in the list;
    Compute fr for the main VM;
    R = 0; /* R: number of replicas of the application */
    If (rep == 1)
        If (0 <= fr < k)
            R = 1;
        Else
            R = 2;
        EndIf
    EndIf
    Retrieve the next job as i;
End

Figure 2. The proposed system's operation

IV. RESULTS AND ANALYSIS

In this section, the performance of the proposed algorithm is compared against the performance of a replication-based algorithm that uses the response time in selecting VMs and uses a fixed number of replicas. The comparison is performed within cloud data centers with varying load and reliability. In the simulation experiments, the number of applications submitted ranges from 100 to 500. The number of VMs in the cloud is assumed to be 1000.

A. Throughput

Throughput is one of the most important standard metrics used to measure the performance of fault tolerant systems [10]. It is used to measure the ability of the cloud to accommodate applications. Throughput is defined as:

$$\text{Throughput}(n) = \frac{n}{T_n} \tag{4}$$

where n is the total number of applications submitted and T_n is the total amount of time necessary to complete the n applications. Throughput acts as an indicator of the cloud's profit: when the throughput increases, the profit of the cloud increases.

Figure 3 and Figure 4 show the throughput comparison of the proposed finish time based algorithm with a response time based algorithm [8] for different numbers of applications submitted. In this experiment, the numbers of applications submitted by the customers to the cloud are 100, 200, 300, 400 and 500. The percentage of faults injected in the cloud is 10% in Figure 3 and a higher percentage in Figure 4.
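The allocation steps of Figure 2 and the throughput metric of Equation (4) can be illustrated with a minimal Python sketch. It is not the implementation used in the simulation. The `VM` class, its field names, and the `allocate` and `throughput` functions are hypothetical names introduced here. Two points are assumptions that the text does not specify: a VM with no job history is given a failure rate of 0, and the replicas are placed on the VMs that come next in the list sorted by finish time.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical record of what the VM monitoring server knows about a VM."""
    name: str
    queued_exec_times: list = field(default_factory=list)  # t(q, j) of jobs ahead
    failed_jobs: int = 0    # f in Eq. (3)
    assigned_jobs: int = 0  # n in Eq. (3)

    def start_time(self):
        # Eq. (2): sum of the execution times of the jobs queued before the new job
        return sum(self.queued_exec_times)

    def failure_rate(self):
        # Eq. (3): fr = f / n; assumption: fr = 0 when the VM has no history
        if self.assigned_jobs == 0:
            return 0.0
        return self.failed_jobs / self.assigned_jobs


def finish_time(vm, exec_time):
    # Eq. (1): FT = ST + t
    return vm.start_time() + exec_time


def allocate(exec_times, vms, rep, k):
    """Return (main VM, list of replica VMs) for one job.

    exec_times maps a VM name to the execution time of the job on that VM.
    rep is the customer's replication flag (0 or 1); k is the provider's threshold.
    """
    ranked = sorted(vms, key=lambda vm: finish_time(vm, exec_times[vm.name]))
    main_vm = ranked[0]
    replicas = 0
    if rep == 1:
        replicas = 1 if main_vm.failure_rate() < k else 2
    # Assumption: replicas go to the next VMs in finish-time order
    return main_vm, ranked[1:1 + replicas]


def throughput(n, total_time):
    # Eq. (4): number of completed applications per unit of time
    return n / total_time
```

For example, with three VMs whose queued work is 10, 2 and 0 time units and on which the job would run for 1, 3 and 8 time units, the finish times are 11, 5 and 8, so the second VM is selected as the main VM even though the first VM executes the job fastest. If the main VM has failed 6 of its 10 assigned jobs and k = 0.5, two replicas are requested.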

Figure 3. Throughput comparison with 10% injected faults (x-axis: number of applications submitted, 100 to 500; y-axis: throughput; series: finish time based, response time based).

Figure 4. Throughput comparison with a higher percentage of injected faults (x-axis: number of applications submitted, 100 to 500; y-axis: throughput; series: finish time based, response time based).

The figures show that the throughput of the proposed finish time based algorithm is better than the throughput of the response time based algorithm for the whole range of job numbers. This is because a VM can have a good response time when executing an application but may be delayed in starting that application, since it may have a lot of work to do before the execution of the application can begin. On the other hand, another VM with a worse response time but little pending work can start that application and finish it earlier. The figures also show that the throughput in Figure 4 is lower than in Figure 3 for the same number of applications submitted. This is because the number of VM faults in Figure 4 is greater than the number of VM faults in Figure 3.

B. Turnaround Time

Turnaround time is an important parameter for evaluating the performance of distributed computing systems. It is the most important parameter users pay attention to. It can be defined as the interval between application submission and application completion. Figure 5 depicts the comparison of the proposed finish time based algorithm with a response time based algorithm [8] for different percentages of faults injected in the cloud. The percentages of faults injected range from 5% to 25%.

Figure 5. Turnaround time comparison with 500 applications submitted (x-axis: percentage of faults injected, 5% to 25%; y-axis: turnaround time; series: finish time based, response time based).

In general, the turnaround time resulting from using the two algorithms increases with the increase in the number of applications submitted. The figure shows that the proposed algorithm has a better turnaround time than the other algorithm for different application sizes. This is because, in the response time based algorithm, the number of faulty VMs is greater than in the proposed algorithm. This leads to more delay when the main VMs that have the best response time fail to execute the assigned applications. As the number of failed VMs increases, the delay increases, and thus the turnaround time for executing user applications increases. On the other hand, the proposed system selects the VMs which are less prone to failure. This leads to a smaller number of faulty VMs and lower delay times than with the other algorithm.

V. CONCLUSION

In this paper, we presented and evaluated a fault tolerant allocation algorithm for cloud computing systems that uses the replication mechanism. The algorithm uses the finishing time in selecting VMs. The algorithm also computes the number of replicas for an application according to the fault rate of the VM allocated to execute the application. The performance of the proposed algorithm was evaluated under different conditions using metrics such as throughput and turnaround time. From the results of the experiments, it can be concluded that the proposed algorithm provides better performance than the response time based algorithm in terms of throughput and turnaround time.

REFERENCES

[1] A. Alhosban et al., "Self-healing Framework for Cloud-based Services," Proceedings of the 2013 Int'l Conf. on Computer Systems and Applications, May 27-
[2] P. Das, P. Mohan Khilar, "VFT: A Virtualization and Fault Tolerance Approach for Cloud Computing," Proceedings of the IEEE Conference on Information and Communication Technologies (ICT 2013), Kanyakumari, Tamil Nadu, in press, 2013, pp. 473-478.
[3] K. Ganga and S. Karthik, "A Fault Tolerent Approach in Scientific Workflow Systems based on Cloud Computing," Proceedings of the IEEE International Conference on Pattern Recognition, Informatics and Mobile Engineering, February 21-22, 2013, pp. 117-122.
[4] R. Buyya et al., "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. 25, 2009, pp. 599-616.
[5] D. Singh, J. Singh and A. Chhabra, "High Availability of Clouds: Failover Strategies for Cloud Computing using Integrated Checkpointing Algorithms," IEEE International Conference on Communication Systems and Network Technologies, 2012.
[6] P. Gupta and S. Banga, "Review of Cloud Computing in Fault Tolerant Environment with Efficient Energy Consumption," International Journal of Scientific Research and Management, Vol. 1, Issue 4, 2013, pp. 251-254.
[7] G. Belalem and S. Limam, "Fault Tolerant Architecture to Cloud Computing Using Adaptive Checkpoint," International Journal of Cloud Applications and Computing, Vol. 1, Issue 4, October-December 2011, pp. 60-69.
[8] Z. Zheng, T. C. Zhou, M. R. Lyu and I. King, "Component Ranking for Fault-Tolerant Cloud Applications," IEEE Transactions on Services Computing, Vol. 5, No. 4, October-December 2012, pp. 540-550.
[9] R. Jhawar, V. Piuri and M. Santambrogio, "Fault Tolerance Management in Cloud Computing: A System-Level Perspective," IEEE Systems Journal, Vol. 7, No. 2, June 2013, pp. 288-297.
[10] M. Huda, H. Schmidt and I. Peake, "An agent oriented proactive fault-tolerant framework for grid computing," Proc. of International Conference on e-Science and Grid Computing, Melbourne, Australia, pp. 304-311, Dec. 5-8, 2005.