International Journal of Software Engineering and Knowledge Engineering
World Scientific Publishing Company

A SOFTWARE RELIABILITY MODEL FOR CLOUD-BASED SOFTWARE REJUVENATION USING DYNAMIC FAULT TREES

JEAN RAHME and HAIPING XU
Computer and Information Science Department, University of Massachusetts Dartmouth
North Dartmouth, MA 02747, USA
jrahme@umassd.edu, hxu@umassd.edu

Received 8 August 2015
Revised 8 October 2015
Accepted (Day Month Year)

Correctly measuring the reliability and availability of a cloud-based system is critical for evaluating its system performance. Due to the promised high reliability of the physical facilities provided for cloud services, software faults have become one of the major causes of failure in cloud-based systems. In this paper, we focus on the software aging phenomenon, where system performance may be progressively degraded due to exhaustion of system resources, fragmentation, and accumulation of errors. We use a proactive technique, called software rejuvenation, to counteract the software aging problem. The dynamic fault tree (DFT) formalism is adopted to model the system reliability before and during a software rejuvenation process in an aging cloud-based system. A novel analytical approach is presented to derive the reliability function of a cloud-based Hot SPare (HSP) gate, and its correctness is further verified using a Continuous Time Markov Chain (CTMC). We use a case study of a cloud-based system to illustrate the validity of our approach. Based on the reliability analysis results, we show how cost-effective software rejuvenation schedules can be created to keep the system reliability consistently above a predefined critical level.

Keywords: Software aging; software rejuvenation; reliability analysis; dynamic fault tree (DFT); hot spare (HSP) gate; Markov chain; scheduling.

1. Introduction
Due to recent advances in cloud computing technologies, cloud services have been used in many different areas such as traffic control, real-time sensor networks, healthcare, and mobile cloud computing. Cloud service providers have tried to deliver products with high quality of service (QoS), providing users with fault-tolerant hardware and reliable software platforms for deploying cloud-based applications [1, 2]. However, cloud outages are still very common due to component failures, which can have a significant negative effect on the revenue of cloud-based systems.
Previous research on the reliability of computer-based systems has focused on hardware reliability and availability; consequently, hardware fault tolerance and fault management are well understood and developed [3]. With the promised high reliability and availability of physical facilities, including the hardware facilities and their associated redundancy mechanisms, software faults have now become one of the major causes of failure in a cloud-based system. Since software reliability is considered one of the weak points in system reliability, software fault tolerance and failure forecasting require more attention than hardware fault tolerance in modern computer systems [4, 5]. This work is motivated by the need to deal with software faults in cloud computing in order to assure high reliability and availability of cloud-based software systems.

Corresponding author: Dr. Haiping Xu, Associate Professor, Computer and Information Science Department, University of Massachusetts Dartmouth, Email: hxu@umassd.edu.

In many safety-critical computer-based systems, failures of the software system may lead to unrecoverable losses such as human lives [6]. Based on the discipline of fault-tolerant and reliable computing, such systems are required to be perfectly reliable and never fail. Reliability and availability are two common ways to express system fault tolerance in industry. A reliable computer-based system typically has high availability if unreliability is the major cause of unavailability. In this paper, we focus on analyzing the reliability of cloud-based systems for software fault tolerance in software reliability engineering (SRE).

Traditional SRE has been based on the analysis of software defects and bugs such as Bohrbugs or Heisenbugs, without considering software aging related bugs [4]. Bohrbugs are mainly design defects that can be eliminated by debugging or by adopting design diversity, while Heisenbugs are defined as faults that stop causing failures when one attempts to isolate them. The concept of the software aging phenomenon was introduced in the mid-1990s; it describes how the system resources used by software degrade gradually as a function of time [7]. Software aging shows up due to multiple factors such as memory bloating, memory leaks, unterminated threads, data corruption, unreleased file locks, fragmentation in storage space, and accumulation of round-off errors when running a piece of software. It has considerably changed the SRE field of study, and has become a major factor for the reliability of fully tested and deployed software systems. To deal with the software aging problem and to assure software fault tolerance, software rejuvenation has been introduced as a proactive approach to counteracting software aging and maintaining a reliable software system [8].
Software rejuvenation involves actions such as occasionally stopping the running software and cleaning its internal state, e.g., garbage collection, flushing operating system kernel tables, and reinitializing internal data structures. The simplest way to perform software rejuvenation is to restart the software component that causes the aging problem, or to reboot the whole system. Due to the ever-growing cloud computing technologies and their vast market, the workload of cloud-based systems has increased dramatically, and a heavy workload inevitably leads to more software aging problems. In this paper, we propose to use cloud-based spare components for the major software components in a computer-based system to enhance its system reliability, and introduce an analytical-based approach to developing rejuvenation schedules for cloud-based systems in order to maintain their high system reliability and ensure a zero-downtime rejuvenation process. Dynamic Fault Trees (DFTs) are adopted to model the reliability of a cloud-based system, and a novel analytical approach is presented to derive the reliability function of a major type of dynamic gate in DFT models, called the Hot SPare (HSP) gate. The analytical approach is then formally verified using a Continuous Time Markov Chain (CTMC) model to ensure its correctness. As the CTMC approach has the intrinsic limitation of only supporting components with constant failure rates, to the best of our knowledge, our proposed analytical approach is the first formal way to correctly derive the reliability function of an HSP gate without such a limitation. To demonstrate the practical usage of our approach in evaluating the reliability of a cloud-based system, we assume a reliability threshold for the system under consideration. When the threshold is reached, the software rejuvenation process is triggered, and the reliability of the cloud-based system is boosted back to its initial state. Our case study shows that software rejuvenation scheduling based on the reliability analysis of a cloud-based system can significantly enhance its system reliability and availability.

This work extends our previously proposed approach to producing a reliability-based software rejuvenation schedule for cloud-based systems [9]. In our previous work, we used CTMC to derive the reliability function of an HSP gate for cloud-based systems. To overcome the limitation of the CTMC approach, in this paper we present a new analytical approach, which is more general and intuitive, and may potentially support software components with non-constant failure rates in our future research.

The rest of the paper is organized as follows. Section 2 discusses previous work related to our research. Section 3 presents a motivating example for rejuvenation of cloud-based components. Section 4 describes how to model and analyze the reliability of cloud-based systems using DFTs. Section 5 presents a case study to demonstrate the validity of our approach, and Section 6 concludes the paper and mentions future work.

2. Related Work
In 1995, researchers introduced the so-called software rejuvenation technique to deal with aging-related software faults [8]. In contrast to reactive approaches, which take action only after a software failure, this technique is considered a proactive approach that preemptively restarts the aging application and cleans up the effects of aging-related bugs [10, 11]. Previous studies on software aging and software rejuvenation for predicting a rejuvenation schedule can be classified into two categories, namely analytical-based and measurement-based approaches [].
In an analytical-based approach, a failure distribution is assumed for software faults related to the software aging phenomenon, and software rejuvenation is executed at a fixed interval based on the analytical results of the system reliability and availability. Several analytic models have been proposed to determine the optimal time for rejuvenation. Bobbio et al. proposed a fine-grained software degradation model for optimal rejuvenation policies []. Based on the assumption that the current degradation level of the system can be identified, they presented two different strategies to determine whether and when to rejuvenate. Vaidyanathan et al. presented an analytical model of a software system using inspection-based software rejuvenation [13]. They showed that inspection-based maintenance was advantageous in many cases over non-inspection-based maintenance. Dohi et al. introduced a modified stochastic model to estimate the software rejuvenation schedule [14]. The proposed model is based on semi-Markov processes and can be used to maximize the system availability. Koutras and Platis applied the software rejuvenation technique to cluster systems in order to achieve high availability [15]. In their approach, software rejuvenation is carried out when the software deployed on a node starts to experience degradation; thus, an unscheduled reboot may be avoided. Although the above approaches introduce various models for software rejuvenation, they are not intended to address complex system component behaviors and interactions, such as dynamic relationships between software components, including sparing relationships and functional dependencies. Different from the existing analytical-based approaches, we focus on the dynamic behaviors of software components in the context of cloud-based systems. We adopt the sparing relationship as an example to demonstrate how dynamic relationships of software components in a cloud-based system can be modeled and analyzed using DFTs.

On the other hand, measurement-based approaches apply statistical analysis to the measured data of resource usage and degradation that may lead to the software aging problem. In a measurement-based approach, a monitoring program continuously collects system performance data and analyzes them in order to estimate the system degradation level. When resource exhaustion reaches a critical level, the software rejuvenation process is triggered. Machida et al. used the Mann-Kendall test to detect software aging from traces of computer system metrics [16]. They tested for the existence of monotonic trends in time series, which are often considered indications of software aging. Grottke et al. studied the resource usage in a web server subject to an artificial workload [17]. They applied non-parametric statistical methods to detect and estimate trends in the data for predicting future resource usage and software aging issues. Guo et al. proposed a software aging trend prediction method based on user intention [18]. The approach can be used to predict the trend of software aging based on the quantity of user requests to software components while the system is functioning. The existing measurement-based approaches are feasible ways to detect software aging problems in real-world computer-based systems, but they typically require processing large amounts of system data. Thus, they are not as efficient as analytical-based approaches.
However, measurement-based approaches do provide useful insights into dynamic system behaviors and failure distributions related to software aging. As such, our research is complementary to the existing research efforts on measurement-based software rejuvenation techniques that investigate the relationship between software metrics and aging-related software faults using statistical analysis [19]. Other related work has attempted to address software aging issues in virtualized data centers. Machida et al. proposed a Petri net based availability model for virtualized systems with time-based rejuvenation for virtual machines [20]. They compared three techniques in terms of steady-state availability, and suggested the optimal combination of rejuvenation trigger intervals for each rejuvenation technique using a gradient search method. Thein et al. proposed an analytical approach that models availability for application servers [21]. Based on the availability model, they presented possible combinations of virtualization, high availability clusters, and software rejuvenation. However, the above approaches are not explicitly based on software reliability analysis. In contrast, our approach analyzes system reliability using DFT models, and can generate rejuvenation schedules that explicitly satisfy the predefined reliability and availability requirements of a cloud-based system.

3. Rejuvenation of Cloud-Based Components
Virtualization technology has been widely adopted in cloud computing; it allows one to share a machine's physical resources among multiple virtual environments, called virtual machines (VMs). As shown in Fig. 1, a VM is not bound to the hardware directly; rather, it is bound to generic drivers that are created by a virtual machine manager (VMM), or hypervisor. Since a VM can be easily created and destroyed, it is particularly useful in the disaster recovery process of a cloud-based system. In this paper, a cloud-based system refers to a software system that consists of multiple VMs, where each VM is considered a software component within the system.

Fig. 1. An example of a reliable cloud-based system with spare software components

As a proactive fault management technique, software rejuvenation has been used to refresh system internal states and prevent the occurrence of software failures due to software aging. As mentioned before, a simple way to perform software rejuvenation is a system reboot, e.g., restarting a VM or all VMs in a cloud-based system. The basic idea of our approach is to create a new VM instance that replaces the one to be rejuvenated. Since the newly deployed VM instance has not yet been affected by the software aging phenomenon, the reliability of the software component, after being replaced, is boosted back to its initial condition.

To achieve high fault tolerance and reliability, we further adopt the software redundancy technique using two different types of software standby spares, namely Cold SPares (CSPs) and HSPs. In the context of cloud computing, cold standby means that a software component is available as an image of a VM, rather than as an active VM instance. Data between a primary component and the spare one are regularly mirrored based on a specified schedule, e.g., multiple times a day. Since a CSP is not up and running and does not take any workload, its reliability equals 1, with a constant failure rate of 0.
Since a CSP can be started very quickly, the recovery time using a CSP typically ranges from a few minutes to no more than two hours. Note that a software-defined CSP is quite different from a hardware-based CSP in terms of its cost and efficiency. The cost of a software-defined CSP is its storage and very little CPU time for data mirroring, while a hardware-based CSP is a physical device that must be available all the time in order to assure fast failover [3]. Furthermore, a software-defined CSP can be started very quickly, but a hardware-based CSP typically requires manual configuration and adjustment in the event of partial or total failures.

On the other hand, an HSP in the context of cloud computing is a hot standby VM instance. This means that the software component serving as an HSP must be installed and deployed, and must be instantly available when the primary component fails. Although an HSP is deployed and running along with the primary component, it typically does not take any workload for processing user requests. To ensure fault tolerance, critical data of an HSP are mirrored from the primary VM instance in near real time (e.g., in the range of µs). This generally provides a recovery time of a few seconds in case of a failure. Similar to a CSP, a software-defined HSP also has a much lower cost and works more efficiently than a hardware-based HSP. In our system design, each critical primary component must be equipped with at least one HSP and one CSP in order to maintain the needed reliability. However, when calculating the system reliability, we only need to consider the primary component and its HSP, but not its CSP, as the CSP is not functioning. A CSP is considered for reliability analysis only when it becomes a primary component or a hot spare one. In the following, for simplicity, we denote a primary VM instance/component as P, which is active and has a full workload; an HSP as H, which is active but does not take any workload; and a CSP as C, which is inactive and not functioning at all.

In our approach, a rejuvenation schedule of a cloud-based system is created based on its reliability model and the analytical results. When the reliability of a system component or of the whole system reaches a predefined threshold, the rejuvenation process is triggered. We assume the rejuvenation process takes about 30 minutes, which is typically sufficient for starting a CSP and transferring all requests to the new VM. In the simple example illustrated in Fig. 1, suppose we have two instances, a primary component P and a hot standby one H, which are deployed on two different physical machines.
The two physical machines usually belong to two different zones, denoted as Zone 1 and Zone 2 in Fig. 1, so that a power or network outage in one zone will not affect the availability of the other one [22]. To rejuvenate the whole system, we can start two CSPs, C1 and C2, denoted as P′ and H′ in Fig. 1, to replace P and H, respectively. Although P′ and H′ are deployed in Fig. 1 on the same physical machines where P and H are deployed, respectively, this is not necessary in reality, and both P′ and H′ can be deployed on any physical server. Once the spare components P′ and H′ are up and running, P′ serves as a new primary component and starts to process new user requests, while H′ serves as a new HSP, which is kept alive but does not take any workload. Meanwhile, we allow 30 minutes in total for the old components P and H to finish processing their existing requests. After 30 minutes, we shut down and delete the components P and H, which shall have been successfully replaced by P′ and H′ when the rejuvenation process completes. Finally, two new CSPs C1 and C2 are created and made ready for the next round of the rejuvenation process. Note that in our rejuvenation strategy, we have chosen to shut down instances P and H rather than restart and reuse them. This is because, unlike a physical machine, a VM can be easily created and deployed; thus, deploying new instances P′ and H′ is much more efficient than restarting and reusing P and H.

During the rejuvenation procedure, we need to consider two scenarios. One scenario is to rejuvenate the major software components all together. In this case, we replicate the whole system when the system reliability reaches its threshold. We call this scenario a system-specific rejuvenation. The second scenario is a component-specific one, where each time we only rejuvenate the critical component, i.e., the one whose reliability is typically the lowest when the system reliability reaches its threshold. As we can see from the case study presented in Section 5, the component-specific rejuvenation is normally more cost-effective than the system-specific approach.

4. Modeling and Analysis Using DFT
In this section, we first briefly introduce DFTs, and then we show how to use DFTs to model and analyze the reliability of a cloud-based system subject to software rejuvenation. To simplify matters, we assume that the time-to-failure of each software component (i.e., a VM) has an exponentially distributed probability density function (pdf); in other words, all VMs have constant failure rates.

4.1. HSP Gates for Cloud-Based Systems
The fault tree modeling technique was introduced in 1961 at Bell Telephone Labs; it provides a conceptual modeling approach to representing system-level reliability in terms of interactions between component reliabilities [3]. Fault tree analysis (FTA) is by far the most commonly used technique for risk and reliability analysis, where the system failure is described in terms of the failures of its components. Standard fault trees are combinatorial models built using static gates (e.g., AND-gates, OR-gates, and K/M-gates) and basic events. As combinatorial models can only capture combinations of events without considering the order in which failures occur, they are usually inadequate for modeling today's complex dynamic systems [3, 4].
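The replacement procedure described in Section 3 can be summarized as a short sequence of state changes. The sketch below is a hypothetical in-memory model only: the `Instance` class, the role codes and the `rejuvenate` function are illustrative names introduced here and do not correspond to any cloud provider API, and the 30-minute drain period is recorded rather than actually waited for.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical instance-id generator for this sketch


@dataclass
class Instance:
    """A VM in one of three roles: 'P' (primary), 'H' (hot spare), 'C' (cold spare image)."""
    role: str
    running: bool = True
    ident: int = field(default_factory=lambda: next(_next_id))


def rejuvenate(system: dict, drain_minutes: int = 30) -> list:
    """System-specific rejuvenation as described in Section 3 (illustrative only).

    `system` maps 'P' and 'H' to the running instances and 'C' to a list of two
    cold-spare images [C1, C2]. The old P and H are retired rather than restarted.
    Returns a human-readable log of the steps.
    """
    log = []
    old_p, old_h = system["P"], system["H"]
    new_p, new_h = system["C"]                     # C1 and C2 become P' and H'
    new_p.role, new_h.role = "P", "H"
    new_p.running = new_h.running = True
    log.append(f"started P'={new_p.ident}, H'={new_h.ident}; new requests go to P'")
    # The drain period is only recorded here; a real orchestrator would wait for it.
    log.append(f"old P={old_p.ident}, H={old_h.ident} finish existing requests ({drain_minutes} min)")
    old_p.running = old_h.running = False          # shut down and delete, do not restart
    log.append(f"deleted old P={old_p.ident}, H={old_h.ident}")
    system["P"], system["H"] = new_p, new_h
    system["C"] = [Instance("C", running=False), Instance("C", running=False)]
    log.append("created two new cold-spare images for the next round")
    return log


if __name__ == "__main__":
    system = {"P": Instance("P"), "H": Instance("H"),
              "C": [Instance("C", running=False), Instance("C", running=False)]}
    for step in rejuvenate(system):
        print(step)
```

Retiring the old instances instead of rebooting them mirrors the design choice stated above: creating a fresh VM is cheap, so the aged P and H never need to be reused.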
DFTs augment the standard combinatorial gates of a regular fault tree and introduce three novel modeling capabilities, namely spare component management and allocation, functional dependency, and failure sequence dependency [5]. These modeling capabilities are realized using three main dynamic gates: the spare gate, the functional dependency gate, and the priority-AND gate. The work done in this paper uses the dynamic spare gates, in particular the HSP and CSP gates. Note that a spare gate has one primary input and one or more alternate inputs (i.e., the spares). The primary input is initially powered on, and when it fails, it is replaced by an alternate input. The spare gate fails when the primary input and all the alternate inputs fail. Figure 2 shows an HSP gate with one primary component, denoted as P, and one hot spare component, denoted as H. The HSP gate fails when both components P and H fail.

Fig. 2. An HSP gate with one primary component and one hot spare component

Suppose the constant failure rates of components P and H are $\lambda_P$ and $\lambda_H$, respectively. Since H does not take any workload while it is functioning as a spare, its failure rate $\lambda_H$ is typically lower than $\lambda_P$. When P fails, H takes over the workload and behaves as a primary component. H now has a constant failure rate $\lambda_{H^*}$ higher than $\lambda_H$, due to the software aging phenomenon under full workload. For this reason, we call the spare component H, after its role transition, H*. Note that $\lambda_{H^*}$ and $\lambda_P$ do not have to be equal, because P and H* may have different configurations.

There are two scenarios in which the HSP gate fails. In the first scenario, P fails before H* fails. This case is illustrated as Case 1 in Fig. 3, where P fails at $t_1$ and H* fails at $t_2$, with $t_1 < t_2$. In the second scenario, H fails before P fails. In this case, H does not have a chance to behave as a primary component, and the failure of P immediately leads to the failure of the HSP gate. This case is illustrated as Case 2 in Fig. 3, where $t_2 < t_1$.

Fig. 3. Two cases for the failure of an HSP gate (Case 1: P fails before H*; Case 2: H fails before P)

We now derive the reliability function $R(t)$ of the HSP gate by considering the above two cases.

Case 1: P fails before H* fails, denoted as $p_1(t)$. In this case, it is guaranteed that H does not fail during $[0, t_1]$. After P fails, H takes over the workload and becomes H*. Intuitively, the distribution function $p_1(t)$ of the HSP gate, i.e., the probability that the HSP gate fails during $[0, t]$, can be calculated as in Eq. 1.
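$$p_1(t) = \Pr\{T_P < T_{H^*} \le t\} = \int_0^t \int_{t_1}^{t} f_P(t_1)\, f_{H^*}(t_2)\, dt_2\, dt_1 \tag{1}$$

However, Eq. 1 works only when $\lambda_{H^*} = \lambda_H$, i.e., when the constant failure rate of H does not change after it switches its role from a spare component to a primary one at time $t_1$. When $\lambda_{H^*} > \lambda_H$, as we can see from Fig. 4, integrating the pdf of H* from $t_1$ to $t$ does not give the correct unreliability of the component at time $t$, because it incorrectly assumes that the component behaves as H* starting from time 0. Since the component actually behaves as H during $[0, t_1]$, the unreliability of H* at time $t_1$ equals the unreliability of H at $t_1$, rather than the unreliability calculated by integrating the pdf of H* from 0 to $t_1$. This requires us to calculate a new starting integration time $\tau$ for H* such that the unreliability of H* at $\tau$, represented by the shaded area under the pdf of H*, equals the unreliability of H at $t_1$, represented by the shaded area under the pdf of H. As the pdfs of H and H* are $f_H(t) = \lambda_H e^{-\lambda_H t}$ and $f_{H^*}(t) = \lambda_{H^*} e^{-\lambda_{H^*} t}$, respectively, the relationship between $\tau$ and $t_1$ can be described as in Eq. 2.

$$\int_0^{\tau} f_{H^*}(x)\, dx = \int_0^{t_1} f_H(x)\, dx \tag{2}$$

Solving Eq. 2, i.e., $1 - e^{-\lambda_{H^*}\tau} = 1 - e^{-\lambda_H t_1}$, we have $\tau = \lambda_H t_1 / \lambda_{H^*}$. Since H* fails during a period of time $t - t_1$, the integration range for $t_2$ now becomes $[\tau, \tau + t - t_1]$. Based on the above analysis, the probability of P failing before H* fails can be calculated as in Eq. 3.

Fig. 4. The initial unreliability of H* when P fails (i.e., the unreliability of H at time $t_1$)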
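$$p_1(t) = \Pr\{T_P < T_{H^*} \le t\} = \int_0^t \int_{\tau}^{\tau + t - t_1} f_P(t_1)\, f_{H^*}(t_2)\, dt_2\, dt_1, \qquad \tau = \frac{\lambda_H}{\lambda_{H^*}}\, t_1 \tag{3}$$

To simplify the integration range for $t_2$, we can substitute $u = t_2 - \tau + t_1$ for the variable $t_2$ in Eq. 3, and derive the distribution function of the HSP gate for Case 1 as in Eq. 4.

$$p_1(t) = \Pr\{T_P < T_{H^*} \le t\} = \int_0^t \int_{t_1}^{t} f_P(t_1)\, f_{H^*}(u - t_1 + \tau)\, du\, dt_1 = \int_0^t \int_{t_1}^{t} \lambda_P e^{-\lambda_P t_1}\, \lambda_{H^*} e^{-\lambda_{H^*}(u - t_1) - \lambda_H t_1}\, du\, dt_1 \tag{4}$$

Case 2: H fails before P fails, denoted as $p_2(t)$. In this case, it is guaranteed that P does not fail during $[0, t_2]$. The distribution function of the HSP gate, i.e., the probability that the HSP gate fails during $[0, t]$, can be calculated as in Eq. 5.

$$p_2(t) = \Pr\{T_H < T_P \le t\} = \int_0^t \int_{t_2}^{t} f_H(t_2)\, f_P(t_1)\, dt_1\, dt_2 = \int_0^t \int_{t_2}^{t} \lambda_H e^{-\lambda_H t_2}\, \lambda_P e^{-\lambda_P t_1}\, dt_1\, dt_2 \tag{5}$$

As the two cases are mutually exclusive, the unreliability of the HSP gate at time $t$ is the sum of the unreliability values of the two cases at time $t$. Thus, we derive the unreliability function $U(t)$ of the HSP gate as in Eq. 6.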
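The two case integrals can be checked numerically. The sketch below evaluates Eqs. 4 and 5 with a composite midpoint rule for exponential pdfs, including the shifted starting age $\tau$ from Eq. 2, and compares $1 - (p_1 + p_2)$ with the closed form obtained by carrying out the same integrals by hand. The rates are arbitrary illustrative values (with $\lambda_{H^*} > \lambda_H$ so that the role transition matters), and the helper names are introduced only for this sketch.

```python
import math


def midpoint(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def tau(t1, lam_h, lam_hs):
    """Eq. 2 for exponential pdfs: age of H* with the same unreliability H has at t1."""
    return lam_h * t1 / lam_hs


def p1(t, lam_p, lam_h, lam_hs):
    """Eq. 4: P fails first at t1, then H* fails in (t1, t], starting from age tau(t1)."""
    def outer(t1):
        shift = tau(t1, lam_h, lam_hs) - t1        # argument of f_H* is u - t1 + tau
        inner = midpoint(lambda u: lam_hs * math.exp(-lam_hs * (u + shift)), t1, t)
        return lam_p * math.exp(-lam_p * t1) * inner
    return midpoint(outer, 0.0, t)


def p2(t, lam_p, lam_h):
    """Eq. 5: H fails first at t2, then P fails in (t2, t]."""
    def outer(t2):
        inner = midpoint(lambda t1: lam_p * math.exp(-lam_p * t1), t2, t)
        return lam_h * math.exp(-lam_h * t2) * inner
    return midpoint(outer, 0.0, t)


def r_closed(t, lam_p, lam_h, lam_hs):
    """R(t) from evaluating Eqs. 4-6 analytically (requires lam_p + lam_h != lam_hs)."""
    k = lam_p / (lam_p + lam_h - lam_hs)
    return math.exp(-lam_p * t) + k * (math.exp(-lam_hs * t) - math.exp(-(lam_p + lam_h) * t))


if __name__ == "__main__":
    lam_p, lam_h, lam_hs, t = 0.004, 0.0025, 0.005, 18.0     # illustrative rates only
    unrel = p1(t, lam_p, lam_h, lam_hs) + p2(t, lam_p, lam_h)  # U(t) = p1 + p2
    print(1.0 - unrel, r_closed(t, lam_p, lam_h, lam_hs))
```

When $\lambda_{H^*} = \lambda_H$ the closed form collapses to $e^{-\lambda_P t} + e^{-\lambda_H t} - e^{-(\lambda_P+\lambda_H)t}$, i.e., an ordinary parallel pair, which is why Eq. 1 is only valid in that special case.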
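$$U(t) = p_1(t) + p_2(t) \tag{6}$$

Accordingly, the reliability function $R(t)$ of the HSP gate can be derived as in Eq. 7.

$$R(t) = 1 - U(t) = e^{-\lambda_P t} + \frac{\lambda_P}{\lambda_P + \lambda_H - \lambda_{H^*}}\left(e^{-\lambda_{H^*} t} - e^{-(\lambda_P + \lambda_H) t}\right) \tag{7}$$

It is worth noting that there is an obvious but subtle third case, where components P and H* fail at exactly the same time, denoted as $t_1 = t_2$. Since the probability of failure associated with the event $[T = t]$ is 0, i.e., the probability that either P or H* fails during $[t, t]$ is 0, the unreliability of the HSP gate in the third case must equal 0. This result can be easily derived as in Eq. 8, where P fails at time $t_1$ during $[0, t]$, and H* fails at exactly the same time as P.

$$p_3(t) = \Pr\{T_P = T_{H^*} \le t\} = \int_0^t f_P(t_1)\left(\int_{t_1}^{t_1} f_{H^*}(t_2)\, dt_2\right) dt_1 = 0 \tag{8}$$

4.2. Verifying the Reliability Function Using CTMC
To formally verify the correctness of the reliability function $R(t)$ of the HSP gate derived in Section 4.1, we now use a CTMC model and solve its state equations. Figure 5 shows the CTMC model corresponding to the HSP gate given in Fig. 2. There are four states, $S_1$ to $S_4$, defined in the CTMC model, which are denoted as PH, P, H*, and FAILURE, respectively. State $S_1$ is the one in which both the primary component and the hot spare one are functioning. When the hot spare component or the primary one fails, the model enters state $S_2$ or state $S_3$, respectively. Note that we denote state $S_3$ as H* instead of H because, in state $S_3$, the hot spare component has a different failure rate from the one it has in state $S_1$.

Fig. 5. The CTMC model of the HSP gate in Fig. 2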
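Let $P_i(t)$ be the probability of the system being in state $S_i$ at time $t$, where $1 \le i \le 4$, and let $p_{ij}(\Delta t) = \Pr[X(t+\Delta t) = j \mid X(t) = i]$ be the incremental transition probability with random variable $X(t)$. The following matrix $[p_{ij}(\Delta t)]$, where $1 \le i, j \le 4$, is the incremental one-step transition matrix [4] of the CTMC defined in Fig. 5.

$$[p_{ij}(\Delta t)] = \begin{bmatrix} 1 - (\lambda_P + \lambda_H)\Delta t & \lambda_H \Delta t & \lambda_P \Delta t & 0 \\ 0 & 1 - \lambda_P \Delta t & 0 & \lambda_P \Delta t \\ 0 & 0 & 1 - \lambda_{H^*} \Delta t & \lambda_{H^*} \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{9}$$

The matrix $[p_{ij}(\Delta t)]$ is a stochastic matrix in which each row sums to 1. It gives the probability of each state either remaining unchanged ($i = j$) or transiting to a different state ($i \ne j$) during the time interval $\Delta t$. Given the initial probabilities of the states, the matrix completely describes the state transition process. From the matrix defined in Eq. 9, we can derive the relations in Eqs. 10.1-10.4.

$$P_1(t+\Delta t) = P_1(t)\left[1 - (\lambda_P + \lambda_H)\Delta t\right] \tag{10.1}$$
$$P_2(t+\Delta t) = P_1(t)\,\lambda_H \Delta t + P_2(t)\,(1 - \lambda_P \Delta t) \tag{10.2}$$
$$P_3(t+\Delta t) = P_1(t)\,\lambda_P \Delta t + P_3(t)\,(1 - \lambda_{H^*} \Delta t) \tag{10.3}$$
$$P_4(t+\Delta t) = P_2(t)\,\lambda_P \Delta t + P_3(t)\,\lambda_{H^*} \Delta t + P_4(t) \tag{10.4}$$

The initial probabilities are defined by the system being in state $S_1$; thus we have $P_1(0) = 1$ and $P_2(0) = P_3(0) = P_4(0) = 0$. As $\Delta t$ goes to 0, we derive the set of linear first-order differential equations in Eqs. 11.1-11.4, which are the state equations of the CTMC model.

$$P_1'(t) = \lim_{\Delta t \to 0} \frac{P_1(t+\Delta t) - P_1(t)}{\Delta t} = -(\lambda_P + \lambda_H)\,P_1(t) \tag{11.1}$$
$$P_2'(t) = \lim_{\Delta t \to 0} \frac{P_2(t+\Delta t) - P_2(t)}{\Delta t} = \lambda_H P_1(t) - \lambda_P P_2(t) \tag{11.2}$$
$$P_3'(t) = \lim_{\Delta t \to 0} \frac{P_3(t+\Delta t) - P_3(t)}{\Delta t} = \lambda_P P_1(t) - \lambda_{H^*} P_3(t) \tag{11.3}$$
$$P_4'(t) = \lim_{\Delta t \to 0} \frac{P_4(t+\Delta t) - P_4(t)}{\Delta t} = \lambda_P P_2(t) + \lambda_{H^*} P_3(t) \tag{11.4}$$

The state equations in Eqs. 11.1-11.4 can be solved using the Laplace transformation, which transforms a linear first-order differential equation into a linear algebraic equation that is easy to solve. Let the Laplace transformation of $P_i(t)$ be $\tilde{P}_i(s)$, as defined in Eq. 12.1; the Laplace transformation of $P_i'(t)$ can then be derived as in Eq. 12.2.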
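The recurrences in Eqs. 10.1-10.4 can also be iterated directly with a small fixed $\Delta t$, which gives an independent numerical check of the CTMC before any Laplace transformation is applied. This is a minimal sketch under illustrative rates; the closed form used for comparison is the expression of Eq. 7, and it assumes $\lambda_P + \lambda_H \ne \lambda_{H^*}$.

```python
import math


def ctmc_euler(t, lam_p, lam_h, lam_hs, dt=1e-3):
    """Iterate Eqs. 10.1-10.4 with a fixed step dt, starting in state S1 (PH).

    Returns (P1, P2, P3, P4) at time t for the states PH, P, H*, FAILURE.
    """
    p1, p2, p3, p4 = 1.0, 0.0, 0.0, 0.0
    for _ in range(round(t / dt)):
        p1, p2, p3, p4 = (
            p1 * (1.0 - (lam_p + lam_h) * dt),            # stay in PH
            p1 * lam_h * dt + p2 * (1.0 - lam_p * dt),    # H failed, P still up
            p1 * lam_p * dt + p3 * (1.0 - lam_hs * dt),   # P failed, H* took over
            p2 * lam_p * dt + p3 * lam_hs * dt + p4,      # both failed (absorbing)
        )
    return p1, p2, p3, p4


def r_closed(t, lam_p, lam_h, lam_hs):
    """Closed-form R(t) = P1 + P2 + P3 (assumes lam_p + lam_h != lam_hs)."""
    k = lam_p / (lam_p + lam_h - lam_hs)
    return math.exp(-lam_p * t) + k * (math.exp(-lam_hs * t) - math.exp(-(lam_p + lam_h) * t))


if __name__ == "__main__":
    probs = ctmc_euler(18.0, 0.004, 0.0025, 0.005)    # illustrative rates only
    print(sum(probs[:3]), r_closed(18.0, 0.004, 0.0025, 0.005), probs[3])
```

Because every row of the matrix in Eq. 9 sums to 1, each step preserves total probability, so $P_4$ printed here is exactly $1 - (P_1 + P_2 + P_3)$ up to rounding, matching $U(t) = 1 - R(t)$.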
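$$\tilde{P}_i(s) = \mathcal{L}\{P_i(t)\} = \int_0^{\infty} e^{-st} P_i(t)\, dt \tag{12.1}$$
$$\mathcal{L}\{P_i'(t)\} = \int_0^{\infty} e^{-st} P_i'(t)\, dt = s\tilde{P}_i(s) - P_i(0) \tag{12.2}$$

Now, applying the Laplace transformation defined in Eqs. 12.1-12.2 to both sides of Eqs. 11.1-11.4, we can derive Eqs. 13.1-13.4.

$$s\tilde{P}_1(s) - P_1(0) = -(\lambda_P + \lambda_H)\,\tilde{P}_1(s) \tag{13.1}$$
$$s\tilde{P}_2(s) - P_2(0) = \lambda_H \tilde{P}_1(s) - \lambda_P \tilde{P}_2(s) \tag{13.2}$$
$$s\tilde{P}_3(s) - P_3(0) = \lambda_P \tilde{P}_1(s) - \lambda_{H^*} \tilde{P}_3(s) \tag{13.3}$$
$$s\tilde{P}_4(s) - P_4(0) = \lambda_P \tilde{P}_2(s) + \lambda_{H^*} \tilde{P}_3(s) \tag{13.4}$$

Substituting the initial probabilities $P_i(0)$, where $1 \le i \le 4$, into Eqs. 13.1-13.4, we can solve for $\tilde{P}_1(s)$, $\tilde{P}_2(s)$ and $\tilde{P}_3(s)$. By further applying the inverse Laplace transformation to them, we obtain the solution of the original linear first-order differential equations in Eqs. 11.1-11.3 as follows.

$$P_1(t) = e^{-(\lambda_P + \lambda_H)t}$$
$$P_2(t) = e^{-\lambda_P t} - e^{-(\lambda_P + \lambda_H)t}$$
$$P_3(t) = \frac{\lambda_P}{\lambda_P + \lambda_H - \lambda_{H^*}}\left(e^{-\lambda_{H^*} t} - e^{-(\lambda_P + \lambda_H)t}\right)$$

The reliability function $R(t)$ is the sum of $P_1(t)$, $P_2(t)$ and $P_3(t)$, which can be calculated as in Eq. 14.

$$R(t) = P_1(t) + P_2(t) + P_3(t) = e^{-\lambda_P t} + \frac{\lambda_P}{\lambda_P + \lambda_H - \lambda_{H^*}}\left(e^{-\lambda_{H^*} t} - e^{-(\lambda_P + \lambda_H)t}\right) \tag{14}$$

It is easy to see that Eq. 14 gives exactly the same formula as Eq. 7; thus, it verifies the correctness of our proposed analytical approach for calculating the reliability of the HSP gate at time $t$. Note that $P_4(t)$ is the probability that the system is in its FAILURE state at time $t$. Therefore, $P_4(t)$ actually defines the system unreliability function, $U(t) = P_4(t) = 1 - R(t)$.

4.3. Modeling and Analysis Using DFT in Two Phases
To model and analyze the reliability of a cloud-based system with spare components, we consider two different phases. Phase 1 represents the pre-rejuvenation stage, where the reliability analysis is based on the failure rates of the primary components and their HSPs. CSPs are not considered in this phase because they cannot take over the system load instantly when both the primary and hot spare components fail. We model the system reliability using a DFT, and then calculate its reliability based on the reliability function of HSP gates derived in Section 4.1.

Phase 2 is the software rejuvenation phase. When the predefined reliability threshold is reached, the software rejuvenation process is initiated, and the system enters this phase. As mentioned earlier, there are two rejuvenation scenarios, namely the system-specific rejuvenation and the component-specific one. To illustrate the basic idea of calculating the system reliability in this phase, we use the first scenario as an example, where the whole system is rejuvenated. In this scenario, we start two CSPs, P′ and H′, to replace P and H, respectively. During the rejuvenation period, all four software components P, H, P′ and H′ coexist and are functioning. As shown in Fig. 6, the dynamic fault tree model is decomposed into two subtrees, S1 and S2, which are both HSP gates connected by an AND-gate. This is because the system fails only when both HSP gates fail; the failure of a single HSP gate during the rejuvenation phase does not lead to the failure of the whole system. Subtree S1 consists of components P and H, which are to be rejuvenated, while subtree S2 consists of the newly deployed components P′ and H′, which are used to replace P and H. As both S1 and S2 are HSP gates, they can be computed using the same analysis technique as described for Phase 1.

Fig. 6. A DFT model with HSP gates in Phase 2

Once we have the distribution functions of S1 and S2, the static gate, i.e., the AND-gate, can be easily solved using the sum-of-disjoint-products (SDP) method [3]. Specifically, to calculate the reliability of the whole system in this phase, we first calculate the unreliability functions $U_{S_1}(t)$ and $U_{S_2}(t)$ for S1 and S2, respectively. The reliability of the AND-gate can then be calculated as in Eq. 15.

$$R_{AND}(t) = 1 - U_{AND}(t) = 1 - U_{S_1}(t)\,U_{S_2}(t) \tag{15}$$

In the following case study, we consider both scenarios for the rejuvenation process. Scenario 1 involves rejuvenation of the whole system; in this case, we need to replicate all major software components when the system reliability reaches the threshold. Scenario 2, on the other hand, is component-specific; thus, we only rejuvenate the most critical components, i.e., those whose reliability is the lowest when the system reliability reaches its threshold.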
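Eq. 15 can be evaluated directly once both subtrees are HSP gates. The timing convention in this sketch is an assumption, since the section does not spell it out: the pair being rejuvenated (P, H) is taken to have aged T days when Phase 2 starts, the replacement pair (P′, H′) starts at age 0, and both pairs share the same rates. The rates reuse the application-server values of the case study purely for illustration, and the function names are introduced here.

```python
import math


def hsp_reliability(t, lam_p, lam_h, lam_hs):
    """Reliability of an HSP gate with constant failure rates (closed form of Eq. 7)."""
    k = lam_p / (lam_p + lam_h - lam_hs)
    return math.exp(-lam_p * t) + k * (math.exp(-lam_hs * t) - math.exp(-(lam_p + lam_h) * t))


def phase2_and_reliability(T, s, lam_p, lam_h, lam_hs):
    """Eq. 15: R_AND = 1 - U_S1 * U_S2 at time s (days) into Phase 2.

    S1 holds the aging pair (P, H), evaluated at age T + s; S2 holds the freshly
    started pair (P', H'), assumed to share the same rates, evaluated at age s.
    """
    u_old = 1.0 - hsp_reliability(T + s, lam_p, lam_h, lam_hs)
    u_new = 1.0 - hsp_reliability(s, lam_p, lam_h, lam_hs)
    return 1.0 - u_old * u_new


if __name__ == "__main__":
    for minutes in (5, 10, 20, 30):
        s = minutes / (24 * 60)                    # Phase 2 time converted to days
        print(minutes, phase2_and_reliability(18.0, s, 0.004, 0.0025, 0.004))
```

Because the fresh pair is almost certainly still working a few minutes after deployment, $U_{S_2}$ is tiny and the AND-gate keeps the subsystem reliability extremely close to 1 throughout the 30-minute window.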
5. Case Study
A challenging task in cloud computing is to correctly measure the reliability of a cloud-based system and to maintain its high reliability. In this case study, we show how to model and analyze the reliability of a cloud-based system using DFTs, and then estimate an effective rejuvenation schedule that meets the high reliability requirement of the system. We consider a typical cloud-based system, shown in Fig. 7, which consists of an application server A and a database server B. To enhance the system reliability, two hot spare components, A′ and B′, are set up for A and B, respectively; they are ready to take over the workload once the primary ones fail. Note that each of the servers is deployed in a different zone for fault-tolerance purposes [22]. For the reliability analysis in this case study, we view a VM together with its OS, the server software and the deployed services as a single software component. In addition, we only consider the reliability of the servers within the box drawn with dashed lines, and assume the reliability of the proxy server is ideal. Furthermore, we assume that the proxy server and the application server can monitor and detect failures of the application server and the database server, respectively.

Fig. 7. A cloud-based system with servers and their HSPs

To ensure a high reliability of the system, we set a reliability threshold of 0.99, and assume the constant failure rates of the servers to be $\lambda_A = 0.004$, $\lambda_{A'} = 0.0025$, $\lambda_B = 0.005$, and $\lambda_{B'} = 0.003$. Note that the failure rates of the hot spare components are lower than those of their corresponding primary ones because the spare components do not take any workload while the primary ones are functioning. However, when a primary server fails, the associated hot spare component takes over the workload, and its failure rate increases accordingly. We assume the hot spare components have the same configurations as their associated primary ones; thus, we have $\lambda_{A'^*} = \lambda_A = 0.004$ and $\lambda_{B'^*} = \lambda_B = 0.005$.
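This case study involves 8 software components that are split into two groups. The first group consists of the four servers shown in Fig. 7. The second group consists of four CSP components that are used to replace the servers in the first group during the rejuvenation process; each server in the second group is named after the server in the first group that it replaces. As the CSP components are undeployed VM images, their failure rates are 0. Once deployed, they have the same failure rates as their corresponding software components due to the assumed identical configurations.

Figure 8 shows the DFT model of the cloud-based system in Phase 1. Because the system fails when either the application server subsystem or the database server subsystem fails, the two HSP gates are connected by an OR-gate. The reliability function of the OR-gate can be derived as in Eq. 16.

Fig. 8. DFT model of the cloud-based system in Phase 1

$$R(t) = 1 - U_{OR}(t) = 1 - \left[U_{S_1}(t) + U_{S_2}(t) - U_{S_1}(t)\,U_{S_2}(t)\right] \tag{16}$$

where $U_{S_1}(t)$ and $U_{S_2}(t)$ are the unreliability functions of the subtrees S1 and S2, respectively. According to Eq. 7, $U_{S_1}(t)$ and $U_{S_2}(t)$ can be calculated as in Eq. 17 and Eq. 18, respectively. Note that Eqs. 17-18 have been simplified due to the assumed configurations, where $\lambda_{A'^*} = \lambda_A$ and $\lambda_{B'^*} = \lambda_B$.

$$U_{S_1}(t) = 1 - R_{S_1}(t) = 1 - e^{-\lambda_A t} - \frac{\lambda_A}{\lambda_{A'}}\left(e^{-\lambda_A t} - e^{-(\lambda_A + \lambda_{A'})t}\right) \tag{17}$$
$$U_{S_2}(t) = 1 - R_{S_2}(t) = 1 - e^{-\lambda_B t} - \frac{\lambda_B}{\lambda_{B'}}\left(e^{-\lambda_B t} - e^{-(\lambda_B + \lambda_{B'})t}\right) \tag{18}$$

In Phase 2, we consider both scenarios mentioned at the end of Section 4.3, so that their impacts on system reliability, as well as their resulting rejuvenation schedules, can be compared. Figure 9 shows the DFT model of the cloud-based system in Phase 2 based on Scenario 1. For the same reason as in Phase 1, the system reliability can be calculated as in Eq. 19. According to Eq. 15, $U_{S_3}(t)$ and $U_{S_4}(t)$ can be calculated as in Eq. 20 and Eq. 21, respectively.

$$R(t) = 1 - U_{OR}(t) = 1 - \left[U_{S_3}(t) + U_{S_4}(t) - U_{S_3}(t)\,U_{S_4}(t)\right] \tag{19}$$
$$U_{S_3} = U_{S_1}\,U_{S_1'} \tag{20}$$
$$U_{S_4} = U_{S_2}\,U_{S_2'} \tag{21}$$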
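The Phase 1 model (Eqs. 16-18) is simple enough to evaluate and search directly. The sketch below uses the case-study rates $\lambda_A = 0.004$, $\lambda_{A'} = 0.0025$, $\lambda_B = 0.005$, $\lambda_{B'} = 0.003$, reads them as per-day rates (consistent with the Time (Day) column of the tables), and locates the time at which the system reliability falls to the 0.99 threshold by bisection. Names are illustrative, and the values printed by this sketch may differ in the last digits from those reported in the paper's tables.

```python
import math

# Case-study rates, read as per-day values; hot spares take over with the
# primary's rate (lambda_A'* = lambda_A, lambda_B'* = lambda_B).
LAM_A, LAM_A_SPARE = 0.004, 0.0025
LAM_B, LAM_B_SPARE = 0.005, 0.003
THRESHOLD = 0.99


def subsystem_reliability(t, lam, lam_spare):
    """R_S(t) = 1 - U_S(t) from Eqs. 17-18 (HSP gate with lambda_H* = lambda_P)."""
    return math.exp(-lam * t) + lam / lam_spare * (
        math.exp(-lam * t) - math.exp(-(lam + lam_spare) * t))


def system_reliability(t):
    """Eq. 16: OR-gate over the application (S1) and database (S2) subtrees."""
    u1 = 1.0 - subsystem_reliability(t, LAM_A, LAM_A_SPARE)
    u2 = 1.0 - subsystem_reliability(t, LAM_B, LAM_B_SPARE)
    return 1.0 - (u1 + u2 - u1 * u2)


def threshold_crossing(lo=0.0, hi=100.0, tol=1e-9):
    """Bisection for the time at which the (decreasing) system reliability hits THRESHOLD."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if system_reliability(mid) > THRESHOLD:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for day in (1, 5, 10, 18):
        print(day, round(system_reliability(day), 7))
    print("threshold reached after about", round(threshold_crossing(), 2), "days")
```

With these rates the crossing falls between day 18 and day 19, which is consistent with rejuvenating both subsystems every 18 days in Scenario 1.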
Note that in Eqs. 20-21, $U_{S_1}$, $U_{S_1'}$, $U_{S_2}$ and $U_{S_2'}$ can be calculated in a similar way as in Eqs. 17-18.

Fig. 9. DFT model of the cloud-based system in Phase 2 (Scenario 1)

The reliability analysis results for Scenario 1 are listed in Table 1. The table shows that the reliability threshold of 0.99 is reached every 18 days. Hence, both the application and database servers are rejuvenated at the end of Phase 1. As Phase 2 lasts 30 minutes, we calculate the system reliability at 5, 10, 20, and 30 minutes into Phase 2 to illustrate how the system reliability may change during the rejuvenation process. From the table, we can see that the system reliability is kept very high during the transition. After 30 minutes, the newly deployed servers completely take over the system, and the servers being rejuvenated are shut down. When this happens, the system returns to its initial state and starts a new life cycle with a very high initial reliability. According to Table 1, we suggest that the system be rejuvenated every 18 days in order to maintain the system reliability above the threshold.

By looking further into Table 1, we notice that when the system reliability reaches 0.99 after 18 days, the reliability of the database server subsystem is always lower than that of the application server subsystem. This suggests that we may first rejuvenate the most critical component with the lowest reliability (e.g., the database servers in this case study) without sacrificing too much of the system reliability. We then wait until the system reliability reaches the threshold again, and rejuvenate the application servers next, as they have now become the most critical components. This is exactly what happens in the rejuvenation schedule of Scenario 2, where the application servers and the database servers are rejuvenated alternately. Figure 10 shows the DFT model of the cloud-based system in Phase 2 for one of the two cases in Scenario 2, where only the database servers are rejuvenated. In this case, the system reliability can be calculated as in Eq. 22, and $U_{S_1}(t)$ and $U_{S_4}(t)$ can be calculated in a similar way as in Eq. 17 and Eq. 21, respectively.
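Table 1. System reliability with software rejuvenation (Scenario 1)
Phase | Time (Day) | App Server Reliability | DB Server Reliability | System Reliability
1 | 1 | .9999875 | .99998 | .99996756
1 | 5 | .999686 | .99957 | .99994568
1 | 10 | .998745 | .99885 | .996834333
1 | 18 | .99644 | .9944 | .99778
2 | 18.0035 | .99999999999 | .99999999999 | .99999999999
2 | 18.0069 | .99999999999 | .99999999999 | .99999999999
2 | 18.0139 | .99999999999 | .99999999998 | .99999999997
2 | 18.0208 | .99999999998 | .99999999994 | .9999999999
... | ... | ... | ... | ...
1 | 73 | .9999875 | .99998 | .99996756
1 | 77 | .999686 | .99957 | .99994568
1 | 82 | .998745 | .99885 | .996834333
1 | 90 | .99644 | .9944 | .99778
2 | 90.0035 | .99999999999 | .99999999999 | .99999999999
2 | 90.0069 | .99999999999 | .99999999999 | .99999999999
2 | 90.0139 | .99999999999 | .99999999998 | .99999999997
2 | 90.0208 | .99999999998 | .99999999994 | .9999999999
1 | 91 | .9999875 | .99998 | .99996756
1 | 95 | .999686 | .99957 | .99994568
1 | 100 | .998745 | .99885 | .996834333
1 | 108 | .99644 | .9944 | .99778
2 | 108.0035 | .99999999999 | .99999999999 | .99999999999
2 | 108.0069 | .99999999999 | .99999999999 | .99999999999
2 | 108.0139 | .99999999999 | .99999999998 | .99999999997
2 | 108.0208 | .99999999998 | .99999999994 | .9999999999
1 | 109 | .9999875 | .99998 | .99996756
1 | 113 | .999686 | .99957 | .99994568
1 | 118 | .998745 | .99885 | .996834333

Fig. 10. DFT model of the cloud-based system in Phase 2 (Scenario 2, rejuvenating the database servers only)

$$R(t) = 1 - U_{OR}(t) = 1 - \left[U_{S_1}(t) + U_{S_4}(t) - U_{S_1}(t)\,U_{S_4}(t)\right] \tag{22}$$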
The system reliability for the other case in Scenario 2, where only the application servers are rejuvenated, can be calculated in a similar way. Table 2 shows the reliability analysis results for Scenario 2. At the end of each Phase 1, the server subsystem whose reliability is marked by ">" is the one to be rejuvenated. For example, after 18 days, the database servers are rejuvenated, and after 27 days, the application servers are rejuvenated.

Table 2. System reliability with software rejuvenation (Scenario 2)
Phase | Time (Day) | App Server Reliability | DB Server Reliability | System Reliability
1 | 1 | .9999875 | .99998 | .99996756
1 | 5 | .999686 | .99957 | .99994568
1 | 10 | .998745 | .99885 | .996834333
1 | 18 | .99644 | > .9944 | .99778
2 | 18.0035 | .99645 | .99999999999 | .996449999
2 | 18.0069 | .9964 | .99999999999 | .99649999
2 | 18.0139 | .99638 | .99999999998 | .9963799998
2 | 18.0208 | .99635 | .99999999994 | .9963499994
1 | 20 | .9955 | .999969 | .99577465
1 | 25 | .9955 | .99949 | .9968856
1 | 27 | > .994 | .99844 | .99
2 | 27.0035 | .99999999999 | .99844 | .9984499999
2 | 27.0069 | .99999999999 | .998439 | .99843899999
2 | 27.0139 | .99999999999 | .998437 | .99843699999
2 | 27.0208 | .99999999998 | .998435 | .99843499998
1 | 30 | .999884 | .99765 | .99749567
1 | 35 | .9999 | .99468 | .9937478645
1 | 39 | .9985 | > .9994 | .996464
2 | 39.0035 | .9984 | .99999999999 | .998399999
2 | 39.0069 | .998 | .99999999999 | .99899999
2 | 39.0139 | .998 | .99999999997 | .9989999997
2 | 39.0208 | .99899 | .99999999993 | .9989899993
... | ... | ... | ... | ...
1 | 85 | .998486 | .999969 | .9984687
1 | 90 | .996853 | .99949 | .9959597
1 | 95 | .99467 | .99765 | .99955748
1 | 96 | .9947 | .99684 | .99994669
1 | 97 | > .99365 | .9963 | .99
2 | 97.0035 | .99999999999 | .9963 | .996899999
2 | 97.0069 | .99999999999 | .99698 | .996899999
2 | 97.0139 | .99999999998 | .99694 | .99679799998
2 | 97.0208 | .99999999996 | .9969 | .99679399996
1 | 100 | .999884 | .99468 | .994588
1 | 105 | .9999 | > .9994 | .9938465
2 | 105.0035 | .9999 | .99999999999 | .99874399999
2 | 105.0069 | .999895 | .99999999999 | .9987499999
2 | 105.0139 | .99988 | .99999999997 | .99873999997
2 | 105.0208 | .999868 | .9999999999 | .9987379999
1 | 110 | .9979 | .99957 | .99747753
1 | 115 | .99644 | .99885 | .9943657574
1 | 119 | > .9947 | .9963 | .9953553
We now illustrate the suggested rejuvenation schedules for both Scenario 1 and Scenario 2 in the figure below, where the start of each rejuvenation is indicated by a sudden increase in the system reliability. Comparing the two schedules over the period shown, Scenario 1 has 6 rejuvenation processes, each of which requires us to rejuvenate both the application and the database servers. Scenario 2, on the other hand, has 9 rejuvenation processes, each of which requires us to rejuvenate either the application servers or the database servers, but not both. Scenario 2 therefore needs fewer server rejuvenations (9 versus 2 × 6 = 12), and thus less server management, to keep the system reliability above the 0.99 threshold over the same period. Suppose the rejuvenation of the application servers has the same cost as that of the database servers. Then, by using the rejuvenation schedule defined in Scenario 2, the cost can be reduced by (2 × 6 − 9)/(2 × 6) = 25% compared to the schedule used in Scenario 1.

[Figure: Rejuvenation scheduling for the cloud-based system (Scenario 1 vs. Scenario 2)]

6. Conclusions and Future Work

In this paper, we propose a reliability-based approach to establishing cost-effective software rejuvenation schedules for cloud-based systems. The system uses hot spare components during normal operation and cold spare components during the rejuvenation process in order to keep the system reliability above a predefined threshold. By modeling the reliability of a cloud-based system using DFT, we are able to derive the reliability functions for each software component as well as for the whole system. We define two phases for software rejuvenation and discuss two scenarios of the rejuvenation process in Phase 2. The analytical results of our case study show that Scenario 2 is more cost-effective than Scenario 1.

For future work, we will extend our current work to components with non-constant failure rates. We will adopt a measurement-based approach to collecting empirical data in order to determine the pdfs of the major software components whose reliability is affected by software aging. Software tools will be developed for modeling and analyzing the reliability of cloud-based systems, as well as for deriving effective rejuvenation schedules. In addition, we will extend and apply our proposed approach to more complex cloud environments, such as cloud-based systems using Amazon Web Services (AWS). A comparative analysis of system performance will be conducted for our proposed approach and for existing fault-tolerant strategies that improve the reliability of cloud applications [26]. Finally, we envision modeling and analyzing cloud-based systems with active standby spare components, which can share workload with the primary ones [27], as a future and more ambitious research direction.

Acknowledgments

We thank all anonymous referees for their careful review of this paper, and for the many useful comments and suggestions that greatly helped us to improve the presentation and the quality of the paper.

References

1. K. V. Vishwanath and N. Nagappan, Characterizing cloud computing hardware reliability, in Proc. of the ACM Symposium on Cloud Computing (SoCC), Indianapolis, IN, USA, June 10-11, 2010, pp. 193-204.
2. D. Fitch and H. Xu, A RAID-based secure and fault-tolerant model for cloud information storage, International Journal of Software Engineering and Knowledge Engineering (IJSEKE) 23(5) (2013) 627-654.
3. M. Rausand and A. Høyland, System Reliability Theory: Models, Statistical Methods, and Applications, Second Edition, Hoboken, New Jersey, USA, John Wiley & Sons, Inc., 2004.
4. H. Pham, System Software Reliability, Springer Series in Reliability Engineering, Springer-Verlag London, 2006.
5. A. Somani and N. Vaidya, Understanding fault tolerance and reliability, IEEE Computer 30(4) (1997) 45-50.
6. E. Marshall, Fatal error: how Patriot overlooked a Scud, Science 255(5050) (1992) 1347.
7. M. Grottke, R. Matias and K. S. Trivedi, The fundamentals of software aging, in Proc. of the International Workshop on Software Aging and Rejuvenation (WoSAR 2008), ISSRE, Seattle, WA, USA, November 2008, pp. 1-6.
8. Y. Huang, C. Kintala, N. Kolettis and N. Fulton, Software rejuvenation: analysis, module and applications, in Proc. of the Twenty-Fifth International Symposium on Fault-Tolerant Computing (FTCS '95), Pasadena, CA, USA, June 27-30, 1995, pp. 381-390.
9. J. Rahme and H. Xu, Reliability-based software rejuvenation scheduling for cloud-based systems, in Proc. of the 27th International Conference on Software Engineering and Knowledge Engineering (SEKE 2015), Pittsburgh, USA, July 6-8, 2015, pp. 298-303.
10. V. Castelli, R. E. Harper, P. Heidelberger, et al., Proactive management of software aging, IBM Journal of Research and Development 45(2) (2001) 311-332.
11. L. Jiang and G. Xu, Modeling and analysis of software aging and software failure, Journal of Systems and Software 80(4) (2007) 590-595.
12. A. Bobbio, M. Sereno and C. Anglano, Fine grained software degradation models for optimal rejuvenation policies, Performance Evaluation 46(1) (2001) 45-62.
13. K. Vaidyanathan, D. Selvamuthu and K. S. Trivedi, Analysis of inspection-based preventive maintenance in operational software systems, in Proc. of the IEEE Symposium on Reliable Distributed Systems (SRDS), Suita, Japan, October 13-16, 2002, pp. 286-295.
14. T. Dohi, K. Goseva-Popstojanova and K. S. Trivedi, Statistical non-parametric algorithms to estimate the optimal software rejuvenation schedule, in Proc. of the International Symposium on Dependable Computing, Los Angeles, CA, USA, December 2000, pp. 77-84.
15. V. P. Koutras and A. N. Platis, Applying software rejuvenation in a two node cluster system for high availability, in Proc. of the International Conference on Dependability of Computer Systems, Szklarska Poreba, May 25-27, 2006, pp. 175-182.
16. F. Machida, A. Andrzejak, R. Matias and E. Vicente, On the effectiveness of Mann-Kendall test for detection of software aging, in Proc. of the IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Pasadena, CA, USA, November 4-7, 2013, pp. 269-274.
17. M. Grottke, L. Li, K. Vaidyanathan and K. S. Trivedi, Analysis of software aging in a web server, IEEE Trans. on Reliability 55(3) (2006) 411-420.
18. J. Guo, Y. Ju, Y. Wang and X. Li, The prediction of software aging trend based on user intention, in Proc. of the IEEE Youth Conference on Information Computing and Telecommunication (YC-ICT), Beijing, China, November 28-30, 2010, pp. 206-209.
19. D. Cotroneo, R. Natella and R. Pietrantuono, Is software aging related to software metrics?, in Proc. of the IEEE Second International Workshop on Software Aging and Rejuvenation (WoSAR), San Jose, CA, USA, November 2010, pp. 1-6.
20. F. Machida, D. Kim and K. Trivedi, Modeling and analysis of software rejuvenation in a server virtualized system, in Proc. of the IEEE Second International Workshop on Software Aging and Rejuvenation (WoSAR), San Jose, CA, USA, November 2010, pp. 1-6.
21. T. Thein, S.-D. Chi and J. S. Park, Availability modeling and analysis on virtualized clustering with rejuvenation, International Journal of Computer Science and Network Security (IJCSNS) 8(9) (2008) 72-80.
22. J. Barr, A. Narin and J. Varia, Building fault-tolerant applications on AWS, Amazon Web Services (AWS), Amazon, October 2011, retrieved in July 2015 from http://media.amazonwebservices.com/AWS_Building_Fault_Tolerant_Applications.pdf
23. H. Xu, L. Xing and R. Robidoux, DRBD: dynamic reliability block diagrams for system reliability modeling, International Journal of Computers and Applications (IJCA) 31(2) (2009) 132-141.
24. R. Robidoux, H. Xu, L. Xing and M.C. Zhou, Automated modeling of dynamic reliability block diagrams using colored Petri nets, IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans (SMC-A) 40(2) (2010) 337-351.
25. J. B. Dugan, S. J. Bavuso and M. A. Boyd, Dynamic fault-tree models for fault-tolerant computer systems, IEEE Trans. on Reliability 41(3) (1992) 363-377.
26. M. Lu and H. Yu, A fault tolerant strategy in hybrid cloud based on QPN performance model, in Proc. of the International Conference on Information Science and Applications (ICISA), Pattaya, Thailand, June 24-26, 2013, pp. 1-7.
27. L. Huang and Q. Xu, Lifetime reliability for load-sharing redundant systems with arbitrary failure distributions, IEEE Trans. on Reliability 59(2) (2010) 319-330.