The 3rd International Conference on Grid and Pervasive Computing - Workshops

Resource Scheduling in Desktop Grid by Grid-JQA

L. Mohammad Khanli, Assistant professor, C.S. Dept., Tabriz University, Tabriz, Iran, l-khanli@tabrizu.ac.ir
M. Analoui, Assistant professor, C.E. Dept., IUST, Tehran, Iran, Analoui@iust.ac.ir

Abstract

In desktop grid computing, resource scheduling is an important issue. In this paper, we propose a QoS-based resource scheduling algorithm that finds the best match between tasks and resources while meeting QoS requests. We describe Grid-JQA, our proposed architecture supporting resource scheduling in desktop grid environments, and our current implementation of this architecture. In this work we propose an aggregation formula for the QoS parameters. The formula is a unitless combination of the parameters together with weighting factors. Three heuristic approaches have been designed and compared via simulations; they match tasks to resources while taking into account the QoS requested by the tasks and, at the same time, minimizing the tasks' makespan as much as possible. Also, an optimum method based on the performance metric has been designed as a reference for the performance of the heuristics developed. We compare our work with the Min_Min, Max_Min and Sufferage heuristics. The results of a simulation are provided to evaluate the main idea of the paper.

1. Introduction

A resource manager is one of the most critical components of the grid middleware [] since it is responsible for resource management, which provides resource selection and job scheduling. Therefore, resource discovery, resource selection, and job scheduling, all of which influence computing performance, are important issues in grid computing. Grid services are often expected to meet some minimum level of quality of service (QoS) for a desirable operation. Resource management can encompass not only a commitment to perform a task but also commitments to a level of performance or quality of service [3]. Thus appropriate mechanisms are needed for monitoring and regulating the usage of system resources to meet QoS requirements [, 3, 9].
In this paper, we propose a resource management service that automatically selects optimal resources and requests resource allocation. The set of optimal resources consists of the resources that guarantee to minimize the total execution time of a given application. We also propose a fault tolerance service that detects resource failures, deviations from required QoS levels, and excessive resource usage, and resolves the detected failures.

This paper is organized as follows. In Section 2, we describe previous work on resource management and fault tolerance services in grid computing. In Section 3, we explain the architecture of Grid-JQA. In Section 4, we simulate our proposed solution and compare it with the Min_Min, Max_Min and Sufferage heuristics. Finally, we conclude the paper in Section 5.

2. Related works

In grid computing, there are two approaches for managing resources for job execution [4, 7]. In the first, a user directly searches for the resources for job execution using an information service and then requests a local resource manager to do the resource allocation. The other is to use a resource manager, as is done in Condor-G [10] and Legion [12]. Condor-G [10] leverages software from Globus and Condor [16] to allow users to harness multi-domain resources. In Condor, the matchmaker uses a very generic matchmaking algorithm, called Gang-Matching [16].

978--7695-377-9/8 $25. 28 IEEE DOI.9/GPC.WORKSHOPS.28.27
Authorized licensed use limited to: UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL. Downloaded on October 9, 28 at 9:5 from IEEE Xplore. Restrictions apply.

Previous resource management approaches
employed in Globus [2], Condor, and others [, 2, 7, 8] do not solve the problem of selecting optimal resources, which occurs when the number of resources that satisfy the user's demand is much larger than the number of resources that the user needs. Also, grid applications, middleware, tools, or systems such as Globus [2], Condor-G, Nimrod-G [11], Ninf-G [6], and others [8] either address fault tolerance without providing a generic mechanism for resolving failures, or adopt ad hoc, application-specific fault tolerance mechanisms that cannot be reused or shared among applications. In Globus, a noticeable flaw is the lack of support for fault tolerance [2, 3]. To date, grid applications have either ignored failure issues or have implemented fault detection and response behavior completely within the application [5]. The support for fault tolerance has consisted mainly of fault detection services or a monitoring system. HBM [19] designs and implements a local monitor and a data collector that provide a fault detection service for processes and computers to applications developed with Globus. HBM does not provide a fault management service, and it can detect only limited failures, i.e., a process failure and a computer failure. While NWS (Network Weather Service) [20] monitors available network bandwidth, CPU availability, free memory size, and free disk space size, it cannot provide a fault detection service or a fault management service. References [6, 8, 9, ] propose a failure detection service or a fault tolerance service in grids, but they do not provide a mechanism for handling the detected failure, and the problem of QoS is not addressed.

Therefore, in this paper, we propose a resource manager for selecting optimal resources and a fault manager for a fault tolerance service. Our resource manager automatically selects the set of optimal resources among the set of candidate resources and requests resource allocation, so it makes job execution convenient for the user.
It also guarantees efficient and reliable job execution through a fault tolerance service. The proposed fault tolerance service detects failure events with a fault detector that monitors processes, processors, and networks, and it resolves detected failures through job duplication.

3. The Architecture of Grid-JQA

Grid Java based Quality of service management by Active database (Grid-JQA) is a framework that provides workflow management for quality of service on different types of resources, including networks, CPUs, and disks [13, 14, 15]. It also encourages Grid customers to specify their quality of service needs based on their actual requirements. The main goal of this system is to give users seamless access for submitting jobs to a pool of heterogeneous resources and, at the same time, to dynamically monitor the resource requirements for the execution of applications.

Figure 1 shows the architecture of the proposed Grid-JQA. The Active Grid Information Server (AGIS) and the fault detector are connected with the Globus Toolkit. In Figure 1, the AGIS cooperates with a grid portal, an RSL parser, a fault detector and GRAM. The grid portal provides an interface for a user to launch an application that will utilize the resources and services provided by the grid.

For scalability of the architecture, we use a multi-level AGIS. A multi-level AGIS is created by connecting AGIS instances hierarchically. The key to accomplishing this lies in Grid-JQA's inherent architecture, which allows an AGIS to behave like a resource towards a higher-level AGIS. A user connected to a higher-level AGIS has access to the entire computing power of the Grid, whereas a client connected to a lower-level AGIS has access only to the computing power managed by that lower-level AGIS. In this fashion Grids can be scaled to any number of levels.

To execute a job with Grid-JQA, a user describes a resource type, a resource condition, and the number of resources using RSL. RSL is the specification language used by the Globus Toolkit to describe configuration and service requirements [3].
Then the user sends the RSL to Grid-JQA, and the RSL parser extracts the resource type and resource condition and sends them to an AGIS. The resources are processor, network, and memory. The client assigns a weight to each parameter that shows the importance of the parameter. Let us assume that the grid infrastructure serves N tasks.
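The multi-level AGIS arrangement described in this section can be illustrated with a minimal sketch. The class names, the `mflops` field and the rule that an AGIS advertises the sum of its children's capabilities are assumptions made for illustration; the paper only states that an AGIS behaves like a resource towards a higher-level AGIS.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Resource:
    """A leaf resource; mflops is a hypothetical capability figure."""
    name: str
    mflops: float


@dataclass
class AGIS:
    """An information server that looks like one resource to its parent."""
    name: str
    children: List[Union["AGIS", Resource]] = field(default_factory=list)

    def advertise(self) -> float:
        # Assumption: the advertised capability is the sum over all children.
        total = 0.0
        for child in self.children:
            if isinstance(child, AGIS):
                total += child.advertise()
            else:
                total += child.mflops
        return total

    def reachable(self) -> List[str]:
        # Names of every leaf resource a client of this AGIS can use.
        names: List[str] = []
        for child in self.children:
            if isinstance(child, AGIS):
                names.extend(child.reachable())
            else:
                names.append(child.name)
        return names


lower_a = AGIS("lower-a", [Resource("pc1", 100.0), Resource("pc2", 150.0)])
lower_b = AGIS("lower-b", [Resource("pc3", 200.0)])
upper = AGIS("upper", [lower_a, lower_b])
```

A client attached to `upper` reaches all three machines, while a client attached to `lower_a` reaches only the two machines that `lower_a` manages, which mirrors the access rule stated above.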
Figure 1 - The architecture of Grid-JQA (components: grid portal, RSL parser, upper and lower Active Grid Information Server, fault detector, GRAM)

The request of task $T_i$ is represented by a vector of QoS parameters $q_i^T$, $i = 1, 2, \ldots, N$, together with the weights of the parameters, as shown in equations (1) and (2).

\[ q_i^T = [\, q_{i1}^T, q_{i2}^T, \ldots, q_{ik}^T \,] \tag{1} \]

\[ W_i = [\, w_{i1}, w_{i2}, \ldots, w_{ik} \,], \qquad 0 \le w_{il} \le 1, \qquad \sum_{l=1}^{k} w_{il} = 1 \tag{2} \]

Each weight shows the importance of the corresponding parameter. For example, if only the CPU is important for a task, the client sets the CPU weight to 1 and the others to zero. GRAM advertises resource-level capabilities to the AGIS. When the resource capabilities change, the fault detector informs the AGIS through a fault event.

Let us assume that a Grid infrastructure consists of M resources. The capabilities of a Grid resource are expressed by the resource parameter vector $q_j^R$, $j = 1, 2, \ldots, M$, as it appears in equation (3).

\[ q_j^R = [\, q_{j1}^R, q_{j2}^R, \ldots, q_{jk}^R \,] \tag{3} \]

The elements $q_{jl}^R$, $l = 1, 2, \ldots, k$, indicate independent capabilities of the $j$th resource that affect its performance. Note that $q_{jl}^R$ and $q_{il}^T$ have the same unit, and the resource manager compares $q_{jl}^R$ with $q_{il}^T$ for each $l$ from 1 to $k$. If the resource provides the requirements needed for the task, it can be chosen as the best matched resource.

We introduce a satisfy operator, written here as $R_j \models T_i$, which means that resource $R_j$ can satisfy task $T_i$ and guarantee its QoS parameters. The satisfaction relation is defined in such a way that the memory appears in a separate fraction. This is because a shortage of memory blocks the execution, while an excess of memory gives no benefit. The other QoS parameters are aggregated. The aggregation is justified by the fact that CPU and bandwidth do not have the restriction mentioned for the memory.

\[ R_j \models T_i \iff \left( \frac{q_{j,mem}^R}{q_{i,mem}^T} \ge 1 \right) \wedge \left( \sum_{l=1,\; l \ne mem}^{k} w_{il} \, \frac{q_{jl}^R}{q_{il}^T} \ge 1 \right), \qquad k = \text{the number of QoS parameters} \tag{4} \]

The solution proposed here is to normalize the resource capabilities by the client requirements, $q_{jl}^R / q_{il}^T$.
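The satisfaction relation (4) and the selection of the best-matched resource can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary keys (`cpu`, `bw`, `mem`), the function names and the sample numbers are hypothetical, and weights are supplied only for the non-memory parameters.

```python
from typing import Dict, List, Optional

MEM = "mem"  # key used here for the memory parameter (an assumption)


def aggregate(task: Dict[str, float], weights: Dict[str, float],
              resource: Dict[str, float]) -> float:
    """Weighted sum of capability/requirement ratios, memory excluded."""
    return sum(weights[p] * resource[p] / task[p] for p in task if p != MEM)


def satisfies(task: Dict[str, float], weights: Dict[str, float],
              resource: Dict[str, float], threshold: float = 1.0) -> bool:
    """Memory must cover the request; other parameters are judged together."""
    if resource[MEM] < task[MEM]:
        return False
    return aggregate(task, weights, resource) >= threshold


def best_match(task: Dict[str, float], weights: Dict[str, float],
               resources: List[Dict[str, float]]) -> Optional[int]:
    """Index of the memory-feasible resource with the largest aggregate."""
    feasible = [i for i, r in enumerate(resources) if r[MEM] >= task[MEM]]
    if not feasible:
        return None
    return max(feasible, key=lambda i: aggregate(task, weights, resources[i]))


# Hypothetical request: CPU in MFlops, bandwidth in Mbps, memory in MB.
task = {"cpu": 300.0, "bw": 10.0, "mem": 512.0}
weights = {"cpu": 0.9, "bw": 0.1}
resources = [
    {"cpu": 200.0, "bw": 10.0, "mem": 1024.0},  # enough memory, weak CPU
    {"cpu": 600.0, "bw": 5.0, "mem": 512.0},    # strong CPU, less bandwidth
    {"cpu": 900.0, "bw": 10.0, "mem": 256.0},   # too little memory
]
chosen = best_match(task, weights, resources)
```

In this sample the third resource is rejected on memory alone even though its CPU is the strongest, which is the reason the memory term is kept outside the weighted sum. Lowering `threshold` below one lets the first resource qualify as an inexact match.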
Because of this normalization, the summation is possible even though the parameters have different units such as byte, bps, and MFlops. When the resource capability exceeds the demand, $q_{jl}^R / q_{il}^T$ is more than one. So a surplus in some QoS parameter helps the task. A surplus of memory, however, does not help the task, so we separate the memory parameter from the other parameters. Also, the client introduces a weight for each parameter to show the importance of the parameter. The weights range from 0 to 1 and the sum of all the weights is equal to one. We multiply the weight by $q_{jl}^R / q_{il}^T$ as in (4). Finally, the best-matched resource is the one that provides the maximum value of $\sum_{l} w_{il} \, q_{jl}^R / q_{il}^T$.

4. Simulations

In the simulation we use the following eight methods for matching:

1) The General method, which matches resources with tasks in first come first served (FCFS) order (first task to first free resource) regardless of QoS parameters.

2) The Optimum method, which selects the best resources for tasks [13]. The best resource is the one that has the maximum value of $\sum_{l} w_{il} \, q_{jl}^R / q_{il}^T$.

3) Our proposed threshold-based method, which uses the threshold in (4) rather than searching for the maximum for each matching as the Optimum method does.

4) Dup_, our proposed solution that adds a new feature to method 3. This feature is the duplicate execution of delayed tasks (i.e. tasks executed on weak resources).

5) The Wait method. In all other methods, if the AGIS does not find a proper resource, it assigns the best available resource to the task, so the task does not wait for a proper resource. In the Wait method, tasks wait until a proper resource is found.

6) The Min_Min heuristic, which considers the whole set of unmapped tasks. It finds the set of minimum completion times (MCT), one for each unmapped task. It then selects the task with the overall minimum MCT from the set to be mapped next. It continues until all the unmapped tasks are mapped. Min-min considers all unmapped tasks, while MCT considers just one task. The intuition behind the min-min heuristic is that at each mapping step the current makespan increases the least.
7) The Max-Min heuristic, which is similar to min-min. The only difference is that, after the set of MCTs is calculated, the task with the overall maximum MCT value is selected next for mapping. The intuition is that long tasks can be overlapped with shorter tasks in the case of max-min.

8) The Sufferage heuristic, which stores the sufferage value for each of the unmapped tasks. The sufferage value is the difference between the minimum completion time and the second minimum completion time. The task having the largest sufferage value is selected next for mapping.

In this simulation, we assume that the CPU cycle is from to 6, and all resources have the same amount of memory and bandwidth. The number of tasks in the application is 2 and the CPU weight is 0.9. All tasks require the same amount of memory and bandwidth. We choose the CPU cycle requirement of the application in the range to 6, which is similar to the range chosen for the resources' CPU cycle power. We run the simulation 6 times, and each time we consider one of the CPU cycle requirements , 2, 3, 4, 5 and 6. For each CPU cycle requirement, the simulation is repeated times, and finally we use the average turnaround time.

Figure 2 shows the result of the simulation for the 6 CPU cycle requirement with the number of resources from (half of the number of tasks) to 6 (three times the number of tasks). From here on, in all figures, the horizontal axis indicates the number of resources and the vertical axis indicates the turnaround time in msec.

Figure 2 - 6 CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)

The simulation results show that:

1) Executing on a weak resource produces less turnaround time than waiting for a proper resource.

2) Dup_ has the minimum turnaround time. Because some tasks are executed on weak resources, they get a chance of being executed on strong resources, so the turnaround time decreases.

3) The Dup_ average turnaround time is 6.7% less than that of the Min_Min heuristic.
4) The Dup_ average turnaround time is 2.67% greater than that of the Max_Min and Sufferage heuristics.

Based on these assumptions and algorithms, we run the simulation. The results for average turnaround time are shown in Figures 3, 4, 5, 6 and 7. From the figures, we can see that the average turnaround time of the Dup_ method is much lower than that of the other practical methods, especially when the CPU request is high. By assigning the proper value to the threshold, Dup_ becomes the same as the Optimum method.

Figure 3 - 5 CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)
Figure 4 - 4 CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)
Figure 5 - 3 CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)
Figure 6 - 2 CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)
Figure 7 - CPU cycle request (legend: Optimum, Dup_, Wait, MinMin, MaxMin)

Comparing the above figures, we find the following results:

1) Most of the time, our threshold-based method has less turnaround time than the Wait method, because executing on a weak resource is better than waiting for a proper resource.

2) Comparing our threshold-based method with the Optimum method shows that they are the same for strong requests. They differ for low requests, where the Optimum method has less turnaround time. Two points should be noted, however. First, the user has a low request, so the execution time is longer; a user who wants less execution time should request more capabilities. Second, in this simulation the threshold is one, but as explained before, the threshold can be changed dynamically in response to environment changes, so the threshold-based method can be made similar to the Optimum method.

3) Most of the time Dup_ has less turnaround time than the other methods. It also has less cost than retrying and checkpointing.

4) Dup_ has at least 45% improvement over the General method, which uses the first come first served (FCFS) strategy. So Dup_ is a suitable and reliable method for matching in a grid environment.

5) The Dup_ average turnaround time is 3.35% less than that of Min_Min, and the average turnaround time of our threshold-based method is 6.% less than that of Min_Min. This is a very important result, because Min_Min requires extra input.
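For reference, the three baseline heuristics used in the comparison (Min_Min, Max_Min and Sufferage) can be sketched as follows. The sketch follows the descriptions given in this section rather than any particular published implementation; in particular, Sufferage is reduced to "map the task with the largest sufferage value next". The function names, the expected-time matrix `etc` and its values are hypothetical.

```python
from typing import Dict, List, Tuple


def completion_times(task: int, etc: List[List[float]],
                     ready: List[float]) -> List[float]:
    """Completion time of one task on every machine, given machine ready times."""
    return [ready[m] + etc[task][m] for m in range(len(ready))]


def best_two(ct: List[float]) -> Tuple[int, float, float]:
    """Best machine, its completion time, and the second-best completion time."""
    order = sorted(range(len(ct)), key=lambda m: ct[m])
    best = order[0]
    second = ct[order[1]] if len(order) > 1 else ct[best]
    return best, ct[best], second


def schedule(etc: List[List[float]], heuristic: str) -> Tuple[Dict[int, int], float]:
    """Map every task to a machine; return the mapping and the makespan."""
    ready = [0.0] * len(etc[0])
    unmapped = set(range(len(etc)))
    mapping: Dict[int, int] = {}
    while unmapped:
        # Recompute the minimum completion time (MCT) of every unmapped task.
        info = {t: best_two(completion_times(t, etc, ready))
                for t in sorted(unmapped)}
        if heuristic == "min_min":
            chosen = min(info, key=lambda t: info[t][1])
        elif heuristic == "max_min":
            chosen = max(info, key=lambda t: info[t][1])
        elif heuristic == "sufferage":
            chosen = max(info, key=lambda t: info[t][2] - info[t][1])
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
        machine, mct, _ = info[chosen]
        mapping[chosen] = machine
        ready[machine] = mct  # the machine is busy until the task completes
        unmapped.remove(chosen)
    return mapping, max(ready)


# Hypothetical expected execution times: etc[task][machine].
etc = [[2.0, 4.0], [3.0, 1.0], [10.0, 12.0]]
results = {h: schedule(etc, h) for h in ("min_min", "max_min", "sufferage")}
```

All three variants need the expected execution time of every unmapped task on every machine before each mapping step, which is the extra input referred to in the last result above.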
The Min-Min heuristic considers the whole set of unmapped tasks. It finds the set of minimum completion times (MCT), one for each unmapped task. It then selects the task with the overall minimum MCT from the set to be mapped next. It continues until all the unmapped tasks are mapped. Our heuristics, in contrast, use only the advertisement inputs and the requirements.

5. Conclusions
In this paper, we propose a resource management service that provides optimal resource selection and a fault tolerance service. The contributions of this work are as follows:

(i) We propose a resource manager for optimal resource selection. The resource manager considers the requirements of the job and the resource capabilities. It selects the optimal resources that guarantee the optimal performance, where turnaround time is chosen as the metric for performance evaluation.

(ii) We present a fault management service to guarantee that the submitted jobs are completed reliably and efficiently. We perform the simulation to measure the performance improvement due to optimal resource selection and job duplication.

(iii) Three ways are proposed for resource selection: Optimum, our threshold-based method, and Dup_. With our algorithms, one resource is selected automatically for any request when multiple available resources are found, so the user does not need to select the resource manually from a large list of available matching resources. The simulation shows that the Dup_ average turnaround time is 3.35% less than that of Min_Min. The value of the threshold changes dynamically in response to environmental changes such as the number of idle resources, whereas the existing mapping systems lack the ability of inexact matching.

In the future, we plan to implement our resource manager as a part of the Globus Toolkit and run various experiments to measure the efficiency of the resource manager and job duplication. Also, we will investigate ways of selecting the best value of the threshold.

References

[1] I. Foster, A. Roy, V. Sander, A quality of service architecture that combines resource reservation and application adaptation, 8th International Workshop on Quality of Service, 2000.
[2] I. Foster, C. Kesselman, Globus: a metacomputing infrastructure toolkit, Int. J. Supercomputer Appl. 11(2), 1997.
[3] I. Foster, C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, Los Altos, CA, 2004.
[4] J. Chen, Y. Yang,
A Taxonomy of Grid Workflow Verification and Validation, Concurrency and Computation: Practice and Experience, Wiley, 2008, 20(4), 347-360.
[5] N.T. Anh, Integrating fault-tolerance techniques in grid applications, Ph.D. Dissertation, August 2000.
[6] Y. Tanaka, H. Nakada, S. Sekiguchi, T. Suzumura, S. Matsuoka, Ninf-G: a reference implementation of RPC-based programming middleware for grid computing, J. Grid Computing 1(1), 2003.
[7] J. Chen, Y. Yang, Adaptive Selection of Necessary and Sufficient Checkpoints for Dynamic Verification of Temporal Constraints in Grid Workflow Systems, ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2(2): Article 6, 2007.
[8] A. Iamnitchi, I. Foster, A problem-specific fault-tolerance mechanism for asynchronous, distributed systems, Proceedings of the 2000 International Conference on Parallel Processing, 2000.
[9] A. Waheed, W. Smith, J. George, J. Yan, An infrastructure for monitoring and management in computational grids, Proceedings of the 5th Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers, March 2000.
[10] J. Frey, I. Foster, M. Livny, T. Tannenbaum, S. Tuecke, Condor-G: A Computation Management Agent for Multi-Institutional Grids, University of Wisconsin, Madison, 2001.
[11] R. Buyya, D. Abramson, J. Giddy, Nimrod/G: an architecture of a resource management and scheduling system in a global computational grid, HPC Asia, May 2000.
[12] A. Grimshaw, W. Wulf, Legion: a view from 50,000 feet, Proceedings of 5th IEEE Symposium on High Performance Distributed Computing, 1996.
[13] L.M. Khanli, M. Analoui, An Approach to Grid Resource Selection and Fault Management Based on ECA Rules, Future Generation Computing System, 2007, doi: 10.1016/j.future.2007.05.002.
[14] L.M. Khanli, M. Analoui, Grid-JQA: a New Architecture for QoS-guaranteed Grid Computing System, Feb 15-17, PDP2006, France, 2006.
[15] L.M. Khanli, M. Analoui, Grid-JQA: Grid Java based Quality of service management by Active database, 4th Australian Symposium on Grid Computing and e-Research, AusGrid 2006.
[16] R. Raman, M. Livny, M.
Solomon, Resource management through multilateral matchmaking, Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing, 2000.
[17] R.A. Moreno, Job scheduling and resource management techniques in dynamic grid environments, Proceedings of the 1st European Across Grids Conference, 2002.
[18] L. Yang, J.M. Schopf, I. Foster, Conservative scheduling: using predicted variance to improve scheduling decisions in dynamic environments, Proceedings of the ACM/IEEE SC2003 Conference, 2003.
[19] P. Stelling, I. Foster, C. Kesselman, C. Lee, G. von Laszewski, A fault detection service for wide area distributed computations, Proceedings of 7th IEEE Symposium on High Performance Distributed Computing, 1998.
[20] M. Swany, R. Wolski, Representing dynamic performance information in grid environments with the network weather service, 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid2002), Berlin, Germany, May 2002.