University of Nebraska - Lincoln
DigitalCommons@University of Nebraska - Lincoln
CSE Conference and Workshop Papers, Computer Science and Engineering, Department of
2013

Real-Time Scheduling in MapReduce Clusters

Chen He, University of Nebraska-Lincoln, che@cse.unl.edu
Ying Lu, University of Nebraska - Lincoln, ylu@cse.unl.edu
David Swanson, University of Nebraska - Lincoln, dswanson@cse.unl.edu

Follow this and additional works at: http://digitalcommons.unl.edu/cseconfwork

He, Chen; Lu, Ying; and Swanson, David, "Real-Time Scheduling in MapReduce Clusters" (2013). CSE Conference and Workshop Papers. Paper 251. http://digitalcommons.unl.edu/cseconfwork/251

This Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in CSE Conference and Workshop Papers by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln.
2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing

Real-Time Scheduling in MapReduce Clusters

Chen He, Ying Lu, David Swanson
Computer Science & Engineering Department, University of Nebraska-Lincoln
Lincoln NE, United States
{che, ylu, dswanson}@cse.unl.edu

Abstract. MapReduce has been widely used as a Big Data processing platform. As it gets popular, its scheduling becomes increasingly important. In particular, since many MapReduce applications require real-time data processing, scheduling real-time applications in MapReduce environments has become a significant problem. In this paper, we create a novel real-time scheduler for MapReduce, which overcomes the deficiencies of an existing scheduler. It avoids accepting jobs that will lead to deadline misses and improves the cluster utilization. We implement our scheduler in the Hadoop system, and experimental results show that our scheduler provides deadline guarantees for accepted jobs and achieves good cluster utilization.

Keywords: real-time scheduling; MapReduce; cluster utilization

I. INTRODUCTION

MapReduce is a framework used by Google for processing huge amounts of data in a distributed environment [1], and Hadoop [2] is Apache's open source implementation of the MapReduce framework. Due to the simplicity of the programming model, MapReduce is widely used for many applications [9]. Event logs from Facebook's website are imported into a Hadoop cluster every hour, where they are used for a variety of applications, including analyzing usage patterns to improve site design, detecting spam, data mining and ad optimization [3]. The New York Times rents a Hadoop cluster from Amazon EC2 [9] to conduct large scale image conversions [9]. Hadoop is also used to store and process tweets, log files, and many other types of data generated across Twitter [9]. As MapReduce clusters get popular, their performance modeling [24][25][26] and scheduling become increasingly important. Yahoo! developed the capacity scheduler to share a Hadoop cluster among multiple groups and users [10]. Facebook's fair scheduler enabled fair sharing in MapReduce [3].
In particular, since many MapReduce applications [9], including some of the aforementioned ones (e.g., online data analytics for spam detection and ad optimization), require real-time data processing, scheduling real-time applications in MapReduce environments has become a significant problem [11][12][13][18][19][20][23]. Polo et al. [11] developed a soft real-time scheduler that allows performance-driven management of MapReduce jobs. Dong et al. [13] extended the work by Polo et al., developing a two-level MapReduce scheduler to schedule mixed soft real-time and non-real-time jobs according to their respective performance demands. Although they take the QoS of MapReduce jobs into consideration, most existing approaches [11][13][18][19][20] do not provide deadline guarantees for the jobs. Ferguson et al. developed Jockey [23] to provide guaranteed job latency in data parallel clusters. The approach, however, can only be applied to control recurring jobs. Kc and Anyanwu [12] developed a Deadline Constraint scheduler, aiming to provide time guarantees for MapReduce jobs. However, the Deadline Constraint scheduler has several deficiencies, which may lead to not only resource underutilization but also deadline violations (please refer to Section III for detailed analysis). This paper develops a novel Real-Time MapReduce (RTMR) scheduler to not only provide deadline guarantees for MapReduce applications but also ensure good utilization of MapReduce clusters.

The remainder of this paper is organized as follows. Section II presents the background. In Section III, we briefly describe the Deadline Constraint scheduler [12] and its deficiencies. Section IV presents our new scheduling algorithm in detail. Evaluations of these two schedulers are provided in Section V. Section VI concludes the paper.

II. BACKGROUND

In this section, we briefly describe how a Hadoop cluster works since other MapReduce-style clusters work similarly. In later parts of this paper, we will thus use the terms "Hadoop cluster" and "MapReduce cluster" interchangeably. A Hadoop cluster is often composed of many commodity PCs, where one PC acts as the master node and others as slave/worker nodes. A Hadoop cluster uses Hadoop Distributed File System (HDFS) [14] to manage its data.
It divides each file into small fixed-size (e.g., 128 MB) blocks and stores several (e.g., 3) copies of each block in local disks of cluster machines.

A MapReduce [1] computation is composed of two stages, map and reduce, which take a set of input key/value pairs and produce a set of output key/value pairs. When a MapReduce job is submitted to the cluster, it is divided into M map tasks and R reduce tasks, where each map task will process one block of input data.

A Hadoop cluster uses worker nodes to execute map and reduce tasks. There are limitations on the number of map and reduce tasks that a worker node can accept and execute simultaneously (i.e., map and reduce slots). Periodically, a worker node sends a heartbeat signal to the master node. Upon receiving a heartbeat from a worker node that has empty map/reduce slots, the master node invokes the MapReduce scheduler to assign tasks to the worker node. A worker node that is assigned a map task reads the content of the corresponding input data block from a local or remote disk, parses input key/value pairs out of the block, and passes each pair to the user-defined map function. The map function generates intermediate key/value pairs, which are buffered in memory, and periodically written to the local disk and divided into R regions by the partitioning function. The locations of these intermediate data are passed back to the master node, which is responsible for forwarding these locations to reduce tasks.

978-0-7695-5088-6 2013 U.S. Government Work Not Protected by U.S. Copyright. DOI 10.1109/HPCC.and.EUC.2013.216

A reduce task uses remote procedure calls to read the intermediate data generated by
the M map tasks of the job. Each reduce task is responsible for a region (partition) of intermediate data with certain keys. Thus, it has to retrieve its partition of data from all worker nodes that have executed the M map tasks. This process is called shuffle, which involves many-to-many communications among worker nodes. The reduce task then reads in the intermediate data and invokes the reduce function to produce the final output data (i.e., output key/value pairs) for its reduce partition [1]. Figure 1 illustrates the Hadoop framework and computation.

[Figure 1. Hadoop Framework and Computation. Diagram labels: HDFS; master node; slave nodes; map stage (input, map function, assign map task); intermediate <K,V>; shuffle; reduce stage (assign reduce task, reduce function, output).]

III. DEADLINE CONSTRAINT SCHEDULER

The Deadline Constraint scheduler [12] aims to ensure deadlines for real-time MapReduce jobs. After a job is submitted, the scheduler first determines whether the job can be completed within the specified deadline or not using a schedulability test. It assumes that 1) a job's reduce stage does not start until the job's map tasks finish and 2) a job's reduce tasks all start execution simultaneously for the same amount of time that is known a priori. Based on these assumptions, it first calculates the latest start time $s_r^{max}$ for a job's reduce stage, which is also the deadline for the job's map tasks. If the job arrives at time A, then the job has at most $s_r^{max} - A$ amount of time to complete its map stage. Unlike for the reduce stage, the Deadline Constraint scheduler assumes that each job executes at a minimum degree of task parallelism for the map stage. That is, the scheduler only assigns the job the minimum number $n_m^{min}$ of map slots that are required to meet its deadline. The scheduler, however, demands all $n_m^{min}$ map slots to be available simultaneously at the job's arrival time. Upon a job's submission, the constraint scheduler carries out the schedulability test. The job is rejected if $n_m^{min}$ map slots are not available at that time. The job is also rejected if the number of reduce slots available at $s_r^{max}$ is smaller than the total number of reduce tasks specified for the job.
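The two rejection rules described above can be sketched in Python. This is a minimal illustration and not the implementation from [12]: the function name and its parameters are hypothetical, and `n_min` and `s_max` (standing for $n_m^{min}$ and $s_r^{max}$) are taken as precomputed inputs because their formulas are not given in this section.

```python
def dc_schedulable(n_min, s_max, free_map_slots_now,
                   reduce_slot_free_times, num_reduce_tasks):
    """Schedulability test of the Deadline Constraint scheduler as described above.

    n_min: minimum number of map slots needed to meet the deadline (precomputed)
    s_max: latest start time of the reduce stage (precomputed)
    free_map_slots_now: number of map slots idle at the job's arrival time
    reduce_slot_free_times: time at which each reduce slot in the cluster becomes idle
    num_reduce_tasks: R, the number of reduce tasks of the job
    """
    # Rule 1: all n_min map slots must be idle at the arrival time.
    if free_map_slots_now < n_min:
        return False
    # Rule 2: at least R reduce slots must be idle at the single instant s_max.
    free_at_s_max = sum(1 for t in reduce_slot_free_times if t <= s_max)
    return free_at_s_max >= num_reduce_tasks
```

Under this sketch, a job whose `num_reduce_tasks` exceeds `len(reduce_slot_free_times)` can never pass rule 2, and a job that finds fewer than `n_min` idle map slots on arrival is rejected regardless of how soon more slots would free up.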
The Deadline Constraint scheduler, however, has some limitations and deficiencies, which may lead to resource underutilization and deadline violations. First, because the scheduler assumes that all reduce tasks of a job start to run simultaneously, it cannot accept a job with more reduce tasks than the cluster's total number of reduce slots. Second, by checking the aforementioned two conditions in the schedulability test, the scheduler only considers a single scenario where the job's deadline might be satisfied. Those conditions are, however, not necessary for meeting a job's deadline. Many jobs that do not pass the test can nevertheless be accepted and completed by their deadlines. For instance, even if the system does not have $n_m^{min}$ map slots available upon the job's arrival, the job can still finish its map stage on time and meet its deadline if more resources become available at a later time point. Furthermore, the constraint scheduler does not consider the case where slots become available and are utilized at different time points. For these reasons, the Deadline Constraint scheduler rejects jobs unnecessarily and cannot utilize system resources well. Last but not least, the schedulability test conditions checked by the scheduler are insufficient to ensure the deadline constraint. As a result, accepted jobs may actually miss their deadlines, violating the scheduler's real-time property. The cause of the deadline violation is that the scheduler only checks if a certain number of reduce slots are available at a particular time point $s_r^{max}$. Instead, the job requires the specified number of reduce slots to be available for the whole time interval $[s_r^{max}, D]$, where D is the job's deadline.

IV. RTMR SCHEDULER

In this paper, we develop a new Real-Time MapReduce (RTMR) scheduler for heterogeneous clusters. RTMR scheduler not only provides deadline guarantees to accepted jobs but also utilizes system resources well. We have made the following three assumptions when designing RTMR scheduler:

1) The input data is available in Hadoop Distributed File System (HDFS) before a job starts.

2) No preemption is allowed. The proposed scheduler orders the job queue according to job deadlines. However, once a job starts to execute its first map task, the job will not be preempted. That is, even if a newly arriving job B has an earlier deadline than a currently running job A, our scheduler makes no attempt to execute B's tasks before A's tasks.

3) A MapReduce job contains two stages: map and reduce stages. Similar to [11][12][13], we assume that a job's reduce stage does not start until the job's map tasks have all finished.

RTMR scheduler is composed of three components. The first and most important one is the admission controller, which makes decisions on whether to accept or reject a job. The second component is the job dispatcher, which assigns tasks to execute on worker nodes. The last component is the feedback controller. Since a job may finish at a different time than estimated, a feedback controller is designed to keep the admission controller up-to-date.
A. Definitions

Before describing the algorithm, we first present the parameters and data structures used in RTMR scheduler.

$J = (A, D, M, R, \sigma)$: A MapReduce job J is specified by the tuple $(A, D, M, R, \sigma)$, where A is the job arrival time, D is the relative deadline, M and R respectively specify the number of map and reduce tasks for the job, and $\sigma$ is the input data size of the job. For a MapReduce job, each map task processes a unique part, $\sigma_i^m$, of the job's input data, where $\sigma = \sum_{i=1}^{M} \sigma_i^m$.

$\eta$: the estimated maximum ratio between a job's intermediate data size and input data size. That is, the input data size for the job's reduce stage is at most $\eta \cdot \sigma$. For a MapReduce job, each one of the R reduce tasks processes a unique part, $\sigma_i^r$, of the job's intermediate data, whose total size is $\sum_{i=1}^{R} \sigma_i^r$.

$c_m$: the estimated time of retrieving and processing a unit of data in a map task.

$c_m^{max}$: the estimated longest time of retrieving and processing a unit of data in a map task. The time to retrieve data for a map task varies depending on where the input data is located (i.e., in memory, local disk, or remote disk). In addition, for a heterogeneous cluster, the task execution time differs on different nodes. $c_m^{max}$ gives the worst-case estimation.

$c_r$: the estimated time of retrieving and processing a unit of data in a reduce task.

$c_r^{max}$: the estimated longest time of retrieving and processing a unit of data in a reduce task.

$J.T^m = [t_1, t_2, \ldots, t_l]$: For each accepted job J, we maintain a sorted vector $T^m$ to record the estimated available time of the cluster's map slots, after the scheduled execution of J and J's predecessors. In the vector, l denotes the total number of map slots in the MapReduce cluster.

$J.T^r = [t_1, t_2, \ldots, t_q]$: For each accepted job J, we maintain a sorted vector $T^r$ to record the estimated available time of the cluster's reduce slots, after the scheduled execution of J and J's predecessors. In the vector, q denotes the total number of reduce slots in the MapReduce cluster.

$J.V^m = [v_1, v_2, \ldots, v_l]$: For each accepted job J, we use a sorted vector $V^m$ to represent the actual available time of the cluster's map slots after considering the actual execution of J and J's predecessors.

$J.V^r = [v_1, v_2, \ldots, v_q]$: For each accepted job J, we use a sorted vector $V^r$ to represent the actual available time of the cluster's reduce slots after considering the actual execution of J and J's predecessors.

$\Delta$: The threshold that we set for triggering the feedback controller. That is, if the difference between a job's actual and estimated finish times is larger than $\Delta$, RTMR scheduler will invoke the feedback controller to keep the admission controller up-to-date.

$\varepsilon_i^m$: the execution time of the ith map task of job J.

$\varepsilon_i^r$: the execution time of the ith reduce task of job J.

RTMR scheduler uses historical job execution data to estimate some of the aforementioned parameters: $\eta$, $c_m^{max}$, and $c_r^{max}$. After executing a job J, we could update ratio $\eta$ through the following equation:

$$\eta = \max\left(\eta, \frac{\sum_{i=1}^{R} \sigma_i^r}{\sigma}\right)$$

Similarly, we update the values of $c_m^{max}$ and $c_r^{max}$ as follows:

$$c_m^{max} = \max\left(c_m^{max}, \frac{\varepsilon_1^m}{\sigma_1^m}, \frac{\varepsilon_2^m}{\sigma_2^m}, \ldots, \frac{\varepsilon_M^m}{\sigma_M^m}\right)$$

$$c_r^{max} = \max\left(c_r^{max}, \frac{\varepsilon_1^r}{\sigma_1^r}, \frac{\varepsilon_2^r}{\sigma_2^r}, \ldots, \frac{\varepsilon_R^r}{\sigma_R^r}\right)$$

In a heterogeneous environment, worker nodes have different data retrieving and processing power. In order to avoid deadline misses, we follow the same mechanism as adopted by the Deadline Constraint scheduler [12], where the longest time of running a map/reduce task is used in the execution time estimation.

B. Admission Controller

In this paper, we assume, for both Deadline Constraint and RTMR schedulers, that jobs are put in a priority queue following EDF (earliest deadline first) order. Our admission control mechanism is, however, applicable beyond EDF, in general, to any policy (e.g., FIFO) that defines an order in which jobs should be given resources. When a new MapReduce job arrives, the admission controller determines if it is feasible to schedule the new job without compromising the guarantees for previously admitted jobs. Algorithms I, II, and III show the pseudo code of the admission control. RTMR scheduler first checks if the new
job J's deadline can be satisfied or not, i.e., to check if $e \le A + D$, where e is the estimated finish time of the job (Algorithm I lines 1-9). To estimate J's finish time, we start with identifying J's preceding job $J_p$ if J were inserted in the priority queue. If J were at the head of the queue, $J_p$ is the job that has been started latest by the dispatcher. If J is the first job submitted to the cluster, it does not have a preceding job. Since $J_p.T^m$ and $J_p.T^r$ record the estimated available time of the cluster's map and reduce slots after the scheduled execution of $J_p$ and $J_p$'s predecessors, we can estimate job J's finish time based on these vectors.

If the new job J's deadline can be satisfied, RTMR scheduler then checks whether accepting J will violate the deadline of any previously admitted job (Algorithm I lines 10-21). Since only jobs that succeed job J in the priority queue will be delayed, RTMR scheduler re-estimates their finish times. If any of them will miss its deadline as a result of J's acceptance, RTMR scheduler rejects job J.

Finally, once the admission controller decides to accept job J, the priority queue and the $T^m$ and $T^r$ vectors of J and J's successors will be updated to reflect the change (Algorithm I lines 22-23).

ALGORITHM I. ADMISSION CONTROLLER
AC(J = (A, D, M, R, σ), Priority-Q)
    // Identifying J's preceding job J_p if J were inserted in the queue
 1: J_p = getPredecessor(J, Priority-Q)
 2: T^m = J_p.T^m   (T^m = [0, 0, ..., 0] if J_p = nil)
 3: T^r = J_p.T^r   (T^r = [0, 0, ..., 0] if J_p = nil)
    // invoke Algorithms II and III to do the calculation
 4: J.T^m = CalT^m(J, T^m).T^m
 5: J.T^r = CalT^r(J, T^m, T^r).T^r
 6: e = CalT^r(J, T^m, T^r).e
 7: if e > A + D then
 8:    return false
 9: end if
10: J_p = J
11: J_s = getSuccessor(J_p, Priority-Q)
12: while (J_s != nil) do
       // invoke Algorithms II and III to do the calculation
13:    T_s^m = CalT^m(J_s, J_p.T^m).T^m
14:    T_s^r = CalT^r(J_s, J_p.T^m, J_p.T^r).T^r
15:    e_s = CalT^r(J_s, J_p.T^m, J_p.T^r).e
16:    if e_s > J_s.A + J_s.D then
17:       return false
18:    end if
19:    J_p = J_s
20:    J_s = getSuccessor(J_p, Priority-Q)
21: end while
22: Priority-Q.insert(J)
23: record J.T^m, J.T^r, T_s^m and T_s^r computed above as the new T^m & T^r vectors for J and J's successors
24: return true

ALGORITHM II.
CALCULATION OF T^m AND e^m
CalT^m(J = (A, D, M, R, σ), T^m = [t_1, t_2, ..., t_l])
    // This algorithm estimates e^m, job J's map stage finish time, and T^m, the available time of map slots after the scheduled execution of J and J's predecessors
 1: ε~^m = c_m^max * max(σ_i^m, i = 1, 2, ..., M)
 2: for k = 1 to M do
 3:    pick the smallest value in vector T^m, i.e., t_1
 4:    t_1 = max(t_1, current time)
 5:    t_1 += ε~^m
 6:    e^m = t_1
 7:    sort items in T^m to keep T^m a sorted vector
 8: end for
 9: return T^m, e^m

ALGORITHM III. CALCULATION OF T^r AND e
CalT^r(J = (A, D, M, R, σ), T^m = [t_1, t_2, ..., t_l], T^r = [t_1, t_2, ..., t_q])
    // This algorithm estimates e, job J's finish time, and T^r, the available time of reduce slots after the scheduled execution of J and J's predecessors
    // invoke Algorithm II to estimate J's map stage finish time
 1: e^m = CalT^m(J, T^m).e^m
 2: ε~^r = c_r^max * max(σ_i^r, i = 1, 2, ..., R)
 3: for k = 1 to R do
 4:    pick the smallest value in vector T^r, i.e., t_1
 5:    t_1 = max(t_1, e^m)
 6:    t_1 += ε~^r
 7:    e = t_1
 8:    sort items in T^r to keep T^r a sorted vector
 9: end for
10: return T^r, e

C. Dispatcher

As mentioned in Section II, a Hadoop cluster uses worker nodes to execute map and reduce tasks. Each worker node has a fixed number of map slots and reduce slots, which limit the number of map tasks and reduce tasks that a worker node can execute simultaneously. Periodically, a worker node sends a heartbeat signal to the master node. Upon receiving a heartbeat from a worker node with empty map/reduce slots, the master node invokes the scheduler to assign tasks. RTMR scheduler's dispatcher fulfills this role, allocating tasks to execute on worker nodes. Algorithm IV shows the pseudo code of the dispatcher.

When jobs are inserted into the priority queue, the map stages can start and the map tasks are ready to run. Therefore, it is straightforward to dispatch map tasks following the job order/priority. No modification is needed here and RTMR scheduler dispatches map tasks following the same approach as the default Hadoop system (lines 4-5).
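The slot bookkeeping of Algorithms II and III, together with the two acceptance checks of Algorithm I in Section B, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' Hadoop implementation: the `Job` fields `map_task_time` and `reduce_task_time` are hypothetical names for the pessimistic per-task estimates ε~^m and ε~^r, queue handling is reduced to a list of successors passed in by the caller, a heap replaces the repeated sorting, and the cluster is assumed to have at least one slot of each kind.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float           # A
    deadline: float          # D, relative to the arrival time
    num_maps: int            # M
    num_reduces: int         # R
    map_task_time: float     # assumed pessimistic estimate of one map task
    reduce_task_time: float  # assumed pessimistic estimate of one reduce task


def assign(slot_times, num_tasks, task_time, earliest_start):
    """Place tasks one by one on the earliest-available slot.

    Returns the sorted slot-availability vector and the finish time of the
    last task placed (earliest_start if there are no tasks).
    """
    heap = list(slot_times)
    heapq.heapify(heap)
    finish = earliest_start
    for _ in range(num_tasks):
        start = max(heap[0], earliest_start)
        finish = start + task_time
        heapq.heapreplace(heap, finish)  # pop the smallest time, push the new one
    return sorted(heap), finish


def estimate(job, t_map, t_reduce, now=0.0):
    """Algorithms II and III: updated T^m, updated T^r and estimated finish time e."""
    new_t_map, e_map = assign(t_map, job.num_maps, job.map_task_time, now)
    # Reduce tasks may not start before the map stage is estimated to finish.
    new_t_reduce, e = assign(t_reduce, job.num_reduces, job.reduce_task_time, e_map)
    return new_t_map, new_t_reduce, e


def admit(job, t_map, t_reduce, successors, now=0.0):
    """Acceptance checks of Algorithm I, without the queue update.

    t_map and t_reduce are the predecessor's vectors; successors lists the
    already admitted jobs that would follow the new job in the queue.
    """
    t_m, t_r, e = estimate(job, t_map, t_reduce, now)
    if e > job.arrival + job.deadline:
        return False
    for succ in successors:
        t_m, t_r, e = estimate(succ, t_m, t_r, now)
        if e > succ.arrival + succ.deadline:
            return False
    return True
```

For example, with two map slots and one reduce slot that are all idle at time 0, a job with three 4-second map tasks and one 5-second reduce task is estimated to finish its map stage at time 8 and the whole job at time 13, so it passes the first check whenever its relative deadline is at least 13.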
Dispatching reduce tasks is less straightforward. Since a job's map stage finish time depends on not only the job's map stage start time but also the number of map tasks the job has, when there are multiple jobs concurrently running in the cluster, which jobs can finish the map stages and start the reduce stages earlier is not determined by the job priority alone. Although jobs start the map stages following the job order/priority, it is highly likely that jobs will not finish the map stages in that order. As a result, the reduce tasks of a lower-priority job could become ready earlier than those of a higher-priority job. Thus, if ready reduce tasks are assigned to execute on worker nodes without any constraint, the proper execution of higher-priority jobs may be interfered with by the execution of lower-priority jobs, leading to deadline violations.

One simple method to avoid such interference is to strictly enforce that jobs start the reduce stages following the job order. That is, a job cannot start the reduce stage until all preceding jobs have finished the map stages. However, this straightforward method puts a strong constraint on job parallelism and causes inefficient utilization of system resources. Therefore, we instead design a reservation-based dispatcher, which simply ensures that a lower-priority job does not occupy slots that belong to higher-priority jobs. That is, the dispatcher reserves slots that are needed by higher-priority jobs to avoid potential interference. Upon receiving a heartbeat from a worker node with empty reduce slots, the dispatcher assigns a reduce task to the worker node only if enough reduce slots have been left unused for higher-priority jobs (lines 6-21). We have proved that all jobs accepted by the admission controller can be successfully dispatched and completed by the deadlines in normal scenarios when there is neither a node failure nor a task re-execution (please refer to the Technical Report for the proof [21]).

ALGORITHM IV. DISPATCHER
DP(J = (A, D, M, R, σ), Priority-Q, m, r, Ra)
 1: m: available map slots on the node
 2: r: available reduce slots on the node
 3: Ra: the number of available reduce slots in the cluster, which is counted upon calling this algorithm
    // dispatch map tasks:
 4: if (m > 0) then
 5:    follow the same approach as the default Hadoop system to dispatch map tasks
    // dispatch reduce tasks:
 6: if r > 0 then
 7:    reservedSlot: the number of reduce slots reserved for high-priority jobs
 8:    reservedSlot = 0
 9:    for J from Priority-Q do
10:       if reservedSlot > Ra then
11:          break for
12:       end if
13:       T = findAReadyReduceTask(J)
14:       if T != nil then
15:          assign T to the node
16:          break for
17:       else if J has not reached its reduce stage then
18:          reservedSlot += J.R
19:       end if
20:    end for
21: end if

D. Feedback Controller

A feedback controller is developed to keep the admission controller up-to-date. As described in Section B, the admission controller makes decisions based on information maintained in job records, i.e., the $J.T^m$ and $J.T^r$ vectors. These vectors record the estimated available time of the cluster's map and reduce slots after the scheduled execution of job J and its predecessors. However, these jobs' actual execution may be different from the estimate. For instance, due to the pessimistic estimation where we use $c_m^{max}$ and $c_r^{max}$ as the estimated cost of retrieving and processing a unit of data in a map and a reduce task and $\eta$ as the estimated ratio between a job's intermediate data size and input data size, it is highly likely that a job finishes earlier than estimated by the admission controller. In addition, node failures or speculative re-execution of slow tasks can result in a job finish time later than expected. To reduce false negatives (i.e., rejecting jobs that can meet the deadlines) and deal with unexpected events (such as node failures), a feedback controller is invoked to update all waiting jobs' $T^m$ and $T^r$ vectors if the difference between a job's actual and estimated finish times is larger than a certain threshold $\Delta$. The feedback controller is also triggered if a job misses its deadline due to unexpected events. As a result of the update, the admission controller makes decisions based on more accurate estimates. Algorithms V and VI show the pseudo code of the feedback controller.
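The triggering rule described above can be written as a small predicate. This is a minimal sketch; the function and parameter names are hypothetical, and the comparison against the threshold is taken to be strict, following the wording "larger than" in the text.

```python
def feedback_needed(estimated_finish, actual_finish, arrival,
                    relative_deadline, delta):
    """Return True when the waiting jobs' slot vectors should be refreshed.

    estimated_finish: finish time e estimated by the admission controller
    actual_finish: observed finish time of the job
    arrival, relative_deadline: A and D of the job
    delta: threshold on the estimation error
    """
    missed_deadline = actual_finish > arrival + relative_deadline
    estimate_off = abs(estimated_finish - actual_finish) > delta
    return estimate_off or missed_deadline
```

A job that finishes well ahead of its pessimistic estimate triggers the update just as a job delayed by a node failure does; the first case lets the admission controller accept more jobs, and the second keeps its guarantees for waiting jobs based on current data.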
To avoid high algorithm overhead, we do not keep track of $J.V^m$ and $J.V^r$, the actual available time of the cluster's map and reduce slots after considering the actual execution of job J and J's predecessors. Tracking these vectors is not an easy task. First, it requires identifying the correct execution slot and updating it after each task's execution. Second, as mentioned in Section C, to utilize system resources well, we develop a reservation-based reduce task dispatcher, which allows out of order execution of jobs' reduce stages and out of order completion of jobs. Thus, a job may finish its execution before some of its predecessors and after some of its successors. Due to these cases, simply taking snapshots of the cluster when a job J's tasks finish will not give the correct $J.V^m$ and $J.V^r$ vectors. In addition, there is a more critical problem: due to out of order job completion, if some of J's predecessors are still executing, the actual values of $J.V^m$ and $J.V^r$ are unknown when job J finishes and when the feedback controller is triggered.

Thus, instead of tracking these vectors, we derive $U^m$ and $U^r$ vectors as updated estimates of $J.V^m$ and $J.V^r$. This estimation is carried out only when the feedback controller (Algorithm V) invokes the slot available time update (Algorithm VI). To derive $U^m$ and $U^r$, like deriving $J.T^m$ and $J.T^r$, we still assume all J's predecessors finish and make the slots available at $T^m$ and $T^r$. Then the actual execution of job J's map and reduce tasks is considered following a non-decreasing
order of task finish time and it is assumed that the earlier an execution slot becomes available, i.e., the earlier an execution slot starts to run a task, the earlier it finishes the task execution (Algorithm VI lines 7-21). These assumptions may not hold in the actual execution and thus $U^m$ and $U^r$ are only updated estimates of $J.V^m$ and $J.V^r$. However, as long as $U^m \ge J.V^m$ and $U^r \ge J.V^r$, the feedback controller still works correctly and preserves RTMR scheduler's real-time property.

ALGORITHM V. FEEDBACK CONTROLLER
FC(J = (A, D, M, R, σ), Priority-Q)
 1: Δ: threshold to trigger the update
 2: e~: job J's actual finish time
 3: J_p = getPredecessor(J, Priority-Q)
 4: T^m = J_p.T^m   (T^m = [0, 0, ..., 0] if J_p = nil)
 5: T^r = J_p.T^r   (T^r = [0, 0, ..., 0] if J_p = nil)
    // invoke Algorithm III to do the calculation
 6: e = CalT^r(J, T^m, T^r).e
 7: if |e - e~| > Δ or e~ > (A + D) then
 8:    build E~^m, the sorted vector containing the actual finish time of job J's map tasks
 9:    build E~^r, the sorted vector containing the actual finish time of job J's reduce tasks
       // invoke Algorithm VI to calculate the updated estimates
10:    J.T^m = SATU(J, T^m, T^r, E~^m, E~^r).U^m
11:    J.T^r = SATU(J, T^m, T^r, E~^m, E~^r).U^r
12:    J_p = J
13:    J_s = getSuccessor(J_p, Priority-Q)
14:    while J_s != nil do
          // invoke Algorithms II and III to do the calculation
15:       J_s.T^m = CalT^m(J_s, J_p.T^m).T^m
16:       J_s.T^r = CalT^r(J_s, J_p.T^m, J_p.T^r).T^r
17:       J_p = J_s
18:       J_s = getSuccessor(J_p, Priority-Q)
19:    end while
20: else return
21: end if

ALGORITHM VI. SLOT AVAILABLE TIME UPDATE
SATU(J = (A, D, M, R, σ), T^m, T^r, E~^m, E~^r)
 1: T^m: map slot available time in J's predecessor's record
 2: T^r: reduce slot available time in J's predecessor's record
 3: E~^m: sorted vector containing the actual finish time of job J's map tasks
 4: E~^r: sorted vector containing the actual finish time of job J's reduce tasks
 5: U^m = T^m
 6: U^r = T^r
 7: while E~^m is not empty do
 8:    remove the item currently located at the beginning of vector E~^m, say it is e~^m
 9:    u_1^m = e~^m (where u_1^m is the first and smallest item in vector U^m)
10:    sort items in U^m to keep U^m a sorted vector
11: end while
12: while E~^r is not empty do
13:    remove the item currently located at the beginning of vector E~^r, say it is e~^r
14:    u_1^r = e~^r (where u_1^r is the first and smallest item in vector U^r)
15:    sort items in U^r to keep U^r a sorted vector
21: end while
22: return U^m, U^r

We have proved the correctness of the feedback controller by showing that $U^m \ge J.V^m$ and $U^r \ge J.V^r$. Therefore, after updating job J's vectors $T^m$ and $T^r$ with $U^m$ and $U^r$ in Algorithm V (lines 10-11), the condition $J.T^m \ge J.V^m$ and $J.T^r \ge J.V^r$ (i.e., the estimated slot available time is greater than or equal to the actual available time) still holds for job J (please refer to the Technical Report for the proof [21]). Since the derivation of $J_s.T^m$ and $J_s.T^r$ is based on $J.T^m$ and $J.T^r$ (see Algorithm V), $J.T^m \ge J.V^m$ and $J.T^r \ge J.V^r$ also ensure that $J_s.T^m \ge J_s.V^m$ and $J_s.T^r \ge J_s.V^r$ for all succeeding jobs $J_s$.

V. EVALUATION

Our implementations of RTMR scheduler and Deadline Constraint scheduler [12] are both based on Hadoop 0.21. Kc and Anyanwu [12] implemented the Deadline Constraint scheduler in Hadoop 0.20.2. We instead choose Hadoop 0.21 because it is the closest version to 0.20.2 but with improved features necessary for small and medium size clusters. Since Hadoop 0.23/2.x is mainly designed for large clusters, it is not adopted for our experiments. These two schedulers are implemented and compared experimentally in terms of real-time property and cluster utilization. To test the effects of feedback control, we run RTMR scheduler twice, with and without the feedback controller enabled. In addition, since the cluster utilization is
determined by not only the scheduling algorithm but also the workload volume, we run the default Hadoop FIFO scheduler, which accepts all jobs to execute in the cluster, collecting its resultant cluster utilization to reflect the workload volume. If a real-time scheduler achieves a cluster utilization close to that achieved by the default Hadoop FIFO scheduler, we think that the resource cost of providing the real-time property is not high.

For the RTMR scheduler, the admission controller is implemented in the JobQueueJobInProgressListener class, which makes the admission control decision and maintains the MapReduce job queue. The dispatcher is in the RTMRTaskScheduler class, which extends the TaskScheduler class and is in charge of dispatching map and reduce tasks. The feedback controller is also in the JobQueueJobInProgressListener class, where we set the threshold $\Delta$ to be a typical map task execution time. Similarly, Deadline Constraint scheduler's admission controller is in the JobQueueJobInProgressListener class and its dispatcher, called DCTaskScheduler, extends the TaskScheduler class.

A heterogeneous Hadoop cluster that contains one master node and 30 worker nodes is used as the testbed. The 30 worker nodes are configured as one rack and they are of two types: 20 of them have two dual-core CPUs and 10 of them have two single-core CPUs. Table I gives the detailed hardware information of the cluster. We make the number of map slots in a worker node equal to the number of CPU cores. Because each node has only one Ethernet card, we configure one reduce slot per worker node to avoid bandwidth competition between multiple reduce tasks on a single node. Loadgen, a test example in Hadoop source code for evaluating Hadoop schedulers [16][17], is used as the test application.

TABLE I. EXPERIMENTAL ENVIRONMENT
Nodes | Quantity | Hardware and Hadoop Configuration
Master node | 1 | 2 single-core 2.2GHz Opteron-248 CPUs, 8GB RAM, 1 Gbps Ethernet
Type I worker nodes | 20 | 2 dual-core 2.2GHz Opteron-275 CPUs, 4GB RAM, 1 Gbps Ethernet, 4 map and 1 reduce slots per node
Type II worker nodes | 10 | 2 single-core 2.2GHz Opteron-64 CPUs, 4GB RAM, 1 Gbps Ethernet, 2 map and 1 reduce slots per node

We first create a submission schedule (workload I) that is similar to the one used by Zaharia et al. [17]. Zaharia et al. [17] generated a submission schedule for 100 jobs by sampling job inter-arrival times and input sizes from the distribution seen at Facebook over a week in October 2009. By sampling job inter-arrival times at random from the Facebook trace, they found that the distribution of inter-arrival times was roughly exponential with a mean of 14 seconds. They also generated job input sizes based on the Facebook workload, by looking at the distribution of the number of map tasks per job at Facebook and creating datasets with the corresponding sizes (i.e., each map task requires a 128 MB input block). To make it possible to compare jobs in the same bin within and across experiments, job sizes were quantized into nine bins, listed in Table II [17]. Our workload I has similar job sizes and job inter-arrival times. In particular, our job size distribution follows the first six bins of the benchmark shown in Table II, which reflect about 89% of the jobs at the Facebook production cluster. Because our testbed is limited in size, we exclude those jobs with more than 300 map tasks. Like the schedule in [17], the distribution of inter-arrival times is exponential with a mean of 14 seconds, making our workload 21 minutes long in total.

The submission schedule used by Zaharia et al. [17], however, does not specify the number of reduce tasks and the deadline for a job. To generate workload I, we create two intervals in each job bin (see Table III), one for the number of reduce tasks and one for the deadline. Two random numbers from the two intervals are picked as the number of reduce tasks and the deadline for a job.
Because the Deadline Constraint scheduler cannot accept a job with more reduce tasks than the cluster's total number of reduce slots, for workload I, we fix the maximum number of reduce tasks per job to be 30, the total number of reduce slots in the cluster.

TABLE II. DISTRIBUTION OF JOB SIZES (in Terms of Number of Map Tasks) at Facebook [17]
Bin | #Maps | %Jobs at Facebook | #Maps in Benchmark | # of Jobs in Benchmark
1 | 1 | 39% | 1 | 38
2 | 2 | 16% | 2 | 16
3 | 3-20 | 14% | 10 | 14
4 | 21-60 | 9% | 50 | 8
5 | 61-150 | 6% | 100 | 6
6 | 151-300 | 6% | 200 | 6
7 | 301-500 | 4% | 400 | 4
8 | 501-1500 | 4% | 800 | 4
9 | >1501 | 3% | 4800 | 4

TABLE III. WORKLOAD I'S CONFIGURATION (in Terms of Number of Map, Reduce Tasks and Deadline)
Bin | #Maps | #Reduces | Deadline (second)
1 | 1 | [1,5] | [200,300]
2 | 2 | [1,5] | [200,300]
3 | 10 | [5,10] | [300,400]
4 | 50 | [10,20] | [500,800]
5 | 100 | [20,30] | [1000,1500]
6 | 200 | 30 | [2000,2500]

Since most jobs in the Facebook workload are small, in particular, some of them having only 1 map task, we create workload II to include more jobs with higher parallelism. That is, in workload II, we let the number of map tasks per job follow a normal distribution with an average of 100. Again, because of the moderate size of our cluster, we do not include the three jobs that have more than 300 map tasks. Table IV shows the detailed information of workload II. To test how RTMR scheduler works with large jobs, we also create some jobs with more reduce tasks than the cluster's total number of reduce slots in workload II. However, since we already know that Deadline Constraint scheduler cannot accept such jobs, they are not included in workload II when Deadline Constraint scheduler is tested.
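A submission schedule with the shape of workload I can be generated along the following lines. This is a minimal sketch, not the script used for the experiments: it assumes that the per-bin job counts are those of the first six benchmark bins in Table II, that reduce counts and deadlines are drawn as uniform integers from the Table III intervals, and that jobs are submitted in random order. The function name, the dictionary keys and the seed are hypothetical.

```python
import random

# (maps per job, number of jobs, reduce interval, deadline interval in seconds)
# for bins 1-6, combining Table II (job counts) and Table III (intervals).
WORKLOAD_I_BINS = [
    (1, 38, (1, 5), (200, 300)),
    (2, 16, (1, 5), (200, 300)),
    (10, 14, (5, 10), (300, 400)),
    (50, 8, (10, 20), (500, 800)),
    (100, 6, (20, 30), (1000, 1500)),
    (200, 6, (30, 30), (2000, 2500)),
]


def generate_workload_i(seed=0, mean_interarrival=14.0):
    """Return a list of job dicts with arrival time, sizes and relative deadline."""
    rng = random.Random(seed)
    jobs = []
    for maps, count, reduces, deadline in WORKLOAD_I_BINS:
        for _ in range(count):
            jobs.append({
                "maps": maps,
                "reduces": rng.randint(*reduces),    # inclusive interval
                "deadline": rng.randint(*deadline),  # relative deadline, seconds
            })
    rng.shuffle(jobs)  # assumed: bins are interleaved at random
    arrival = 0.0
    for job in jobs:
        # Exponential inter-arrival times with the given mean.
        arrival += rng.expovariate(1.0 / mean_interarrival)
        job["arrival"] = arrival
    return jobs
```

Under these assumptions the schedule contains 88 jobs, so a mean inter-arrival time of 14 seconds gives an expected span of 88 × 14 = 1232 seconds, in line with the stated workload length of about 21 minutes.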
TABLE IV. WORKLOAD II'S CONFIGURATION (in Terms of Number of Map, Reduce Tasks and Deadline)
Bin | No. Job | #Maps | #Reduces | Deadline (second)
1 | 9 | [1,10] | [1,5] | [200,300]
2 | 24 | [10,50] | [5,10] | [300,500]
3 | 25 | [50,100] | [15,30] | [1000,1500]
4 | 18 | [100,200] | [25,50] | [1500,2500]
5 | 13 | [200,300] | [35,70] | [2500,3500]

For performance evaluation of the real-time schedulers, the following three metrics, i.e., job accept ratio, job success ratio, and cluster utilization, are used:

$$\mathit{AcceptR} = \frac{\#\,\text{accepted\_jobs}}{\#\,\text{jobs\_in\_a\_workload}}$$

$$\mathit{SuccessR} = \frac{\#\,\text{successful\_jobs}}{\#\,\text{accepted\_jobs}}$$

$$\mathit{Util} = \frac{\text{slot\_time\_used\_by\_successful\_jobs}}{\text{available\_slot\_time\_during\_workload\_exe}}$$

The following equation is used to calculate the cluster utilization achieved by default Hadoop FIFO scheduler:

$$\mathit{Util} = \frac{\text{slot\_time\_used\_by\_all\_jobs}}{\text{available\_slot\_time\_during\_workload\_exe}}$$

Here, successful_jobs denotes those jobs that finish before the deadlines and slot_time_used_by_successful_jobs refers to the total map and reduce slot time used to execute them. Since Hadoop FIFO scheduler does not consider job deadlines and provides no real-time guarantees, it accepts all jobs and its cluster utilization is calculated using slot_time_used_by_all_jobs instead. available_slot_time_during_workload_exe refers to the total usable time of cluster map and reduce slots during the execution of a workload, i.e., the product of the number of slots and the turnaround execution time of all accepted jobs in a workload.

Tables V and VI show how the schedulers perform with workload I and II respectively. As we can see, although Deadline Constraint scheduler accepts more jobs than RTMR scheduler, it fails to provide deadline guarantees to all accepted jobs, with job success ratios of 85.7% and 22.5% respectively. Since not all accepted jobs are successful while more jobs are accepted, which prolongs the workload's execution in the cluster, Deadline Constraint scheduler leads to much lower cluster utilizations of only 5.7% and 0.7% respectively. In contrast, RTMR scheduler maintains good cluster utilization of 15.5% and 64.6%, in comparison to 21.3% and 69.7% achieved by default Hadoop FIFO scheduler.
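The three metrics defined above reduce to simple ratios, sketched here in Python. The function and parameter names are hypothetical, and the numbers in the usage note are illustrative only and are not measurements from the experiments.

```python
def accept_ratio(num_accepted, num_jobs_in_workload):
    """AcceptR: fraction of submitted jobs that the scheduler admits."""
    return num_accepted / num_jobs_in_workload


def success_ratio(num_successful, num_accepted):
    """SuccessR: fraction of admitted jobs that finish before their deadlines."""
    return num_successful / num_accepted


def utilization(slot_time_used, num_slots, turnaround_time):
    """Util: slot time used divided by the slot time available.

    slot_time_used counts only successful jobs for a real-time scheduler and
    all jobs for FIFO; the available slot time is the number of slots times
    the turnaround time of the workload.
    """
    return slot_time_used / (num_slots * turnaround_time)
```

The testbed in Table I has 20 × (4 + 1) + 10 × (2 + 1) = 130 slots in total, so a hypothetical run that uses 13000 slot-seconds over a 1000-second turnaround would have `utilization(13000.0, 130, 1000.0)` equal to 0.1.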
The Deadline Constraint scheduler's very poor performance with workload II experimentally demonstrates its deficiencies in handling real-time MapReduce jobs with high parallelism. From the data, we can also conclude that the RTMR scheduler performs better when we enable the feedback controller to keep the admission controller up to date, which results in a better job accept ratio and cluster utilization.

TABLE V. SCHEDULER PERFORMANCE WITH WORKLOAD I

| Metrics | Deadline Constraint | RTMR | RTMR w/o Feedback | Hadoop FIFO |
|---------|---------------------|------|-------------------|-------------|
| Accept Ratio | 71.6% | 56.8% | 46.6% | n/a |
| Success Ratio | 85.7% | 100% | 100% | n/a |
| Cluster Utilization | 5.7% | 15.5% | 11.6% | 21.3% |

TABLE VI. SCHEDULER PERFORMANCE WITH WORKLOAD II

| Metrics | Deadline Constraint | RTMR | RTMR w/o Feedback | Hadoop FIFO |
|---------|---------------------|------|-------------------|-------------|
| Accept Ratio | 49.4% | 24.7% | 15.7% | n/a |
| Success Ratio | 22.5% | 100% | 100% | n/a |
| Cluster Utilization | 0.7% | 64.6% | 49.8% | 69.7% |

VI. CONCLUSION AND FUTURE WORK

This paper develops, implements, and experimentally evaluates a novel Real-Time MapReduce (RTMR) scheduler for cluster-based scheduling of real-time MapReduce applications. The RTMR scheduler overcomes the deficiencies of an existing algorithm and achieves good cluster utilization and a 100% job success ratio, ensuring the real-time property for all admitted MapReduce jobs. In the future, we will investigate real-time scheduling in MapReduce Online clusters [22], which support pipelining to allow reducers to begin processing data as soon as it is produced by mappers.

VII. ACKNOWLEDGEMENTS

The authors acknowledge support from NSF award 1018467. This work was completed utilizing the Holland Computing Center of the University of Nebraska.

REFERENCES

[1] Dean, J. and Ghemawat, S. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51(1):107-113.
[2] Apache Hadoop. http://hadoop.apache.org
[3] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job scheduling for multi-user mapreduce clusters. EECS Department, University of California, Berkeley, Tech. Rep., Apr. 2009.
[4] American Express. https://www.americanexpress.com/
[5] The Compact Muon Solenoid Experiment. Available: http://cms.web.cern.ch/cms/index.html
[6] The Large Hadron Collider.
Available: http://lhc.web.cern.ch/lhc
[7] Attebury, G.; Baranovski, A.; Bloom, K.; Bockelman, B.; Kcira, D.; Letts, J.; Levshina, T.; Lundestedt, C.; Martin, T.; Maier, W.; Haifeng Pi; Rana, A.; Sfiligoi, I.; Sim, A.; Thomas, M.; Wuerthwein, F. Hadoop distributed file system for the Grid. Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE, pp. 1056-1061.
[8] Open Science Grid. Available: http://www.opensciencegrid.org
[9] Hadoop Users. http://wiki.apache.org/hadoop/PoweredBy#F
[10] Capacity Scheduler. http://hadoop.apache.org/common/docs/r0.19.2/capacity_scheduler.html
[11] Jorda Polo, David Carrera, Yolanda Becerra, Malgorzata Steinder, and Ian Whalley. Performance-driven task co-scheduling for mapreduce environments. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages 373-380, 19-23 2010.
[12] K. Kc and K. Anyanwu. Scheduling hadoop jobs to meet deadlines. In 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2010, pp. 388-392.
[13] Xicheng Dong, Ying Wang, Huaming Liao. Scheduling Mixed Real-time and Non-real-time Applications in MapReduce Environment. In the proceeding of 17th International Conference on Parallel and Distributed Systems, 2011, pp. 9-16.
[14] Xuan Lin, Ying Lu, J. Deogun, and S. Goddard. Real-time divisible load scheduling for cluster computing. In Real Time and Embedded Technology and Applications Symposium, 2007. RTAS '07. 13th IEEE, pages 303-314, 3-6 2007.
[15] HDFS. http://hadoop.apache.org/common/docs/current/hdfs_design.html
[16] Chen He, Ying Lu, David Swanson. Matchmaking: A New MapReduce Scheduling Technique. In the proceeding of 2011 CloudCom, Athens, Greece, 2011, pp. 40-47.
[17] Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In the proceedings of the 5th European conference on Computer systems, 2010, pp. 265-278.
[18] Zhuo Tang, Junqing Zhou, Kenli Li, and Ruixuan Li. "A MapReduce task scheduling algorithm for deadline constraints." Cluster Computing, Vol. 15, 2012.
[19] Eunji Hwang and Kyong Hoon Kim. "Minimizing Cost of Virtual Machines for Deadline-Constrained MapReduce Applications in the Cloud." Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on. IEEE, 2012.
[20] Michael Mattess, Rodrigo N. Calheiros, and Rajkumar Buyya. "Scaling MapReduce Applications across Hybrid Clouds to Meet Soft Deadlines." Technical Report CLOUDS-TR-2012-5, Cloud Computing and Distributed Systems Laboratory, the University of Melbourne, August 15, 2012.
[21] Chen He, Ying Lu, David Swanson. Real-Time Application Scheduling in Heterogeneous MapReduce Environments. Technical Report TR-UNL-CSE-2012-0004, University of Nebraska-Lincoln, 2012.
Available: http://cse-apps.unl.edu/facdb/publications/TR-UNL-CSE-20120004.pdf
[22] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In NSDI, 2010.
[23] A. D. Ferguson, P. Bodík, S. Kandula, E. Boutin, and R. Fonseca. Jockey: Guaranteed Job Latency in Data Parallel Clusters. In EuroSys, 2012.
[24] G. Wang, A. R. Butt, P. Pandey, and K. Gupta. A Simulation Approach to Evaluating Design Decisions in MapReduce Setups. In MASCOTS 2009.
[25] H. Herodotou and S. Babu. Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In VLDB 2011.
[26] H. Herodotou, F. Dong, and S. Babu. No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics. In SoCC 2011.