Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems


Fangfei Chen, Murali Kodialam, T. V. Lakshman
Department of Computer Science and Engineering, The Penn State University
Bell Laboratories, Alcatel-Lucent

Abstract: MapReduce has emerged as an important paradigm for processing data in large data centers. MapReduce is a three phase algorithm comprising Map, Shuffle and Reduce phases. Due to its widespread deployment, there have been several recent papers outlining practical schemes to improve the performance of MapReduce systems. All these efforts focus on one of the three phases to obtain performance improvement. In this paper, we consider the problem of jointly scheduling all three phases of the MapReduce process with a view of understanding the theoretical complexity of the joint scheduling and working towards practical heuristics for scheduling the tasks. We give guaranteed approximation algorithms and outline several heuristics to solve the joint scheduling problem.

I. INTRODUCTION

MapReduce [1] has emerged as a significant data processing paradigm in data centers. MapReduce is used in several applications including searching the web, URL frequency estimation and indexing. In a typical application, the data on which MapReduce operates is partitioned into chunks and assigned to different processors. MapReduce is essentially a three step process: 1) In the Map phase, several parallel tasks are created to operate on the relevant data chunks to generate intermediate results. These results are stored in the form of key-value pairs. 2) In the Shuffle phase, the partial computation results of the map phase are transferred to the processors performing the reduce operation. 3) During the Reduce phase, each processor that executes the reduce task aggregates the tuples generated in the map phase. A frequent requirement for routine data center requests is fast response time.
In a large data center where several MapReduce jobs run concurrently, a centralized master coordinates the assignment and scheduling of MapReduce tasks across the data center. The assignment problem is to decide which processor will execute a map or reduce task, and the scheduling problem is to decide in what order the tasks will be executed on each processor. In this paper, we assume that tasks are already assigned to the processors, and the focus of the paper is to determine the schedule that minimizes the mean response time over all jobs. MapReduce throws up several significant scheduling challenges due to the dependences between the different phases of a job's MapReduce operations, as well as dependences between different processors that are the consequence of splitting a job across multiple processors. Though dependences between tasks have been studied in the job shop scheduling literature, MapReduce systems present several new challenges. For example, in traditional job shops, each job comprises multiple tasks but at most one task can be executed at any given time, unlike MapReduce scheduling where tasks may be executed in parallel. The shuffle operation in MapReduce assumes simultaneous possession of multiple resources (the egress of the transmitting processor and the ingress of the receiving processor), and this phase resembles data migration problems more than job-shop problems. If the shuffle operation is not a bottleneck, there is no advantage if one map task belonging to a particular job is completed early while another task belonging to the same job is delayed. This is the key idea in the MapReduce formulation in Chang et al. [2]. Recent studies by Chowdhury et al. [3] have shown that the shuffle operation accounts for as much as a third of the completion time of a MapReduce job. Therefore the shuffle operation has to be taken into consideration explicitly in the scheduling problem. Including the shuffle operation introduces new tradeoffs in the scheduling problem.
If all the map tasks belonging to a job complete almost simultaneously, then all the shuffle operations for this job can start at the same time. This creates a bottleneck at the shuffle operation. Therefore, once shuffle is included in the system, it may be better to spread out the completion of the map tasks in order to smoothly schedule the shuffle operation. The widespread use of MapReduce has led to several recent papers [4], [5], [6] outlining practical approaches to improving the performance of MapReduce systems. Though these approaches are shown to improve the performance of specific parts of MapReduce, they do not address the end-to-end performance of MapReduce. Unlike the formulation in Chang et al. [2], we explicitly model the precedence between the map and reduce operations in this paper and outline a constant factor approximation algorithm. Moreover, [2] ignores the shuffle phase. The approach in this paper is to develop a theoretical framework to model the end-to-end performance of MapReduce systems, and in particular to model the interactions between the three phases of MapReduce. To our knowledge, this is the first paper that addresses map-shuffle-reduce from a theoretical viewpoint. This paper makes the following contributions: 1) We propose a formulation that explicitly takes into consideration the dependences between map and reduce operations and outline a constant factor guaranteed approximation algorithm (MARES) for the problem. 2) The approximation algorithm involves solving a linear programming problem with an exponential number of constraints, and we outline a column-generation based approach to solve the linear programming relaxation. 3) We develop a constant factor approximation algorithm for the map-shuffle-reduce problem (MASHERS). 4) We develop heuristics for both the map-reduce problem (H-MARES) as well as the map-shuffle-reduce problem (D-MASHERS, H-MASHERS) based on the constant factor approximation algorithms and study their performance. We show via experiments that these heuristics perform extremely well in practice.

The rest of the paper is organized as follows. Section II formally defines the scheduling problem with precedence constraints, provides an LP based lower bound and shows how to solve the LP. Section III presents the approximation algorithm for the two phase map-reduce problem. Section IV introduces the shuffle phase into the problem and presents the approximation algorithm for the map-shuffle-reduce problem. Section V evaluates these heuristics by simulations. Section VI discusses related work and finally Section VII concludes the paper.

II. SCHEDULING DEPENDENT TASKS: A COLUMN GENERATION APPROACH

In this section we consider the problem of scheduling a set of dependent tasks on multiple processors in order to minimize the weighted sum of the completion times of the tasks. We develop a linear programming based approach to get a lower bound on the optimal solution. The solution to this linear programming problem exploits the structure of the single processor scheduling polyhedron. The linear programming solution is infeasible for the scheduling problem, but we exploit the structure of the dependences in MapReduce to derive approximation algorithms for both the two phase map-reduce problem and the three phase map-shuffle-reduce problem.

A. Problem Definition

We are given a set J of jobs and a set P of processors. Job j ∈ J has a set of tasks T_j. Let T = ∪_j T_j represent the set of all tasks. Each task u is assigned to a processor p(u) ∈ P and has processing time t_u ≥ 0.
The set of tasks assigned to processor p is denoted by J_p. Let n_p denote the number of tasks assigned to processor p. We assume that a processor can execute at most one task at any given time and all tasks are executed in a non-preemptive manner. Let G = (V, L) denote the precedence graph among tasks, where the set of nodes is the set of tasks and (u, v) ∈ L indicates that task v can only start after the completion of task u. Let C_u denote the completion time of task u. Associated with task u is a weight w_u that indicates its relative importance, and the overall scheduling objective is to minimize the weighted sum of the task completion times ∑_u w_u C_u. The problem of scheduling tasks with precedence constraints in order to minimize the total weighted completion time is NP-hard [7] even on a single processor. Our problem is a significant generalization of this problem and is therefore NP-hard.

B. LP Based Lower Bound

Consider a processor p and the set of tasks J_p assigned to this processor. The scheduling problem on processor p is to decide the order in which the tasks in J_p are processed. It can be shown (see [3] and the references in it) that the completion times C_u of the tasks u ∈ J_p lie in the following polyhedron, denoted by P_p:

∑_{u∈S} t_u C_u ≥ f(S, p)   ∀ S ⊆ J_p, ∀ p   (1)

where

f(S, p) = ½ ( ∑_{v∈S} t_v² + ( ∑_{v∈S} t_v )² )

The intuition behind these inequalities is simple. Consider a processor on which two tasks with processing times t_1 and t_2 have to be processed. If the tasks are processed in the order 1, 2, then the completion times of the two tasks will be C_1 = t_1 and C_2 = t_1 + t_2. If the order is reversed, then C_1 = t_1 + t_2 and C_2 = t_2. Note that t_1 C_1 + t_2 C_2 = t_1² + t_2² + t_1 t_2 = ½ (t_1² + t_2² + (t_1 + t_2)²) in either of the cases. This argument can be extended to any subset of tasks. Note that P_p represents only a set of necessary conditions that the completion times C_u have to satisfy. The polyhedron has many vectors that are not achievable by any scheduling policy. It can be shown that P_p has n_p! extreme points, each corresponding to a permutation of the tasks in J_p.
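The two-task argument extends to any subset of tasks, and on small instances the inequalities (1) can be verified by brute force over all permutation schedules. A minimal sketch in Python (the task set and processing times are made up for illustration):

```python
from itertools import combinations, permutations

def completion_times(order, t):
    """Completion time of each task when processed non-preemptively in `order`."""
    C, clock = {}, 0.0
    for u in order:
        clock += t[u]
        C[u] = clock
    return C

def f(S, t):
    """Right-hand side of inequality (1): (1/2) * (sum of t_v^2 + (sum of t_v)^2)."""
    total = sum(t[v] for v in S)
    return 0.5 * (sum(t[v] ** 2 for v in S) + total ** 2)

# Hypothetical processing times of four tasks assigned to one processor.
t = {1: 3.0, 2: 1.0, 3: 4.0, 4: 2.0}
tasks = list(t)

# Every permutation (i.e., every extreme point of P_p) satisfies (1) for every subset S.
for order in permutations(tasks):
    C = completion_times(order, t)
    for r in range(1, len(tasks) + 1):
        for S in combinations(tasks, r):
            assert sum(t[u] * C[u] for u in S) >= f(S, t) - 1e-9
print("inequality (1) holds for all permutations and subsets")
```

For the full task set the inequality holds with equality regardless of the processing order, which is exactly the two-task calculation above generalized.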
A given permutation of the tasks in J_p results in a set of completion times for the tasks in J_p. This vector of completion times represents an extreme point of the polyhedron P_p. Let E_p^k represent extreme point k of the polyhedron P_p. Note that E_p^k is an n_p dimensional vector. We use E_p^k(u) to denote the completion time of task u ∈ J_p corresponding to extreme point k at processor p. An alternate way to represent P_p is to write it as a convex combination of its extreme points. We use λ_p^k to represent the non-negative weight associated with extreme point k in the polyhedron for processor p:

P_p = { ∑_{k=1}^{n_p!} λ_p^k E_p^k : ∑_{k=1}^{n_p!} λ_p^k = 1, λ_p^k ≥ 0 }

Since the polyhedron P_p represents the set of necessary conditions for the completion times of the tasks on processor p, the linear programming problem

min ∑_{u∈T} w_u C_u
subject to (C_u)_{u∈J_p} ∈ P_p ∀ p
C_v ≥ C_u + t_v ∀ (u, v) ∈ L

gives a lower bound on the weighted sum of completion times subject to the precedence constraints. This linear programming problem has an exponential number of constraints. It is not practical to solve it except for small problem instances. We
develop a column generation technique for solving the linear programming problem.

C. Column Generation

In a column generation procedure, the linear program is first written in terms of a convex combination of an exponential number of columns. This is called the master problem (MP). The master problem is then solved by successive approximation. The approximation of the master problem is done by restricting the set of columns in the linear programming problem. This is called the restricted master problem (RMP). The dual solution to the restricted master problem is used to verify whether the current solution is optimal. If not, a new column is generated and included in the restricted master problem. This process is repeated until optimality is reached. The practicality of the column generation procedure depends on whether optimality verification and new column generation can be done efficiently. In our case, the column generation subproblem is very easy to solve since it reduces to a simple sorting procedure. We now give a more detailed view of the column generation procedure for the scheduling problem at hand. The master problem (MP) in terms of the extreme points of the polyhedra P_p is the following:

Z_MP = min ∑_{u∈T} w_u C_u   (2)

subject to

C_u ≥ ∑_{k=1}^{n_{p(u)}!} λ_{p(u)}^k E_{p(u)}^k(u)   ∀ u   (3a)

∑_{k=1}^{n_p!} λ_p^k = 1   ∀ p   (3b)

C_v ≥ C_u + t_v   ∀ (u, v) ∈ L   (3c)

Instead of using all the extreme points of the polyhedron, the restricted master problem is formulated over a subset of extreme points. Let Z(p) denote a subset of extreme points for processor p. Initially we generate one extreme point for each processor. This is done as follows: for processor p, we randomly order the tasks in J_p and compute the completion times of all the tasks. This vector of completion times is an extreme point of P_p and is included in the restricted master problem. Therefore, initially Z(p) contains only one element for each processor p. The restricted master problem is similar to the master problem except that it is formulated over the extreme points in Z(p).
Z_RMP = min ∑_{u∈T} w_u C_u   (4)

subject to

C_u ≥ ∑_{k∈Z(p(u))} λ_{p(u)}^k E_{p(u)}^k(u)   ∀ u   (5a)

∑_{k∈Z(p)} λ_p^k = 1   ∀ p   (5b)

C_v ≥ C_u + t_v   ∀ (u, v) ∈ L   (5c)

Note that Z_RMP ≥ Z_MP. We now write the dual to the master problem, which we call the column generator (CG):

Z_CG = max ∑_p δ_p + ∑_{(u,v)∈L} t_v π_uv   (6)

subject to

θ_u − ∑_{v:(u,v)∈L} π_uv + ∑_{v:(v,u)∈L} π_vu ≤ w_u   ∀ u   (7a)

δ_p − ∑_{u∈J_p} E_p^k(u) θ_u ≤ 0   ∀ k, ∀ p   (7b)

By linear programming duality we know that Z_CG = Z_MP. Moreover, any feasible solution to the dual is a lower bound on Z_MP. If we solve the restricted master problem and obtain the optimal dual variables, then we can check whether that solution is feasible for CG. If it is, then the solution is optimal for the master problem. The dual optimal solution to the restricted master problem will be feasible for Equation (7a). However, it may not be feasible for Equation (7b), since the RMP uses only a subset of the extreme points. Given a current dual optimal solution (δ*_p, θ*_u, π*_uv) to the RMP, we have to check whether

δ*_p ≤ ∑_{u∈J_p} E_p^k(u) θ*_u   ∀ k, ∀ p.

This is equivalent to the following linear programming problem:

Z_p(θ*) = min_{C∈P_p} ∑_{u∈J_p} θ*_u C_u   (8)

This is just the classical problem of minimizing the weighted completion time of the tasks on processor p, where the weight of task u is θ*_u. The solution procedure is called Smith's Rule [9] and is the following: sort the tasks such that

θ*_1/t_1 ≥ θ*_2/t_2 ≥ … ≥ θ*_{n_p}/t_{n_p}

and schedule the tasks in the order of increasing index. In this case, the completion time of task u is C̃_u = ∑_{v=1}^{u} t_v and

Z_p(θ*) = ∑_{u=1}^{n_p} θ*_u C̃_u.   (9)

Note that C̃ defines an extreme point of the polyhedron P_p. If δ*_p ≤ Z_p(θ*) for all p, then the current value of Z_RMP equals Z_MP. If there is a p such that δ*_p > Z_p(θ*), then the extreme point corresponding to this processor is added to the restricted master problem and the restricted master problem is solved again. Given the current dual variable values θ* and π*, we can derive a feasible dual solution as follows: Equation (7a) is satisfied by any dual solution to the RMP, and setting δ_p = Z_p(θ*) makes (δ, θ*, π*) a feasible dual solution.
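The pricing problem (8)–(9) is just Smith's rule. A small sketch, with made-up dual weights θ* and processing times rather than values from an actual RMP solve:

```python
def smith_rule(theta, t):
    """Minimize sum_u theta[u] * C[u] on one processor (Smith's rule):
    process tasks in non-increasing order of theta[u] / t[u]. Returns the
    order, the completion-time vector (a candidate new extreme point of P_p),
    and the objective value Z_p(theta)."""
    order = sorted(theta, key=lambda u: theta[u] / t[u], reverse=True)
    C, clock = {}, 0.0
    for u in order:
        clock += t[u]
        C[u] = clock
    z = sum(theta[u] * C[u] for u in theta)
    return order, C, z

# Hypothetical dual weights and processing times for three tasks on processor p.
theta = {"a": 4.0, "b": 1.0, "c": 3.0}
t = {"a": 2.0, "b": 1.0, "c": 3.0}
order, C, z = smith_rule(theta, t)
print(order, z)  # the extreme point C is added to the RMP whenever z < delta*_p
```

If z < δ*_p for some processor, the vector C is exactly the new column added to the restricted master problem; the pricing step is therefore a single sort per processor.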
Therefore,

∑_p Z_p(θ*) + ∑_{(u,v)∈L} t_v π*_uv   (10)

is a lower bound on Z_MP. The lower bound may not be monotonically increasing. Therefore, we keep track of the current best (highest) lower bound as LB. Instead of solving the MP to optimality, given some threshold ε we can terminate the computation once the ratio of the solution of the restricted master problem to the best lower bound is less than (1 + ε). Algorithm 1 summarizes the column generation procedure.
Algorithm 1 Column Generation
1: For each processor p generate E_p^1.
2: repeat
3:   Solve the RMP and obtain the dual variables.
4:   for all processors p do
5:     Solve CG for p and compute Z_p(θ*) as in Equation (9)
6:     if Z_p(θ*) < δ*_p then
7:       Add the extreme point C̃ to the RMP
8:   Update the lower bound LB as in Equation (10)
9: until no new extreme points are added or Z_RMP < (1 + ε) LB

III. MARES: A MAP-REDUCE SCHEDULER

Solving the linear program MP defined in the last section gives a lower bound on the minimum weighted-sum schedule. We outline how to convert this (infeasible) linear programming solution into a feasible solution to the scheduling problem that is guaranteed to be within a constant factor of the lower bound (and hence within a constant factor of the optimal solution). In general, it is not possible to convert the linear programming solution into a constant factor approximation algorithm for the scheduling problem. However, the specific structure of the precedence graph for the MapReduce problem leads to a constant factor approximation algorithm. In this section, we focus on the map-reduce problem where the shuffle phase is not the bottleneck. This reduces the problem to a two phase structure. The constant factor approximation algorithm will be called the MapReduce Scheduler (MARES). Leaving out the shuffle phase is done for two reasons. First, the ideas in MARES are used to derive the more complex algorithm in the next section, when the shuffle operation is introduced into the picture. Second, there are instances where shuffle is not a significant bottleneck, and in this case the algorithm developed here can be used to schedule the tasks. In order to keep the formulation of the problem consistent with the model in the last section, we introduce dummy tasks into the model. Each dummy task is assigned to its own dummy processor. This dummy processor is not part of the linear programming formulation. Dummy tasks introduce new precedence constraints into the problem and these constraints are taken into account in the linear programming formulation.
We introduce two dummy tasks for each job j: 1) a start time dummy task S_j that takes r_j units of time, where r_j is the release time of job j; there is a link in the precedence graph from S_j to all the map tasks of job j. 2) a finish time dummy task F_j whose processing time is zero; there is a link from every reduce task of job j to F_j in the precedence graph. With these two modifications, an example of the precedence graph of a job j is shown in Figure 1. In this example, job j has map tasks 1, 2 and 3, and reduce tasks 4 and 5.

[Fig. 1: Task Precedence of a Job. S_j (release time) precedes the map tasks, the map tasks precede the reduce tasks, and the reduce tasks precede F_j (completion time).]

The objective of the linear programming problem is to minimize the weighted sum of the completion times of the finish time tasks, i.e., min ∑_j w_j C_{F_j} (the value of w_u = 0 for every u that is not a finish time task). The linear programming relaxation of the scheduling problem can be solved to get a lower bound on the optimal solution value. We use the linear programming solution to get a feasible solution that is within a factor of 8 of the lower bound, hence giving an 8-approximation algorithm. The basic idea in MARES is the following: task u becomes available only at its LP completion time C̄_u. Tasks are scheduled in increasing order of LP completion time, as long as all the predecessors of the task have been completed. If the predecessors of a task are not completed, it waits until its predecessors complete and is then scheduled in the order of its LP completion time. Note that the LP completion time is used both for determining the available time of a task and for the scheduling order. The algorithm MARES is outlined in Algorithm 2. In the description it is assumed that time is slotted and decisions are made in each time slot. It is easy to make this process polynomial time by advancing time by the task processing times.

Algorithm 2 MARES
1: Solve the LP relaxation and obtain C̄_u for all tasks.
2: A_u ← C̄_u is the available time of u.
3: for each time slot t do
4:   for all processors p do
5:     if p is not busy then
6:       Schedule the available u ∈ J_p, A_u ≤ t, with the lowest C̄_u

We now outline the proof of the performance guarantee of MARES.
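The scheduling rule of Algorithm 2 can be sketched as a slotted-time simulation, assuming the LP completion times C̄_u have already been obtained (the toy job and the C̄ values below are made up, not the output of an actual LP solve):

```python
def mares_schedule(tasks, pred, lp_c, horizon=1000):
    """tasks: u -> (processor, integer processing time); pred: u -> set of
    predecessor tasks; lp_c: u -> LP completion time C-bar, used both as the
    availability time A_u and as the scheduling priority (Algorithm 2)."""
    finish = {}    # u -> completion time in the constructed schedule
    running = {}   # processor -> (task, finish time)
    waiting = set(tasks)
    for t in range(horizon):
        for p, (u, f) in list(running.items()):  # retire tasks finishing by t
            if f <= t:
                del running[p]
        done = {u for u, f in finish.items() if f <= t}
        for p in {tasks[u][0] for u in waiting}:
            if p in running:
                continue  # processor busy in this slot
            ready = [u for u in waiting
                     if tasks[u][0] == p and lp_c[u] <= t and pred[u] <= done]
            if ready:  # lowest C-bar among available tasks with finished predecessors
                u = min(ready, key=lambda v: lp_c[v])
                finish[u] = t + tasks[u][1]
                running[p] = (u, finish[u])
                waiting.discard(u)
        if not waiting and not running:
            break
    return finish

# Toy job: map tasks m1, m2 and a reduce task r spread over two processors.
tasks = {"m1": ("p1", 2), "m2": ("p2", 3), "r": ("p1", 2)}
pred = {"m1": set(), "m2": set(), "r": {"m1", "m2"}}
lp_c = {"m1": 2, "m2": 3, "r": 5}
print(mares_schedule(tasks, pred, lp_c))  # -> {'m1': 4, 'm2': 6, 'r': 8}
```

The reduce task r becomes available at its C̄ value but actually starts only once both map tasks have finished, which is precisely the waiting behavior the performance analysis has to account for.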
The proof exploits the special structure of the precedence constraints for map-reduce in order to derive the approximation ratio. The approximation results will not hold if the precedence relations are arbitrary. We use M_j to denote the set of map tasks for job j and R_j to denote the set of reduce tasks for job j.

Theorem 1: Let Z_MP and Z_H denote the objective function values of the MP linear program and MARES respectively. Then Z_H ≤ 8 Z_MP.

Proof: Let C^H_u denote the completion time of task u on processor p(u) in the MARES schedule. Since tasks are scheduled in the order of LP completion times, once a map task u becomes available on a particular processor at C̄_u, the only tasks that can be processed on the processor before u are those whose LP completion time is not greater than C̄_u. We define D(u) = {v : v ∈ J_{p(u)}, C̄_v ≤ C̄_u}.
Therefore, for u ∈ M_j, we have

C^H_u ≤ C̄_u + ∑_{v∈D(u)} t_v   (11)

From Equation (1) we have

∑_{v∈D(u)} t_v C̄_v ≥ f(D(u), p(u)) ≥ ½ ( ∑_{v∈D(u)} t_v )²   (12)

Using the fact that C̄_v ≤ C̄_u for all v ∈ D(u), we can write

C̄_u ∑_{v∈D(u)} t_v ≥ ½ ( ∑_{v∈D(u)} t_v )²   (13)

Therefore ∑_{v∈D(u)} t_v ≤ 2C̄_u for all u ∈ M_j and all jobs j. Applying this to Equation (11) gives

C^H_u ≤ 3C̄_u, ∀ u ∈ M_j.   (14)

We use C^H_{M_j} = max_{u∈M_j} C^H_u to denote the completion time of all the map tasks belonging to job j. There is a key difference between scheduling map and reduce tasks. A map task u can be scheduled as soon as it becomes available at its LP completion time C̄_u. A reduce task u, on the other hand, even if it becomes available at C̄_u, can only be scheduled after all its predecessor map tasks are completed. In other words, a reduce task u ∈ R_j can be scheduled only after C^H_{M_j}. Once all the predecessors of a reduce task are completed, all tasks that will be processed before the reduce task have LP completion times lower than that of the reduce task. This statement is almost true, except for the case where a task w with C̄_w > C̄_u is already in process on processor p(u) when the predecessors of u ∈ R_j are completed. In this case, even if the reduce task has a lower LP completion time, due to non-preemption the task w will be completed before the reduce task is started. Therefore, we can write for all u ∈ R_j,

C^H_u ≤ C^H_{M_j} + ∑_{v∈D(u)} t_v + t_w   (15)

where C̄_w > C̄_u and t_w ≤ C̄_w ≤ C^H_{M_j}. The reason we have C̄_w ≤ C^H_{M_j} is that task w must have become available before the map completion time, since it was in process when the map tasks completed. Using an analysis similar to that for the map tasks, we have ∑_{v∈D(u)} t_v ≤ 2C̄_u and C^H_{M_j} ≤ 3C̄_{M_j}, where C̄_{M_j} = max_{v∈M_j} C̄_v. With t_w ≤ C̄_w ≤ C^H_{M_j}, Equation (15) becomes

C^H_u ≤ C^H_{M_j} + 2C̄_u + C^H_{M_j} ≤ 6C̄_{M_j} + 2C̄_u ≤ 8C̄_u

where the last inequality follows from the fact that if v ∈ M_j and u ∈ R_j, then C̄_v ≤ C̄_u, since this is a constraint in the linear program. This implies Z_H ≤ 8 Z_MP.

A. H-MARES: A Heuristic Implementation of MARES

The idea of making a task available only at its LP completion time is mainly to bound the worst case performance. A simple heuristic implementation of MARES, which we call H-MARES, schedules tasks in the order of LP completion time without waiting for them to become available.
The only reason for a task to wait is that some of its predecessors have not been completed. The description of the algorithm is exactly the same as MARES except that A_u = r_j if u ∈ M_j and A_u = 0 otherwise. Recall that r_j is the time at which job j enters the system. If all jobs are available at time zero, then A_u = 0 for all tasks u. The solution of the LP and the scheduling are done exactly as in MARES. We show that H-MARES outperforms MARES in all the experiments performed. However, it does not seem straightforward to prove any theoretical performance guarantee for H-MARES. In the performance evaluation section, we show that H-MARES does very well in practice and on average is within a factor of 1.5 of the lower bound.

IV. MASHERS: A MAP-SHUFFLE-REDUCE SCHEDULER

We now address the problem of jointly scheduling the map, shuffle and reduce operations. Before we give a description of the scheduling algorithm, we first outline the constraints imposed by the shuffle process. When a map task of a job is completed, the result of this task has to be sent to all the reduce tasks of the job. Some reduce tasks may be on the same processor as the map task while others may be on different processors. Each processor is assumed to have an egress port that is used to send out data and an ingress port to receive data. We assume that once a data transfer between two processors begins, it cannot be interrupted. We also assume that at most one file can be transmitted or received at any given time instant by any processor. It is possible to relax this assumption to let multiple tasks share the shuffle link, but the analysis becomes more complex. Unlike map and reduce tasks, shuffle tasks have to simultaneously possess resources at the sending and receiving sides in order to complete the transmission. Therefore shuffle processing can be viewed as an edge scheduling problem.
Further, the shuffle process is done after the map process, and this precedence constraint before the shuffle phase makes the analysis more complex: the competitive ratio is looser than for pure scheduling problems where multiple resources are needed to complete a task. In order to make the formulation compatible with the model in Section II, we model the shuffle process as follows: corresponding to each processor p in the system, we add two more processors I(p) and O(p), where I(p) represents the ingress port at processor p and O(p) represents the egress port. Each of these additional processors is treated like a regular processor in the formulation. In addition to the start and finish nodes for each job as in Section III, we introduce |M_j| · |R_j| additional node pairs, where each pair of nodes represents the two ends of a transfer. Each map node has |R_j| transfer nodes associated with it and each reduce node has |M_j| transfer nodes associated with it. Each map task precedes its corresponding transfer tasks and each reduce task succeeds its corresponding transfer task. The map part of the transfer precedes
the reduce part of the transfer. The processing time of the map side transfer equals the actual transfer time between the corresponding map and reduce processors; the processing time of the reduce side transfer is set to zero. The fact that two resources have to be held simultaneously is not considered in the linear program and is only introduced in the scheduling algorithm. The completion times of the two tasks at the two ends of a transfer are set equal. An example of this new precedence graph is shown in Figure 2. In the example, tasks 1, 2 and 3 are map tasks, and tasks 6 and 7 are reduce tasks. Tasks 4 to 9 are the outgoing tasks and tasks 10 to 15 are the incoming tasks. In Figure 2, the completion times of task 9 and task 15 are set equal since they represent one transfer. The same statement can be made for all the transfer task pairs. With this new precedence graph, the formulation stays the same as in Section II. The LP is solved using the column generation procedure and all the LP completion times are obtained.

[Fig. 2: Task Precedence of a Job with Data Shuffle. S_j (release time) precedes the map tasks, the map tasks precede the output tasks, the output tasks precede the input tasks, the input tasks precede the reduce tasks, and the reduce tasks precede F_j (completion time).]

A. An Outline of MASHERS

We first give a high level view of MASHERS (Algorithm 3): once the lower bound LP is solved, the map and reduce tasks are scheduled as in Algorithm 2. A reduce task has to wait until all its predecessors are completed. The shuffle tasks are treated as an edge scheduling problem which schedules the shuffle edges in the precedence graph. This is done by partitioning the edges into groups using Algorithm 4 and scheduling the groups in order.

Algorithm 3 MASHERS
1: Solve the LP and obtain C̄_u for all tasks.
2: Execute Algorithm 4 and obtain G_1, G_2, ..., G_k
3: A_u ← C̄_u; A_e ← C̄_e for every e = (u, v) ∈ SH.
4: for each time slot t do
5:   Find the largest i such that G_i has unscheduled edges.
6:   ES ← {e : e ∈ E_i, A_e ≤ t, e is free and available}
7:   while ES ≠ ∅ do
8:     Schedule the e = (u, v) ∈ ES with the smallest C̄_e
9:   for all processors p do
10:    if p is not busy then
11:      Schedule the u ∈ J_p, A_u ≤ t, with the lowest C̄_u whose predecessors are completed
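The "free and available" test in line 6 of Algorithm 3 requires simultaneous possession of both ports of a transfer. A minimal sketch of that check (the helper names and the port bookkeeping are illustrative, not from the paper):

```python
def free_ports(edge, busy_until, now):
    """A shuffle transfer (src, dst) can start only if the egress port of the
    sending processor and the ingress port of the receiving processor are both
    idle, since a non-preemptive transfer holds both for its whole duration."""
    src, dst = edge
    return busy_until.get(("out", src), 0.0) <= now and \
           busy_until.get(("in", dst), 0.0) <= now

def start_transfer(edge, duration, busy_until, now):
    """Claim both ports until the transfer finishes; return its finish time."""
    src, dst = edge
    end = now + duration
    busy_until[("out", src)] = end  # egress of the sender
    busy_until[("in", dst)] = end   # ingress of the receiver, held simultaneously
    return end

busy = {}
start_transfer(("p1", "p2"), 3.0, busy, 0.0)
print(free_ports(("p1", "p3"), busy, 1.0))  # False: p1's egress is busy
print(free_ports(("p3", "p2"), busy, 1.0))  # False: p2's ingress is busy
print(free_ports(("p3", "p4"), busy, 1.0))  # True: disjoint ports
```

Transfers between disjoint processor pairs can proceed in parallel, which is what makes the shuffle phase an edge scheduling (matching-like) problem rather than a single-machine one.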
Let G_0 denote the original set of shuffle edges in the precedence constrained graph; let G_i and Ḡ_i denote the edge groups partitioned in the i-th iteration of Algorithm 4. Within a group, edges are scheduled in the order of their LP completion times. Let SH denote the set of shuffle edges in the precedence graph. In Figure 2, edge (9, 15) is a shuffle edge that transfers the output of map task 3 to the reduce task 7. We extend the definition of the completion time as follows: we define the completion time of a shuffle edge e = (u, v) in the linear programming solution as the completion time of its nodes, that is, C̄_e = C̄_u = C̄_v. We assume time is slotted and the scheduling algorithm is executed in each time slot. In the description of the algorithm MASHERS we assume that algorithm PARTITION has already partitioned the shuffle edges. We now describe the partitioning process.

B. Partitioning the Shuffle Edges

The partition algorithm groups the shuffle edges. The key idea is to construct the groups such that edges with similar processing times and completion times belong to the same group. In the description of Algorithm 4 we use the following additional notation. Given a graph H, let m(H) = max_{e∈H} C̄_e denote the maximum completion time of an edge in H. Given a set of edges S, let p(S) denote the sum of the processing times of the edges in S. Let φ(G_i) = 3m(G_i) + 2p(G_i).

Algorithm 4 PARTITION
1: Sort the edges E in G_0 in non-decreasing order of C̄_e.
2: i ← 1
3: repeat
4:   E_i ← ∅
5:   for all e ∈ Ē_{i-1} in the sorted order do
6:     if φ(G_i) ≤ φ(G_{i-1})/2 still holds after adding e to E_i then
7:       add e to E_i
8:     else
9:       add e to Ē_i
10:  i ← i + 1
11: until Ē_i is empty

The partitioning and scheduling algorithm works as follows: the algorithm first partitions G_0 into G_1 and Ḡ_1 with respect to φ(G_1) ≤ φ(G_0)/2. Then it iteratively partitions Ḡ_i into G_{i+1} and Ḡ_{i+1} until Ḡ_i is empty. All edges in G_{i+1} are scheduled before the edges in G_i are considered.

C. MASHERS: Performance Guarantee

In this section, we show that MASHERS gives a constant factor competitive ratio. This is done by first analyzing the performance of the shuffle operation and then incorporating the map and reduce phases into the analysis.
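The PARTITION step can be sketched as follows, using φ(G) = 3m(G) + 2p(G) as reconstructed above. This is one plausible reading of Algorithm 4: always admitting the first edge of a group is an assumption added here to guarantee termination, and the edge data is made up.

```python
def phi(edges):
    """Potential of an edge group: 3 * (max LP completion time) + 2 * (total transfer time)."""
    if not edges:
        return 0.0
    return 3 * max(c for c, _ in edges) + 2 * sum(d for _, d in edges)

def partition(edges):
    """Greedily split shuffle edges, sorted by LP completion time C-bar, into
    groups G_1, G_2, ... whose potential roughly halves from group to group."""
    remaining = sorted(edges)   # non-decreasing C-bar; edges are (C-bar, transfer time)
    groups = []
    prev_phi = phi(remaining)   # phi(G_0): the whole shuffle edge set
    while remaining:
        group, leftover = [], []
        for e in remaining:
            if not group or phi(group + [e]) <= prev_phi / 2:
                group.append(e)   # first edge always admitted (assumption, for termination)
            else:
                leftover.append(e)
        groups.append(group)
        prev_phi = phi(group)
        remaining = leftover
    return groups

edges = [(1.0, 0.5), (2.0, 1.0), (4.0, 0.5), (8.0, 2.0), (9.0, 1.0)]
for i, g in enumerate(partition(edges), 1):
    print(f"G_{i}: {g}  phi = {phi(g):.1f}")
```

On this made-up instance the edges with small completion times land in the first group, and the two long transfers are pushed into later groups, which is the guard against a long transfer delaying many short ones.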
The partition process is similar to the partition operation in [8]. The potential function used for partitioning in [8] involves just the processing times (the completion times are only used for ordering). For our problem, the potential function used for partitioning is a
weighted sum of the maximum completion time of the edges and the total processing time of the edges. This is done in order to include the performance of the map operation in the overall analysis. We now give a preliminary result that is used to bound the performance guarantee of MASHERS.

Lemma 2: For e ∈ E_i,

C^H_e ≤ 2φ(G_{i-1})   (16)

Proof: We prove this by induction. Let N_G(u) denote the set of edges incident to u in G; p(N_G(u)) is the sum of the transfer times in N_G(u). For an edge e = (u, v) in graph G_k (the last group, which is scheduled first), we have

C^H_e ≤ C^H_u + p(N_{G_k}(u)) + p(N_{G_k}(v)) ≤ C^H_u + 2p(G_k) ≤ 3C̄_e + 2p(G_k) ≤ 3m(G_k) + 2p(G_k) = φ(G_k) ≤ 2φ(G_{k-1}).

Now we prove that if all edges in E_k, E_{k-1}, ..., E_{i+1} can be scheduled within 2φ(G_i), then all edges in E_k, E_{k-1}, ..., E_i can be scheduled within 2φ(G_{i-1}). An edge e = (u, v) ∈ E_i needs to wait at most 2φ(G_i) time before the set E_i is considered. In addition, it can take C^H_u + p(N_{G_i}(u)) + p(N_{G_i}(v)) time to complete. That is,

C^H_e ≤ 2φ(G_i) + C^H_u + p(N_{G_i}(u)) + p(N_{G_i}(v))   (17)

Since we have φ(G_i) ≤ φ(G_{i-1})/2 and C^H_u + p(N_{G_i}(u)) + p(N_{G_i}(v)) ≤ 3C̄_e + 2p(G_i) ≤ φ(G_i) ≤ φ(G_{i-1}),

C^H_e ≤ 2φ(G_{i-1})/2 + φ(G_{i-1}) = 2φ(G_{i-1})   (18)

This proves the lemma.

Theorem 3: Let Z_LP and Z_H denote the objective function value of the LP and the weighted sum of completion times produced by MASHERS respectively. Then Z_H ≤ 58 Z_LP.

Proof: As in the case of MARES, for u ∈ M_j,

C^H_u ≤ 3C̄_u   (19)

An edge e = (u, v) ∈ E_i was added to G_i because it could not be added to G_{i-1}. Let p(u) denote the sum of the transfer times of the edges incident to u in G_{i-1} whose completion times are less than C̄_e. Since edges are added in the order of completion times, the fact that e could not be added to G_{i-1} implies that

3C̄_e + 2p(u) > φ(G_{i-1})/2.

(In the above expression we assume, without loss of generality, that node u was the blocking node.) Let D(u) denote the set of edges incident to u in G_0 with smaller completion times than e; then p(D(u)) ≥ p(u). Together with C̄_e ≥ p(D(u))/2 from the LP we have

φ(G_{i-1}) < 2(3C̄_e + 2p(D(u))) ≤ 2 · 7C̄_e.

Combining this with Equation (16), we have

C^H_e ≤ 2φ(G_{i-1}) ≤ 4 · 7C̄_e = 28C̄_e   (20)

For the reduce tasks, similar to the proof in Section III, for any v ∈ R_j,

C^H_v ≤ C^H_{sh(v)} + ∑_{u∈D(v)} t_u + t_w   (21)

where C^H_{sh(v)} is the time at which the last incoming transfer of v completes and w is a task that may be in process on p(v) at that time. By Equation (20) and the precedence constraints, C^H_{sh(v)} ≤ 28C̄_v; as before, ∑_{u∈D(v)} t_u ≤ 2C̄_v and t_w ≤ C̄_w ≤ C^H_{sh(v)} ≤ 28C̄_v. Therefore

C^H_v ≤ 28C̄_v + 2C̄_v + 28C̄_v   (22)

≤ 58C̄_v   (23)

This implies Z_H ≤ 58 Z_LP.
There are two comments in order here: 1) It is possible to tighten the analysis to get a better approximation ratio, but we do not do that here in order to keep the analysis simple. 2) We use MASHERS to develop two heuristic algorithms, D-MASHERS and H-MASHERS, that we evaluate in the next section.

D. Heuristics D-MASHERS and H-MASHERS

The main reason for partitioning the shuffle edges is to guard against the worst case where an edge with a long processing time is scheduled early and delays a large number of tasks. Though MASHERS provides a worst case competitive guarantee, its performance suffers in practice since it guards against corner cases. If the transfer times are not too different, then we can skip the partitioning process and just use the LP solution to guide which task to schedule. We can use two different variants:
1) D-MASHERS: A task is delayed until its LP completion time and is scheduled in the order of LP completion times as long as all its predecessors are done.
2) H-MASHERS: Tasks are not delayed until their LP completion times and are scheduled in the order of LP completion times as long as their predecessors are done.
In the experimental section, we evaluate both D-MASHERS and H-MASHERS.

V. PERFORMANCE EVALUATION

In this section, we present a simulation based evaluation of the algorithms developed in this paper.

A. Simulation Setting

The workloads of real MapReduce systems are not publicly available. Therefore, we use a synthetic workload generation model similar to [2]. Given the number of jobs and processors, we generate a set of map and reduce tasks for each job. The processing time of a task is uniformly distributed in [1, 10] units. Between a map task of size m and a reduce task of size r, the data shuffle delay is set to be (m + r)/3. For n jobs, the weights are uniformly distributed in [1, n] and the release times are uniformly distributed in [1, 10]. Once we generate a problem instance, we pass its workload information to our scheduler. We assume that the processing times and shuffle delays are known to the scheduler.
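The synthetic workload described above can be sketched as follows; the per-job task counts are an assumption here, since only the averages used in the tests are stated:

```python
import random

def make_workload(n_jobs, seed=0):
    """Generate jobs following the stated distributions: task processing times
    uniform in [1, 10], shuffle delay (m + r) / 3 between a map task of size m
    and a reduce task of size r, weights uniform in [1, n], release times
    uniform in [1, 10]. The per-job task counts are illustrative assumptions."""
    rng = random.Random(seed)
    jobs = []
    for _ in range(n_jobs):
        maps = [rng.uniform(1, 10) for _ in range(rng.randint(2, 10))]
        reduces = [rng.uniform(1, 10) for _ in range(rng.randint(1, 5))]
        shuffle = {(i, k): (m + r) / 3
                   for i, m in enumerate(maps) for k, r in enumerate(reduces)}
        jobs.append({"weight": rng.uniform(1, n_jobs),
                     "release": rng.uniform(1, 10),
                     "maps": maps, "reduces": reduces, "shuffle": shuffle})
    return jobs

jobs = make_workload(5)
print(len(jobs), len(jobs[0]["maps"]), len(jobs[0]["shuffle"]))
```

Fixing the seed makes a generated instance reproducible, so the same workload can be fed to the lower bound computation and to each heuristic being compared.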
In the online case, where jobs arrive over time and their information is known only when they arrive, we update the job information by adding new jobs and removing finished tasks at the moment a job arrives. Then we run our scheduler again. Furthermore, we may also adjust the existing sets of extreme points for the new set of jobs, so that our scheduler does not have to run from the beginning. However, this is not the main focus of this paper and is left for future work.
B. Stopping Threshold Test

As mentioned in Section II, for large problem instances, we may stop the column generation once the ratio of the master problem to the best lower bound becomes less than a threshold 1 + ε. For the purpose of simulating some large problem instances, we need to test the effect of different ε's on the lower bound and our heuristics and pick a suitable ε value. In each test, we first solve the problem optimally as a reference point. Then, for the same problem instance, we vary ε and solve the problem with our heuristics using the degraded solution. For the two-phase problem, we use 50 processors and 30 jobs. Each job has 10 map tasks and 10 reduce tasks on average. For the three-phase problem, we use 30 processors and 10 jobs. Each job has 8 map tasks and 3 reduce tasks on average. Then, for each ε value, we run 10 trials. The results in Figure 3 are competitive ratios (results over the optimal) averaged over these 10 trials. The threshold does not affect the performance of either the lower bound or MARES, as shown in Figure 3a. We only see a slight decrease/increase in the lower-bound/MARES competitive ratio, even when ε = 0.5. This is not true for the three-phase map-shuffle-reduce problem. In Figure 3b, although the lower bound does not decrease much with a large threshold, the competitive ratio of DMASHERS increases dramatically. Therefore, in the following tests, we always set ε = 0.5 for the two-phase problem but ε = 0 for the three-phase problem.

Fig. 3: Threshold ε test. (a) MARES; (b) MASHERS.

In the rest of the tests, for each test, we vary the number of jobs to generate different sizes of the problem. For a particular number of jobs, we run 10 trials and plot them against either the optimal or the lower bound.

C. Evaluation of Two-Phase MapReduce Heuristics

For the two-phase problem, we implement an Integer Program (IP). With small problem instances, we may obtain the optimal solution in reasonable time.
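The stopping rule above can be stated compactly. The helper below is hypothetical (the names master_obj, lower_bound, and eps are ours, not from the paper); it simply checks whether the current restricted-master objective is within a factor 1 + ε of the best known lower bound:

```python
def should_stop(master_obj, lower_bound, eps):
    """Column-generation stopping test: stop once the restricted
    master objective is within a factor (1 + eps) of the best known
    lower bound.  Both values are assumed positive, with
    master_obj >= lower_bound."""
    return master_obj <= (1.0 + eps) * lower_bound
```

With ε = 0, the loop runs until the master objective matches the lower bound, which is why the three-phase tests that use ε = 0 get an undegraded solution.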
Figure 4a shows the competitive ratios for only 30 processors and from 5 to 16 jobs; each job has 10 map tasks and 3 reduce tasks on average. As we can see, HMARES is the closest to the optimal, and the lower bound is closer than MARES, whose competitive ratio is about 1.5 to 2. With large problem instances, with 100 processors and from 5 to 50 jobs (30 map tasks and 10 reduce tasks per job), we do not calculate the optimal solution. However, we can still see in Figure 4b that the competitive ratios of MARES and HMARES are far below MARES's approximation guarantee of 8.

D. Evaluation of Map-Shuffle-Reduce Heuristics

We did similar tests for the three-phase problem, except that we do not have an IP to obtain the optimal solution; instead, we compare the heuristic results to the lower bound. With 30 processors and from 5 to 40 jobs (each has 8 map tasks and 3 reduce tasks on average), we can still see in Figure 5a that the two heuristics generally achieve a competitive ratio of less than 3. In Figure 5b, we plot the average competitive ratio as we increase the number of jobs. It is interesting to see that when there are fewer jobs, HMASHERS beats DMASHERS, for it does not hold a task. As the number of jobs grows, the performance of DMASHERS improves when compared to HMASHERS. This seems to indicate that holding a task until its completion time actually improves the performance of the heuristic. One possible explanation for this phenomenon is that in larger problems there is a wider range of processing time values, and not holding long tasks back hurts the completion times of all tasks that come after them.

VI. RELATED WORK

Job scheduling on parallel machines is a well-studied problem; refer to [9] for more details. In particular, our work is related to the problem of machine scheduling with precedence constraints. The scheduling problem with a general precedence graph is among the most difficult problems in the area of machine scheduling [10], [11], [12], [13]. For the objective of minimizing the total weighted completion time, the best approximation algorithm is due to Queyranne and Schulz [13].
In the problem that we consider, the precedence between map and reduce induces a depth-two precedence graph. However, unlike standard machine scheduling problems, scheduling the shuffle phase resembles the data migration problem. In data migration problems, the migration process is modeled by a transfer graph, in which nodes represent the data-transferring entities and edges represent the transfer links. Two edges incident on the same node cannot transfer simultaneously. A node completes when all edges incident on it complete. Kim [8] gave an LP-based approximation algorithm for general processing times, which was then improved by Gandhi and Mestre [14] with a combinatorial algorithm. Hajiaghayi et al. [15] generalize the technique in [14] and provide a local transfer protocol where multiple transfers can take place concurrently. Other related work is on sum multicoloring of graphs, including [16], [17]. Our work differs from all these papers since the map phase that precedes the shuffle phase makes the analysis different from traditional data transfer problems. The closest work to our problem is by Chang et al. [2], but that paper considers neither the explicit precedence constraints between map and reduce nor the shuffle phase of the MapReduce problem.

VII. CONCLUSION

We modeled the end-to-end performance of the MapReduce process and developed constant-factor approximation algorithms for minimizing the weighted response time. Based
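As a concrete illustration of the transfer-graph constraint described above (two edges incident on the same node cannot transfer simultaneously), the toy scheduler below packs unit-size transfers into rounds greedily. It is a minimal sketch of the constraint only, not any of the cited algorithms, and the function name and input encoding are our own.

```python
def greedy_migration_rounds(edges):
    """Schedule unit-time transfers on a transfer graph.

    In each round, pick a maximal set of vertex-disjoint edges in the
    given priority order, so that no node participates in two
    simultaneous transfers.  edges: list of (u, v) pairs.
    Returns a list of rounds, each a list of edges transferred together.
    """
    remaining = list(edges)
    rounds = []
    while remaining:
        busy, this_round, leftover = set(), [], []
        for (u, v) in remaining:
            if u in busy or v in busy:
                # Endpoint already transferring this round; defer.
                leftover.append((u, v))
            else:
                busy.update((u, v))
                this_round.append((u, v))
        rounds.append(this_round)
        remaining = leftover
    return rounds
```

A triangle of transfers needs three rounds because every pair of its edges shares a node, while vertex-disjoint transfers all fit in one round; the ordering of the edge list plays the role that completion times play in the heuristics of Section IV.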
on these guaranteed-performance algorithms, we developed heuristics that were tested experimentally and performed significantly better than the worst-case guaranteed performance. We are currently working on improving the worst-case performance guarantees as well as testing the algorithms on larger data sets.

Fig. 4: Performance test for two-phase MapReduce. (a) Results against the optimal; (b) large problem instances.

Fig. 5: Performance test for Map-Shuffle-Reduce. (a) Competitive ratios; (b) average competitive ratio versus the number of jobs.

REFERENCES

[1] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in USENIX OSDI, 2004.
[2] H. Chang, M. S. Kodialam, R. R. Kompella, T. V. Lakshman, M. Lee, and S. Mukherjee, Scheduling in MapReduce-like systems for fast completion time, in IEEE INFOCOM, 2011.
[3] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, Managing data transfers in computer clusters with Orchestra, in ACM SIGCOMM, 2011.
[4] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg, Quincy: Fair Scheduling for Distributed Computing Clusters, in SOSP, 2009.
[5] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, in EuroSys '10: Proceedings of the 5th European Conference on Computer Systems, 2010.
[6] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, Reining in the outliers in MapReduce clusters using Mantri, in USENIX OSDI, 2010.
[7] J. Lenstra, A. Rinnooy Kan, and P. Brucker, Complexity of machine scheduling problems, Annals of Discrete Mathematics, vol. 1, pp. 343-362, 1977.
[8] Y. Kim, Data migration to minimize the total completion time, Journal of Algorithms, vol. 55, no. 1, pp. 42-57, 2005.
[9] P. Brucker, Scheduling Algorithms. Springer, 2004.
[10] R. Graham, Bounds on multiprocessing timing anomalies, SIAM Journal on Applied Mathematics, vol. 17, no. 2, pp. 416-429, 1969.
[11] J. Lenstra and A. Rinnooy Kan, Complexity of scheduling under precedence constraints, Operations Research, vol. 26, no. 1, pp. 22-35, 1978.
[12] S. Chakrabarti, C. Phillips, A. Schulz, D. Shmoys, C. Stein, and J. Wein, Improved scheduling algorithms for minsum criteria, in Automata, Languages and Programming, 1996.
[13] M. Queyranne and A. Schulz, Approximation bounds for a general class of precedence constrained parallel machine scheduling problems, SIAM Journal on Computing, vol. 35, no. 5, pp. 1241-1253, 2006.
[14] R. Gandhi and J. Mestre, Combinatorial algorithms for data migration to minimize average completion time, Algorithmica, vol. 54, no. 1, pp. 54-71, 2009.
[15] M. Hajiaghayi, R. Khandekar, G. Kortsarz, and V. Liaghat, On a local protocol for concurrent file transfers, in Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2011.
[16] A. Bar-Noy, M. Halldórsson, G. Kortsarz, R. Salman, and H. Shachnai, Sum multicoloring of graphs, Journal of Algorithms, vol. 37, no. 2, pp. 422-450, 2000.
[17] R. Gandhi, M. Halldórsson, G. Kortsarz, and H. Shachnai, Improved bounds for scheduling conflicting jobs with minsum criteria, ACM Transactions on Algorithms (TALG), vol. 4, no. 1, 2008.
Internatonal Journal of mart Grd and lean Energy Performance Analyss of Energy onsumpton of martphone Runnng Moble Hotspot Applcaton Yun on hung a chool of Electronc Engneerng, oongsl Unversty, 511 angdodong,
More information2008/8. An integrated model for warehouse and inventory planning. Géraldine Strack and Yves Pochet
2008/8 An ntegrated model for warehouse and nventory plannng Géraldne Strack and Yves Pochet CORE Voe du Roman Pays 34 B1348 LouvanlaNeuve, Belgum. Tel (32 10) 47 43 04 Fax (32 10) 47 43 01 Emal: corestatlbrary@uclouvan.be
More informationPSYCHOLOGICAL RESEARCH (PYC 304C) Lecture 12
14 The Chsquared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed
More informationAn efficient constraint handling methodology for multiobjective evolutionary algorithms
Rev. Fac. Ing. Unv. Antoqua N. 49. pp. 141150. Septembre, 009 An effcent constrant handlng methodology for multobjectve evolutonary algorthms Una metodología efcente para manejo de restrccones en algortmos
More informationDiVA Digitala Vetenskapliga Arkivet
DVA Dgtala Vetenskaplga Arkvet http://umudvaportalorg Ths s a book chapter publshed n Hghperformance scentfc computng: algorthms and applcatons (ed Berry, MW; Gallvan, KA; Gallopoulos, E; Grama, A; Phlppe,
More informationExtending Probabilistic Dynamic Epistemic Logic
Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σalgebra: a set
More informationLogistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification
Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson
More informationThe OC Curve of Attribute Acceptance Plans
The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4
More informationA Lyapunov Optimization Approach to Repeated Stochastic Games
PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT. 2013 1 A Lyapunov Optmzaton Approach to Repeated Stochastc Games Mchael J. Neely Unversty of Southern Calforna http://wwwbcf.usc.edu/
More informationdenote the location of a node, and suppose node X . This transmission causes a successful reception by node X for any other node
Fnal Report of EE359 Class Proect Throughput and Delay n Wreless Ad Hoc Networs Changhua He changhua@stanford.edu Abstract: Networ throughput and pacet delay are the two most mportant parameters to evaluate
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationNPAR TESTS. OneSample ChiSquare Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6
PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has
More informationDescriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications
CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary
More information