Dynamic Resource Allocation for MapReduce with Partitioning Skew


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC, IEEE Transactions on Computers

IEEE TRANSACTIONS ON COMPUTERS, VOL. 13, NO. 9, SEPTEMBER

Dynamic Resource Allocation for MapReduce with Partitioning Skew

Zhihong Liu, Student Member, IEEE, Qi Zhang, Student Member, IEEE, Reaz Ahmed, Member, IEEE, Raouf Boutaba, Fellow, IEEE, Yaping Liu, and Zhenghu Gong

Abstract—MapReduce has become a prevalent programming model for building data processing applications in the cloud. While being widely used, existing MapReduce schedulers still suffer from an issue known as partitioning skew, where the output of map tasks is unevenly distributed among reduce tasks. Existing solutions follow a similar principle that repartitions workload among reduce tasks. However, those approaches often incur high performance overhead due to the partition size prediction and repartitioning. In this paper, we present DREAMS, a framework that provides run-time partitioning skew mitigation. Instead of repartitioning workload among reduce tasks, we cope with the partitioning skew problem by controlling the amount of resources allocated to each reduce task. Our approach completely eliminates the repartitioning overhead, yet is simple to implement. Experiments using both real and synthetic workloads running on a 21-node Hadoop cluster demonstrate that DREAMS can effectively mitigate the negative impact of partitioning skew, thereby improving the job completion time by up to a factor of 2.29 over the native Hadoop YARN. Compared to the state-of-the-art solution, DREAMS can improve the job completion time by a factor of

Index Terms—MapReduce, Hadoop YARN, resource allocation, partitioning skew

1 INTRODUCTION

In recent years, the exponential growth of raw data has generated tremendous needs for large-scale data processing. In this context, MapReduce [1], a parallel computing framework, gained significant popularity. A MapReduce job consists of two types of tasks, namely Map and Reduce.
Each map task takes a chunk of input data and runs a user-specified map function to generate intermediate key-value pairs. Subsequently, each reduce task collects the intermediate key-value pairs and applies a user-specified reduce function to produce the final output. Due to its remarkable advantages in terms of simplicity, robustness and scalability, MapReduce has been widely used by companies such as Amazon, Facebook, and Yahoo! to process large volumes of data on a daily basis. Consequently, it has attracted considerable attention from both industry and academia. Despite its success, the current implementations of MapReduce suffer from a few limitations. In particular, the widely-used MapReduce system, Apache Hadoop MapReduce [2], uses a hash function to partition the intermediate key-value pairs across reduce tasks. The goal of using a hash function is to evenly distribute the workload to each reduce task. In reality this goal is rarely achieved [3], [4]. For example, Zacheilas et al. [3] have demonstrated the existence of skewness in a YouTube social graph application using real-world data. The experiments in [3] showed that the biggest workload among reduce tasks is larger than the smallest by more than a factor of five. The skewed workload distribution among reduce tasks can have a severe impact on job completion time. Note that the completion time of a MapReduce job is determined by the completion time of the slowest reduce task. Data skewness causes certain tasks with heavy workload to run slower than others. This in turn prolongs the job completion time.

Zhihong Liu is with the College of Computer, National University of Defense Technology, Changsha, China, and the David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada. E-mail: zhlu@nudt.edu.cn
Q. Zhang, R. Ahmed and R. Boutaba are with the David R. Cheriton School of Computer Science, University of Waterloo.
Yaping Liu and Zhenghu Gong are with the National University of Defense Technology.
Manuscript received April 19, 2005; revised September 17, 2014.
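The hash partitioning described above is easy to make concrete. The sketch below is illustrative rather than Hadoop code: Python's built-in hash stands in for Java's String.hashCode, and the masking mirrors Hadoop's default HashPartitioner.

```python
from collections import Counter

def partition(key: str, num_reduces: int) -> int:
    # Mirrors Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    # Python's built-in hash() stands in for Java's String.hashCode().
    return (hash(key) & 0x7FFFFFFF) % num_reduces

# A skewed word stream: one popular word dominates, as with the popular
# words in the InvertedIndex example discussed in the text.
words = ["the"] * 1000 + ["map"] * 10 + ["reduce"] * 10 + ["skew"] * 5
sizes = Counter(partition(w, 4) for w in words)
# Every copy of "the" hashes to the same reduce task, so one of the four
# partitions holds at least 1000 of the 1025 pairs.
```

Whichever reduce task receives the hot key ends up with the bulk of the work, and the job finishes only when that task does.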
Several recent approaches have been proposed to handle the partitioning skew problem [4], [5], [6], [7], [8], [9], [10]. They follow a similar principle: predict the workload of individual reduce tasks based on certain statistics of the key-value pairs (e.g. key frequencies [6], [8]), and then repartition the workload to achieve a better balance among the reduce tasks. However, in order to collect the statistics of key-value pairs, most of those solutions either have to prevent the reduce phase from overlapping with the map phase, or add a sampling phase before executing the actual job. SkewTune [4] can reduce this waiting time by redistributing the unprocessed workload of a slow reduce task at runtime. However, SkewTune incurs an additional run-time overhead of approximately 30 seconds (as reported in [4]). This overhead can be quite expensive for small jobs with an average life span of around 100 seconds, which are very common in today's production clusters [11]. Motivated by the limitations of the existing solutions, in this paper we take a radically different approach to addressing data skewness. Instead of repartitioning the workload among reduce tasks, our approach dynamically allocates resources to reduce tasks according to their workload. Since no repartitioning is involved, our approach completely eliminates the repartitioning overhead. To this end, we present DREAMS, a Dynamic REsource Allocation technique for MapReduce with partitioning Skew. DREAMS leverages historical records to construct profiles for each job type. This is reasonable because many production jobs are executed

(c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See for more information.

repeatedly in today's production clusters [12]. At run-time, DREAMS can dynamically detect data skewness and assign more resources to reduce tasks with large partitions to make them finish faster. Compared to the previous works, our contributions can be summarized as follows:

- We first develop a partition size prediction model that can forecast the partition sizes of reduce tasks at run-time. Specifically, we can accurately predict the size of each partition when only 5% of map tasks have completed.
- We establish a task performance model that correlates the completion time of individual reduce tasks with their partition sizes and resource allocations.
- We propose a scheduling algorithm that dynamically adjusts the resource allocation to each reduce task using our task performance model and the estimation of the partition size. This can reduce the running time difference among reduce tasks that have different sizes of partitions to process, thereby accelerating the job completion.
- Experiments using both real and synthetic workloads running on a 21-node Hadoop cluster demonstrate that DREAMS can effectively mitigate the negative impact of partitioning skew, thereby improving the job completion time by up to a factor of 2.29 over the native Hadoop YARN. Compared to a state-of-the-art solution like SkewTune, DREAMS can improve the job completion time by a factor of

This paper extends our preliminary work [13] in a number of ways. First, the time complexity of the on-line partition size prediction model has been presented. Second, we have added memory allocation into the reduce task performance model. Third, the scheduling algorithm in the original manuscript has been reformulated as an optimization problem and its optimal solution is presented.
Finally, we have conducted additional experiments to evaluate the effectiveness of DREAMS. The rest of this paper is organized as follows. Section 2 provides the motivation of our work. We describe the system architecture of DREAMS in Section 3. Section 4 illustrates the design of DREAMS in detail. Section 5 provides the results from experimental evaluation. Finally, we summarize the existing works related to DREAMS in Section 7, and draw our conclusion in Section 8.

2 MOTIVATION

In the state-of-the-art MapReduce systems, each map task processes one chunk of the input data and generates a sequence of intermediate key-value pairs. A hash function is then used to partition these key-value pairs and distribute them to reduce tasks. Since all map tasks use the same hash function, the key-value pairs with the same hash value are assigned to the same reduce task. During the reduce stage, each reduce task takes one partition (i.e. the intermediate key-value pairs corresponding to the same hash value) as input, and performs a user-specified reduce function on its partition to generate the final output. This process is illustrated in Figure 1.

Fig. 1: MapReduce Programming Model

Ideally, the hash function is expected to generate equal-size partitions if the key frequencies and the sizes of the key-value pairs are uniformly distributed. However, in reality, the hash function often fails to achieve uniform partitioning, resulting in skewed partition sizes. For example, in the InvertedIndex job [14], the hash function partitions the intermediate key-value pairs based on the occurrence of words in the files. Therefore, reduce tasks processing more popular words will be assigned a larger number of key-value pairs. As shown in Figure 1, partitions are unevenly distributed by the hash function: P1 is larger than P2, which causes workload imbalance between R1 and R2. Zacheilas et al. [3] presented the following reasons for partitioning skew:

Skewed key frequencies: Some keys occur more frequently in the intermediate data.
As a result, partitions that contain these keys become extremely large, thereby overloading the reduce tasks that they are assigned to.

Skewed tuple sizes: In MapReduce jobs where the sizes of the values in the key-value pairs vary significantly, even though key frequencies are uniform, uneven workload distribution among reduce tasks may arise.

In order to address the weaknesses and inadequacies experienced in the first version of Hadoop MapReduce (MRv1), the next generation of the Hadoop compute platform, YARN [15], has been developed. Compared to MRv1, YARN manages the scheduling process using two components: a) the ResourceManager is responsible for allocating resources to the running MapReduce jobs subject to capacity constraints, fairness and so on; b) an ApplicationMaster, on the other hand, works for each running job, and has the responsibility of negotiating appropriate resources from the ResourceManager and assigning the obtained resources to its tasks. This removes the single-point bottleneck of the JobTracker in MRv1 and improves the ability to scale Hadoop clusters. In addition, YARN deprecates the slot-based resource management approach of MRv1, and adopts a more flexible resource unit called the container. The container provides resource-specific, fine-grained accounting (e.g. <2 GB RAM, 1 CPU>). A task running within a container is enforced to abide by the prescribed limits. Nevertheless, in both Hadoop MRv1 and YARN, the schedulers assume each reduce task has uniform workload and resource consumption, and therefore allocate identical resources to each reduce task. Specifically, MRv1 adopts a

slot-based allocation scheme, where each machine is divided into identical slots that can be used to execute tasks. However, MRv1 does not provide resource isolation among co-located tasks, which may cause performance degradation at run-time [16]. On the other hand, YARN uses a container-based allocation scheme, where each task is scheduled in an isolated container. But it still allocates containers of identical size to all reduce tasks that belong to the same job. This scheduling scheme can cause variation in task running time due to partitioning skew, since the execution time of a reduce task with a large partition can be prolonged because of the fixed container size. As the job completion time is dominated by the slowest task, the run-time variation of reduce tasks will prolong the job execution time. Most of the existing approaches tackle the partitioning skew problem by making the workload assignment uniformly distributed among reduce tasks, thereby mitigating the inefficiencies in both performance and utilization. However, achieving this goal requires (sometimes heavy) modification to the current Hadoop implementation, and often incurs additional overhead in terms of sampling and adaptive partitioning. Therefore, in this work we seek an alternative solution, where we adjust the size of the container based on the partitioning skew. This approach not only requires minimal modification to the existing Hadoop implementation, but at the same time effectively mitigates the negative impact of partitioning skew.

3 SYSTEM ARCHITECTURE

This section describes the design of our proposed resource allocation framework called DREAMS. The architecture of DREAMS is shown in Figure 2.

Fig. 2: Architecture of DREAMS
There are five main components: the Partition Size Monitor, running in the NodeManager; the Partition Size Predictor, Task Duration Estimator and Resource Allocator, running in the ApplicationMaster; and the Fine-grained Container Scheduler, running in the ResourceManager. Each Partition Size Monitor records the statistics of the intermediate data that a map task generates at run-time and sends them to the ApplicationMaster through heartbeat messages. The Partition Size Predictor collects the partition size reports from the NodeManagers and predicts the partition sizes of every reduce task for this job. The Task Duration Estimator constructs a statistical estimation model of reduce task performance as a function of its partition size and resource allocation. That is, the duration of a reduce task can be estimated if the partition size and resource allocation of this task are given. The Resource Allocator determines the amount of resources to be allocated to each reduce task based on the performance estimation. Lastly, the Fine-grained Container Scheduler is responsible for scheduling resources among all the ApplicationMasters in the cluster, based on scheduling policies such as Fair scheduling [17] and Dominant Resource Fairness (DRF) [18]. Note that the schedulers in the original Hadoop assume that all reduce tasks (and similarly, all map tasks) have homogeneous resource requirements in terms of CPU and memory. However, this is not appropriate for MapReduce jobs with partitioning skew. We have modified the original schedulers to support fine-grained container scheduling that allows each task to request resources of customizable size. The workflow of the resource allocation mechanism used by DREAMS consists of 4 steps, as shown in Figure 2. (1) After the ApplicationMaster is launched, it schedules all the map tasks first and then ramps up the reduce task requests gradually according to the slowstart setting, which is used to control when to start reduce tasks based on the percentage of map tasks that have finished.
During their execution, each Partition Size Monitor records the size of the intermediate key-value pairs produced by map tasks. Each Partition Size Monitor sends locally gathered statistics to the ApplicationMaster through the TaskUmbilicalProtocol, which is an RPC protocol used to monitor task status in Hadoop. (2) Upon receiving the partition size reports from the Partition Size Monitors, the Partition Size Predictor performs size prediction using our proposed prediction model (see Section 4.1). After all the estimated sizes of reduce tasks are known, the Task Duration Estimator uses the reduce task performance model (Section 4.2) to predict the duration of each reduce task with a specified amount of resources. Based on that, the Resource Allocator determines the amount of resources for each reduce task according to our proposed resource allocation algorithm (Section 4.3) to equalize the execution time of all reduce tasks, and then sends resource requests to the ResourceManager. Note that the ResourceManager reports to the ApplicationMaster the current total amount of available resources through heartbeat messages every second. Thus, the Resource Allocator can check the availability of resources when requesting containers. (3) Next, the ResourceManager receives the ApplicationMasters' resource requests through the heartbeat messages, and schedules free containers in the cluster to the corresponding ApplicationMasters. (4) Once the ApplicationMaster obtains new containers from the ResourceManager, it assigns the corresponding containers to the pending tasks, and finally launches the tasks.
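The four steps can be condensed into a toy driver. Everything below is an illustrative stand-in for the components above, not the actual Hadoop YARN API; the sizing rule (2 vcores for above-average partitions) is deliberately simplistic where DREAMS would consult the performance model of Section 4.2.

```python
from dataclasses import dataclass, field

@dataclass
class ToyApplicationMaster:
    """Illustrative model of the DREAMS workflow, not Hadoop's real API."""
    num_reduces: int
    slowstart: float = 0.05          # issue reduce requests after 5% of maps finish
    partition_sizes: dict = field(default_factory=dict)

    def on_partition_report(self, reduce_id: int, size_mb: float):
        # Step (1): Partition Size Monitors report per-reduce intermediate data
        # via heartbeats; the ApplicationMaster accumulates them.
        self.partition_sizes[reduce_id] = (
            self.partition_sizes.get(reduce_id, 0.0) + size_mb)

    def request_containers(self, map_progress: float):
        # Step (2): once enough maps have finished, size each reduce container
        # by its observed partition; steps (3)-(4) would be the ResourceManager
        # granting containers and the AM launching tasks in them.
        if map_progress < self.slowstart:
            return []
        avg = sum(self.partition_sizes.values()) / max(len(self.partition_sizes), 1)
        return [{"reduce": r,
                 "vcores": 2 if self.partition_sizes.get(r, 0.0) > avg else 1}
                for r in range(self.num_reduces)]
```

For example, after reports of 100 MB for reduce 0 and 10 MB for reduce 1, request_containers(0.10) asks for 2 vcores for the skewed reduce task and 1 vcore for the other.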

4 DREAMS DESIGN

There are three main challenges to be addressed in DREAMS. First, in order to identify partitioning skew, it is necessary to develop a run-time forecasting algorithm that predicts the partition size of each reduce task. Second, in order to determine the right container size for each reduce task, it is necessary to develop a task performance model that correlates task running time with resource allocation. Lastly, there are multiple resource dimensions such as CPU, memory, disk I/O and network bandwidth. Allocations with different combinations of these resource dimensions may yield the same completion time. Determining the appropriate combination of these resource dimensions in order to minimize the cost is a challenging problem. In the rest of this section, we describe our solutions for each of these challenges.

4.1 Predicting Partition Size

In order to cull the partitioning skew, the workload distribution among the reduce tasks should be known in advance. Unfortunately, the size of the partition belonging to each reduce task depends on the input dataset, the map function and the number of reduce tasks in a MapReduce job. Even though most MapReduce jobs are routinely executed, the same job processing a different input dataset would produce a different workload distribution among its reduce tasks. Several recently proposed approaches calculate the workload distribution among reduce tasks [3], [5], [6], [7], [19]. Existing solutions, however, either have to wait for all the map tasks to finish [3], [5], [6], or need an additional sampling procedure before executing a job [7], [19]. However, in order to improve the job completion time, existing Hadoop schedulers allow reduce tasks to be launched before the completion of all map tasks (e.g. the default slowstart setting is 5%).
It has also been demonstrated by the existing works [8], [20] that starting the shuffle phase only after the completion of all the map tasks will severely prolong the job completion time. Therefore, it is necessary to predict the partition size at run-time without introducing a barrier between the map and reduce phases. The input datasets of MapReduce jobs in a production cluster tend to be very large. Hence, the HDFS storage system [21] splits a large dataset into smaller data chunks, which naturally creates a sampling space. This suggests that a small set of random samples in this sample space may reveal the characteristics of the whole dataset in terms of workload distribution among reduce tasks. Therefore, we can analyze the pattern of the intermediate data after a fraction of map tasks have completed, and then predict the workload distribution among reduce tasks for the entire dataset. In DREAMS, we perform k measurements (j = 1, 2, ..., k) over time during the map phase, and collect the following two metrics (F^j, S_i^j): F^j is the percentage of map tasks that have been processed, where j ∈ [1, k] and k refers to the number of collected tuples (F^j, S_i^j). Note that each map task processes one inputsplit, and each inputsplit has identical size (64MB, 128MB etc.). As a result, F^j is approximately equal to the fraction of the whole dataset that has been processed. S_i^j is the size of the intermediate data generated by the completed map tasks for reduce task i. In our implementation, we have modified the reporting mechanism so that each map task reports this information to the ApplicationMaster upon map task completion. Our experimental evidence reveals that S_i^j is linearly proportional to F^j. Figure 3 shows the typical results in InvertedIndex and WordCount jobs.

Fig. 3: Partition size prediction ((a) a reduce task in InvertedIndex; (b) a reduce task in WordCount)
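Since S_i^j grows linearly with F^j, the final partition size can be extrapolated by ordinary least squares once a handful of (F^j, S_i^j) tuples have arrived. A minimal sketch for one reduce task (the sample numbers are hypothetical; for a two-parameter fit the normal equations reduce to the textbook slope/intercept formulas):

```python
def fit_partition_model(F, S):
    """Least-squares fit of S = alpha + beta * F for one reduce task.

    F: fractions of map tasks completed at each measurement,
    S: observed partition sizes (e.g. in MB) at those points."""
    k = len(F)
    sf, ss = sum(F), sum(S)
    sff = sum(f * f for f in F)
    sfs = sum(f * s for f, s in zip(F, S))
    det = k * sff - sf * sf              # determinant of X^T X
    alpha = (sff * ss - sf * sfs) / det  # intercept
    beta = (k * sfs - sf * ss) / det     # slope
    return alpha, beta

def predict_final_size(alpha, beta):
    # Extrapolate to F = 1.0, i.e. all map tasks completed.
    return alpha + beta * 1.0

# Hypothetical monitor samples: the partition grows almost linearly with
# map progress, so five early points pin down the trend.
alpha, beta = fit_partition_model(
    [0.05, 0.10, 0.15, 0.20, 0.25],
    [51.0, 99.0, 152.0, 198.0, 251.0])
final_mb = predict_final_size(alpha, beta)   # close to 1000 MB here
```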
Note that when 100% of the map tasks are completed, S_i^j will represent the actual partition size for reduce task i. Hence, we use linear regression to determine the following equation for each reduce task i ∈ [1, N]:

S_i^j = α_1^i + β_1^i F^j,  j = 1, 2, ..., k    (1)

where α_1^i and β_1^i are the regression coefficients. We introduce an outer factor, δ, which works as a threshold to control when our prediction model stops the training process and finalizes the prediction. In practice, δ can be the map completion percentage (e.g. 5%) at which scheduling of the reduce tasks may be started. Every time a new map task has finished, a new training sample is generated. When the fraction of completed map tasks reaches δ, we calculate the regression coefficients (α_1^i, β_1^i) and predict the partition size of each reduce task. Note that k is determined by δ. For instance, consider a job with 100 map tasks: if δ = 5%, then k = 5. The computational complexity of our on-line partition size prediction model is O(k·N). In particular, for each reduce task i ∈ [1, N], the scaling factors can be determined by the following equation:

(α_1^i, β_1^i)^T = (X^T X)^{-1} X^T Y,    (2)

where

X = [ 1 F^1
      1 F^2
      ...
      1 F^k ],   Y = (S_i^1, S_i^2, ..., S_i^k)^T

It takes O(2^2 k) to multiply X^T by X, O(2^3) to compute the inverse of X^T X, O(2^2 k) to multiply (X^T X)^{-1} by X^T,

Fig. 4: Relationship between task duration and partition size ((a) Sort 5GB; (b) InvertedIndex 5GB; (c) Sort 10GB; (d) InvertedIndex 10GB; (e) Sort 5GB and 10GB; (f) InvertedIndex 5GB and 10GB)

and finally O(2k) to multiply (X^T X)^{-1} X^T by Y. Therefore, the total computational complexity of the prediction model for a MapReduce job with N reduce tasks is O(k·N).

4.2 Reduce Task Performance Model

In this section, we design a reduce task performance model to estimate the execution time of reduce tasks. Currently, there are many techniques for predicting MapReduce job durations [12], [22], [23], [24]. These approaches, however, cannot estimate the durations of individual tasks. In our performance model we consider the execution time of a reduce task to be correlated with two parameters: the size of the partition to process and the resource allocation (e.g. CPU, disk I/O and bandwidth). As Hadoop YARN only allows users to specify the CPU and memory sizes of a container, in our implementation we focus on capturing the impact of CPU and memory allocations on task performance. In order to identify the relationship between task running time, partition size and resource allocation, we run a set of experiments in our testbed cluster by varying the resource allocation and input datasets. In the first set of experiments, we fix the CPU and memory allocations of each reduce task and focus on identifying the relationship between partition size and task running time. Figures 4a and 4b show the results of running the 5GB Sort and InvertedIndex jobs, respectively. It is evident that there is a linear relationship between partition size and task running time. Hence, we use linear regression to determine this relationship with Equation 3:

T_i = α_2 + β_2 P_i,  i ∈ [1, N]    (3)

where T_i and P_i are the running time and partition size of reduce task i, respectively. The regression results are also shown in Figures 4a and 4b as solid lines. Note that if the time complexities of the reduce functions in other MapReduce jobs grow nonlinearly with the sizes of the processed data, the relationship can also be easily learned by updating the regression model. Furthermore, we change the input size of the jobs from 5GB to 10GB and check whether the characteristic of this relationship is workload independent. Again, the running time is linearly correlated with partition size, as shown in Figures 4c and 4d. However, we also find that the size of the total intermediate data, denoted as D (the sum of all partitions), has an impact on task duration. A similar observation is made in [22], where Zhang et al. show that the duration of the shuffle phase can be approximated with a piece-wise linear function when the intermediate data per reduce task is larger than 3.2 GB in their Hadoop cluster. This is consistent with the phenomenon we observed. Therefore, we update the regression function to Equation 4 and train the model with the samples from both the 5GB and 10GB datasets together:

T_i = α_2 + β_2 P_i + ζ_2 D,  i ∈ [1, N]    (4)

The regression results are shown in Figures 4e and 4f. It can be seen that this updated function serves as a good fit for the relationship between partition size and task running time, although two different datasets are involved. In the next set of experiments, we fix the input size and vary either the CPU or memory allocation of each reduce task.
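Equation 4 is linear in its three coefficients, so a job profile can be fitted from historical (P_i, D, T_i) tuples by ordinary least squares. A self-contained sketch with hypothetical training data (the 3x3 normal equations are solved by Gauss-Jordan elimination to avoid any library dependency):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * c for a, c in zip(m[r], m[col])]
    return [m[r][3] / m[r][r] for r in range(3)]

def fit_duration_model(P, D, T):
    """Fit T = a + b*P + c*D (Equation 4) via the normal equations."""
    feats = [[1.0, p, d] for p, d in zip(P, D)]
    A = [[sum(f[r] * f[c] for f in feats) for c in range(3)] for r in range(3)]
    rhs = [sum(f[r] * t for f, t in zip(feats, T)) for r in range(3)]
    return solve3(A, rhs)

# Hypothetical tuples: (partition size MB, total intermediate MB, seconds).
P = [100, 200, 300, 120, 240, 360]
D = [1000, 1000, 1000, 2000, 2000, 2000]
T = [20.0, 30.0, 40.0, 24.0, 36.0, 48.0]
a, b, c = fit_duration_model(P, D, T)
# The fitted model then predicts any reduce task's duration from its
# partition size and the job's total intermediate data, e.g. a + b*180 + c*1000.
```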
Figure 5 shows the typical results for the 30GB Sort and InvertedIndex jobs when varying the CPU allocation from 1 to 8 vcores (the memory allocation is fixed to 1 GB). We use a nonlinear regression method to model this relationship with Equation 5, and find that task running time is inversely proportional to CPU allocation. While this relationship fits well when the number of vcores is small, we also found that this model is no longer accurate when a large amount of CPU resource is allocated to a task. In these cases, the resource bottleneck may switch from CPU to other resource dimensions like disk I/O, thus the benefit of increasing CPU allocation diminishes. A similar observation is made in [24], where Jalaparti et al. show that increasing network bandwidth beyond a threshold does not help since the job completion time is dominated by disk performance. This is consistent with the phenomenon we observed. Thus, we can expect that the duration of reduce tasks might be approximated with a different inversely proportional function when the CPU allocation exceeds a threshold µ. This threshold could be related to job characteristics and cluster configuration. However, for a different job and Hadoop cluster, µ can be easily determined by comparing the change in task duration

while increasing CPU allocation.¹

T_i = α_3 + β_3 / A_i^cpu,  i ∈ [1, N]    (5)

Fig. 5: Relationship between task duration and CPU allocation ((a) a reduce task in Sort; (b) a reduce task in InvertedIndex)

1. We use the following policy in this paper: we increase the CPU allocation from 1 to 8 vcores, and calculate the speedup of task running time between the current and previous CPU allocations, denoted as Speedup_j (j ∈ [1, 7]). The first CPU allocation where Speedup_j < 0.5 Speedup_{j-1} is considered as the threshold µ.

We then repeat the same set of experiments for memory. Different from the CPU allocation in YARN, which is determined by the number of virtual cores used by the task container, there are two configurations that control the memory allocation in YARN: the physical RAM limit and the JVM heap size limit for a task. The former setting is a logical allocation used by the NodeManager to monitor the task memory usage. If the usage exceeds this limit, the NodeManager will kill the task. The latter setting is the maximum heap size of the JVM process that executes the task. It determines the maximum memory that can be used by this JVM. Hence, the JVM heap size limit should be less than the physical RAM limit. More importantly, the JVM heap size indicates the amount of memory allocation that a task can actually use. Consequently, we vary the JVM heap size limit from 200 MB (the default value) to 5600 MB while keeping the CPU allocation at 1 vcore, and use a non-linear regression method to learn this relationship with Equation 6. We find that an inversely proportional function is also applicable in this case:

T_i = α_4 + β_4 / A_i^mem,  i ∈ [1, N]    (6)

Figure 6 shows the task running time as a function of memory allocation while running the 30GB Sort and InvertedIndex jobs. From this figure we can see an obvious improvement when the memory allocation increases at the beginning. That is because a memory deficit will postpone the completion time of the task. For example, during the shuffle sub-phase in the reduce stage, a memory deficit will cause an additional process of spilling data to disk, because of inadequate space to store all the data of a task in RAM, thereby prolonging the task and adding a burden to disk I/O as well. However, with the allocation continually rising, the improvement becomes smaller. The reason is that, as memory allocation increases beyond a threshold, the resource bottleneck of a task shifts to other resources. After that point, the completion time of a task will not be reduced despite the increase in memory allocation. This observation is consistent with the CPU resource.

Fig. 6: Relationship between task duration and memory allocation ((a) a reduce task in Sort; (b) a reduce task in InvertedIndex)

Based on the above observations, we now derive our reduce task performance model. For each reduce task i among N reduce tasks, let T_i denote the execution time of reduce task i, P_i denote the size of the partition for reduce task i, A_i^cpu denote the CPU allocation for reduce task i, and A_i^mem denote the memory allocation for reduce task i. The performance model can be stated as follows:

T_i = (α_5 + β_5 P_i + ζ_5 D)(ξ_5 + γ_5/A_i^cpu + η_5/A_i^mem)
    = α_5ξ_5 + α_5γ_5/A_i^cpu + α_5η_5/A_i^mem + β_5ξ_5 P_i + β_5γ_5 P_i/A_i^cpu + β_5η_5 P_i/A_i^mem + ζ_5ξ_5 D + ζ_5γ_5 D/A_i^cpu + ζ_5η_5 D/A_i^mem
    = λ_1 + λ_2/A_i^cpu + λ_3/A_i^mem + λ_4 P_i + λ_5 P_i/A_i^cpu + λ_6 P_i/A_i^mem + λ_7 D + λ_8 D/A_i^cpu + λ_9 D/A_i^mem    (7)

where λ_1, λ_2, λ_3, λ_4, λ_5, λ_6, λ_7, λ_8 and λ_9 are the coefficients to be solved using nonlinear regression. In practice, we may leverage historical records of job executions to provide input to the regression algorithm. This is reasonable in production environments as many jobs are executed routinely in today's production data centers. Specifically, the historical profiles are generated by varying the CPU allocation A_i^cpu ∈ {1 vcore, 2 vcores, ..., 8 vcores}, the memory allocation A_i^mem ∈ {1 GB, 2 GB, ..., 4 GB}, and the input dataset D_set = {5 GB, 30 GB} for different jobs. We then capture a tuple (T_i, P_i, A_i^cpu, A_i^mem, D) for each reduce task of the job. Using the tuples of all reduce tasks as training data, we can easily learn the coefficient factors in the performance model for each job. In the end, we produce one performance model M_j (i.e. job profile) for each job j that can be used as an input for scheduling. Note that, if no job profile is available, DREAMS resorts to the default container allocation scheme (i.e. uniform container size for all the reduce tasks). Finally, we would like to mention that while our performance model focuses on CPU and memory allocations, we believe our model can be extended to handle the case where other resources become the performance bottleneck by having additional terms in our performance model.

4.3 Resource Allocation Algorithm

Once the performance model has been trained and the partition sizes have been predicted, the scheduler is ready to find the optimal resource allocation for each reduce task so as to mitigate the run-time variation caused by partitioning skew. Here, our strategy is to equalize the running time of all reduce tasks. As mentioned in Section 4.2, task duration is a monotonically increasing function of partition size. Thus, we consider the duration of the task with the average partition size (P_avg) as a baseline, denoted as T_base, which can be obtained according to Equation 7 with P_avg and the default CPU and memory allocations configured in YARN.² Then we increase the resources allocated to the reduce tasks with larger partition sizes to make them run no slower than T_base. We observed that there is also no need to allocate too many resources to large reduce tasks to make them run faster than T_base. Thus we wish to find the minimum CPU and memory allocations enabling the slower reduce tasks to meet the baseline T_base. They can be calculated using a variation of Equation 7 introduced in Section 4.2, where P_i, D and T_base are known:

T_base = λ_1 + λ_2/A_i^cpu + λ_3/A_i^mem + λ_4 P_i + λ_5 P_i/A_i^cpu + λ_6 P_i/A_i^mem + λ_7 D + λ_8 D/A_i^cpu + λ_9 D/A_i^mem    (8)

We can present Equation 8 in the following form:

C_1 + C_2/A_i^cpu + C_3/A_i^mem = 0    (9)

where C_1 = λ_1 + λ_4 P_i + λ_7 D - T_base, C_2 = λ_2 + λ_5 P_i + λ_8 D and C_3 = λ_3 + λ_6 P_i + λ_9 D. Evidently, C_1, C_2 and C_3 are constants derived from known values. Since there are two variables (A_i^cpu, A_i^mem) to be solved using only one equation, more than one root can be obtained. In other words, there can be many possible CPU and memory combinations that will yield the same completion time, T_base.
Hence, we formulate this resource allocation problem as a constrained optimization problem:

min_{x_i, y_i}  f(x_i, y_i) = x_i + ω y_i
s.t.  C_1 + C_2/x_i + C_3/y_i = 0
      Cap_cpu > x_i ≥ 1,  Cap_mem > y_i ≥ 1,  i ∈ [1, N]   (10)

where x_i = A_i^cpu, y_i = A_i^mem, and Cap_cpu and Cap_mem are the capacities of the workers in terms of CPU and memory, respectively. We define the objective function as the weighted sum of CPU and memory resources, x_i + ω y_i, where the factor ω represents the weight of memory relative to CPU. We can assign a higher weight to a bottleneck resource that has lower availability. For instance, if CPU is scarce in the cluster but memory is not, CPU becomes more expensive compared to memory; in this case, increasing the weight of CPU can improve the schedulability of tasks, thereby improving resource utilization. ω depends on the capacity and the run-time resource availability of the cluster. How to tune ω is out of the scope of this work; we use ω = 1 in this paper. Since the objective function is linear but the constraint is not, we use Lagrange multipliers to solve this problem.

2. Here, since P_i can be predicted by the partition size prediction model, P_avg can easily be obtained. The default CPU and memory allocations to a container in YARN are 1 vCore and 1 GB, respectively.

Algorithm 1 Resource allocation algorithm
Input: δ - threshold for stopping training of the partition size prediction model; M_j - reduce phase performance model of job j; μ_cpu, μ_mem - maximum allowable allocations of CPU and memory.
Output: C - set of resource allocations (A_i^cpu, A_i^mem) for the reduce tasks.
1: (S, F) ← handlePartitionReport()
2: if CompletedMap percentage ≥ δ then
3:     Set<P_i> ← predictPartition()
4:     D ← Σ_{i=1}^{N} P_i
5:     P_avg ← Avg(Set<P_i>)
6:     T_base ← predictDuration(P_avg, D, A^cpu_default, A^mem_default, M_j)
7:     for each reduce task i ∈ [1, N] do
8:         (A_i^cpu, A_i^mem) ← findOptimalAlloc(P_i, D, T_base, M_j)
9:         A_i^cpu ← min(A_i^cpu, μ_cpu)
10:        A_i^mem ← min(A_i^mem, μ_mem)
11:        C ← C ∪ {(A_i^cpu, A_i^mem)}
12:    end for
13: end if
14: return C
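A minimal sketch of the findOptimalAlloc step in Line 8 of Algorithm 1, under the assumption that C_1 < 0 and C_2, C_3 > 0 (which holds for a task slower than the baseline): it evaluates the positive root that satisfies the constraint of Equation 9 while minimizing x_i + ω y_i. This is our own illustration, not DREAMS's implementation; a real scheduler would additionally floor the result and clamp it to μ_cpu and μ_mem as in Lines 9-10.

```python
import math

# Sketch (our own, not DREAMS's code): closed-form allocation meeting the
# baseline T_base with minimum x + w*y, for constants C1 < 0 and C2, C3 > 0
# of Equation 9 (C1 + C2/x + C3/y = 0).

def find_optimal_alloc(C1, C2, C3, w=1.0):
    s = math.sqrt(w * C2 * C3)
    x = -(C2 + s) / C1            # CPU allocation (vCores)
    y = -(w * C3 + s) / (w * C1)  # memory allocation (GB)
    return x, y
```

For example, C_1 = -5, C_2 = 4, C_3 = 9 and ω = 1 give (x, y) = (2, 3), and indeed C_1 + C_2/2 + C_3/3 = 0.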
Accordingly, we obtain the Lagrangian L(x_i, y_i, ϕ) as follows:

L(x_i, y_i, ϕ) = x_i + ω y_i + ϕ (C_1 + C_2/x_i + C_3/y_i)   (11)

Then we differentiate L(x_i, y_i, ϕ) partially with respect to x_i, y_i and ϕ, and obtain:

∂L/∂x_i = 1 − ϕ C_2/x_i^2 = 0
∂L/∂y_i = ω − ϕ C_3/y_i^2 = 0   (12)
∂L/∂ϕ = C_1 + C_2/x_i + C_3/y_i = 0

Solving these equations simultaneously, we get:

x_i = −(C_2 ± √(ω C_2 C_3))/C_1,   y_i = −(ω C_3 ± √(ω C_2 C_3))/(ω C_1)   (13)

The details of our resource allocation mechanism are shown in Algorithm 1. NodeManagers periodically send partition size reports to the ApplicationMaster along with heartbeat messages. As shown in Line 1, the ApplicationMaster handles each partition size report and collects the partition size statistics (S, F). Once the percentage of completed map tasks reaches the threshold δ, we start to predict the partition sizes and adjust the allocation for each reduce task, as shown in Lines 2-13. For the partition size prediction, we predict the partition size of each reduce task using the model presented in Section 4.1. With respect to the resource allocation, we compute the optimal combination of CPU and memory (A_i^cpu, A_i^mem) using Lagrange multipliers. More specifically, we first calculate the execution time T_base, which represents the time it takes to complete the task with the average partition size P_avg under the default resource allocation (A^cpu_default, A^mem_default)^3, according to Equation 8. After that, we set T_base as the target for each reduce task, and calculate the resource tuple (A_i^cpu, A_i^mem) by solving Equation 13 and taking the floor of the positive root. Because nodes have finite resource capacities in terms of CPU and memory (e.g., the default settings for the maximum CPU and memory allocation to a container in YARN are 8 vCores and 8 GB, respectively), both A_i^cpu and A_i^mem should be less than the physical capacities Cap_cpu and Cap_mem, respectively. Besides, from our experience, once the resource allocation to a task reaches a threshold, increasing the allocation does not improve the execution time further; rather, it results in resource wastage, as shown in Section 4.2. We therefore require A_i^cpu and A_i^mem to be less than the thresholds μ_cpu and μ_mem, respectively, which are inputs to our algorithm.

TABLE 1: Benchmark characteristics

Application | Domain | Dataset Type | Input Size Small (GB) | Skewness (%) | #Map, #Reduce | Input Size Large (GB) | Skewness (%) | #Map, #Reduce
WordCount | text retrieval | Wikipedia | - | - | -, 64 | - | - | -, 64
BigramCount | text retrieval | Wikipedia | - | - | -, 64 | - | - | -, 64
Pairs | text retrieval | Wikipedia | - | - | -, 64 | - | - | -, 64
RelativeFreq | text retrieval | Wikipedia | - | - | -, 64 | - | - | -, 64
InvertedIndex | web search | Wikipedia | - | - | -, 64 | - | - | -, 64
AdjList | web search | GraphGenerator | - | - | -, 64 | - | - | -, 64
KMeans | machine learning | Netflix | - | - | -, 6 | - | - | -, 6
Classification | machine learning | Netflix | - | - | -, 6 | - | - | -, 6
DataJoin | database | RandomTextWriter | - | - | -, 64 | - | - | -, 64
SelfJoin | database | Synthetic | - | - | -, 64 | - | - | -, 64
Sort | others | RandomWriter | - | - | -, 64 | - | - | -, 64
Histo-movies | others | Netflix | - | - | -, 8 | - | - | -, 8

5 EVALUATION

We have implemented DREAMS on Hadoop YARN 2.4.0 as an additional feature.
We deployed DREAMS on a real Hadoop cluster with 21 virtual machines (VMs) in the SAVI Testbed [25]. The SAVI Testbed is a virtual infrastructure managed by OpenStack [26] using the Xen [27] virtualization technique. Each VM has four 2 GHz cores, 8 GB RAM and 8 GB of hard disk. We use one VM as the ResourceManager and NameNode, and the remaining 20 VMs as workers. Each worker is configured with 8 virtual cores and 7 GB RAM (leaving 1 GB for background processes). The HDFS block size is set to 64 MB, and the replication level is set to 3. The CgroupsLCEResourcesHandler configuration is enabled, and we also activate map output compression^4. We use the CapacityScheduler to schedule containers in YARN. In the guest OS, we configure CGroups (Control Groups) and CFQ (Completely Fair Queueing) for scheduling CPU and disk I/O among processes, respectively. We evaluate our approach using a wide range of applications covering text retrieval, web search, machine learning and database domains, among others.

3. The default CPU and memory allocations to a container are 1 vCore and 1 GB, respectively.
4. Using compression in Hadoop to optimize MapReduce performance is prevalent in industry and academia [28], [29].

These applications are listed below:

1) Text Retrieval

WordCount (WC): WordCount computes the occurrence frequency of each word in a corpus. We use Wikipedia data as the input dataset.

BigramCount (BC): Bigrams are sequences of two consecutive words. BigramCount computes the occurrence frequency of bigrams in a corpus. We use the implementation in Cloud9 [30] and Wikipedia data as the input dataset.

Pairs (PS): Pairs is a design pattern introduced in [31]. Using this design pattern, PS computes the word co-occurrence matrix of a corpus. We use the implementation in Cloud9 and Wikipedia data as the input dataset.

RelativeFrequency (RF): Relative Frequencies is introduced in [31]. It measures the proportion of time word w_j appears in the context of word w_i, also denoted F(w_j | w_i). We use the implementation in Cloud9 and Wikipedia data as the input dataset.
2) Web Search

InvertedIndex (II): It takes a list of documents as input and generates a word-to-document index for these documents. We use Wikipedia data as the input dataset.

AdjacencyList (AL): It generates the adjacency list of a graph. The graph is represented by a set of edges, which is generated by a graph generator. We use the implementation and the input dataset provided by the PUMA benchmarks [14].

3) Machine Learning

KMeans (KM): This application classifies movies based on their ratings, using the Netflix movie rating data. We use the starting values of the cluster centroids provided by PUMA and run one iteration.

Classification (CF): It classifies the movies into one of k pre-determined clusters. Similar to KMeans, we use the starting values of the cluster centroids provided by PUMA, and use the Netflix movie rating data.

4) Database

DataJoin (DJ): It combines text files based on a designated key. The text dataset is generated by RandomTextWriter, and the first word of each line in the files serves as the join key. We have modified the original RandomTextWriter and used a Zipf 0.5 distribution to skew the input data.

SelfJoin (SJ): This application is introduced in PUMA. It generates the (k+1)-sized associations given a set of k-sized associations. We use the implementation of this application as well as the synthetic dataset in PUMA.

5) Others

Sort (SRT): This application sorts sequence files generated by Hadoop RandomWriter. Similar to [2], we have modified RandomWriter to produce non-uniformly distributed data.

Histogram-movies (HM): This application bins movies into 8 bins based on their average ratings. We use the implementation of this application in PUMA.

Table 1 gives an overview of these benchmarks with the configurations used in our experiments. The skewness of the workload among reduce tasks is measured by the coefficient of variation (CV), stdev/mean, which is used as a fairness metric in the literature [32]. The larger this ratio, the more skew is present in the distribution of the workload among reduce tasks. In order to better demonstrate the skew mitigation, we do not use a combiner function in our benchmarks. We present the results of running these jobs in the following sections.

Fig. 7: Prediction accuracy with different threshold δ: (a) Small dataset, (b) Large dataset

Fig. 8: Job completion time with different threshold δ (Sort, KMeans and Histogram-movies)

5.1 Accuracy of Prediction of Partition Size

In this set of experiments, we validate the accuracy of the partition size prediction model.
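The coefficient of variation used above as the skewness metric can be sketched as follows; whether the population or the sample standard deviation is intended is not stated in the text, so the population form is an assumption here.

```python
import statistics

# Sketch: the skewness metric used in Table 1, CV = stdev / mean of the
# per-reducer workload sizes. Population stdev is assumed here (the paper
# does not specify sample vs. population).

def coefficient_of_variation(partition_sizes):
    mean = statistics.fmean(partition_sizes)
    return statistics.pstdev(partition_sizes) / mean
```

A perfectly balanced workload yields CV = 0; the more uneven the partition sizes, the larger the ratio.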
To this end, we execute MapReduce jobs on different datasets, and compute the mean absolute percentage error (MAPE) over all partitions in each scenario. The MAPE is defined as follows:

MAPE = (1/N) Σ_{i=1}^{N} |P_i^pred − P_i^measured| / P_i^measured   (14)

where N is the number of reduce tasks in a job, and P_i^pred and P_i^measured are the predicted and measured partition sizes of reduce task i, respectively. Table 2 summarizes the MAPE for the benchmarks with threshold δ = 0.05 on two different datasets.

TABLE 2: Mean absolute percentage error of the partition size prediction model on the Small and Large datasets

Application | MAPE on Small dataset | MAPE on Large dataset
WordCount | 5.34% | 3.94%
BigramCount | 8.67% | 7.25%
Pairs | 6.16% | 4.31%
RelativeFrequency | 6.73% | 5.75%
InvertedIndex | 3.69% | 3.4%
AdjList | 11.36% | 1.1%
KMeans | 8.56% | 4.13%
Classification | 5.29% | 3.17%
DataJoin | 5.6% | 2.8%
SelfJoin | 1.23% | 0.63%
Sort | 6.32% | 5.34%
Histogram-movies | 0.47% | 0.35%

It can be seen that the error rates for most of the MapReduce applications are less than 5%. AdjList reaches the highest error rate, at 11.36%. Furthermore, Figure 7 illustrates the impact of different values of δ on prediction accuracy. It is clear that as δ increases, the prediction accuracy improves, because the number of training samples grows with δ. When δ = 0.15, the prediction error is below 6% for all testing applications. Generally speaking, increasing the sample size can improve accuracy at the cost of increased overhead: in DREAMS, the larger the sample size used, the longer DREAMS has to wait for the completion of map tasks before predicting the partition sizes^5. However, we observed that as δ increases, the overhead in terms of job completion time does not necessarily become larger. Figure 8 shows the job completion times for different values of δ. As shown in Figure 8, for reduce-intensive jobs such as Sort and KMeans, there is a sweet spot where the job completion time is lowest; for map-intensive jobs such as Histogram-movies, little difference can be observed.
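Equation 14 translates directly into code; the function name below is our own.

```python
# Sketch: the MAPE of Equation 14 over predicted vs. measured partition
# sizes, expressed as a percentage.

def mape_percent(predicted, measured):
    return 100.0 / len(measured) * sum(
        abs(p - m) / m for p, m in zip(predicted, measured))
```

For instance, predictions of 110 and 90 against measured sizes of 100 and 100 give a MAPE of 10%.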
The reason is that overlapping the map and reduce phases lets reduce tasks start shuffling data earlier, but it also wastes resources while the map tasks' output rate is smaller than the bandwidth. Tang et al. [33] proposed a solution to find the best timing for starting the reduce phase; this is out of the scope of this paper. We use δ = 0.05 in the following experiments.

5. The computational overhead is negligible, because the maximum number of samples is in the hundreds in our experiments.

Fig. 9: Fitting results for the reduce phase performance model: (a) WordCount, (b) Pairs, (c) InvertedIndex, (d) Sort

5.2 Accuracy of Reduce Task Performance Model

In order to formally evaluate the accuracy and workload independence of the performance model, we compute the prediction error using different datasets. That is, we train and test our model on samples from both the Small and Large datasets. Figure 9 shows typical results in terms of the goodness of fit of the performance model; similar results can be observed for the other applications. To make the demonstration clearer, we sort the experiment results by value in ascending order. The "+" marks represent the measured task durations, and the solid line represents the values fitted by the performance model. We also perform two validations [34] to study the prediction accuracy of the model:

Resubstitution Method - All the available data is used for training as well as testing. That is, we compute the predicted reduce task duration for each tuple (P_i, Alloc_cpu, Alloc_mem, D) using the performance model learned from the training dataset, and then compute a prediction error;

K-fold Cross-validation - The available data is divided into K disjoint subsets, 1 ≤ K ≤ m, where m is the total number of available samples. The prediction accuracy is evaluated by the average of the separate errors, (1/K) Σ_{i=1}^{K} Error_i.
For each of the K sub-validations, (K − 1) subsets are used for training and the remaining one for testing. Here, we choose K = 10.

TABLE 3: Mean absolute percentage error of the reduce phase performance model

Application | Resubstitution Method | K-fold Cross-validation
WordCount | 13.13% | 13.45%
BigramCount | 1.26% | 1.98%
Pairs | - | -
RelativeFrequency | 13.3% | 14.91%
InvertedIndex | 12.97% | 13.7%
AdjList | 15.45% | 18.2%
KMeans | 12.52% | 15.13%
Classification | 4.61% | 7.58%
DataJoin | 7.84% | 14.9%
SelfJoin | 9.8% | -
Sort | 1.95% | 11.46%
Histogram-movies | 11.14% | 14.46%

For both validations, we use the MAPE to evaluate the accuracy:

MAPE = (1/m) Σ_{l=1}^{m} |T_l^pred − T_l^measured| / T_l^measured   (15)

where m is the number of testing samples. Table 3 summarizes the MAPE of the reduce task performance model for our testing workloads. With regard to the Resubstitution Method, the prediction error for all of the workloads is less than 15.45%. For the K-fold Cross-validation, the prediction error is slightly higher than in the Resubstitution validation, but is still less than 18.2%. For some applications, such as AdjList, the prediction error is relatively high; overall, however, the prediction error is less than 15% for most of the applications. Lastly, tuning the parameters of the performance model by continuously training on newly arriving data may further improve the accuracy; we consider this as future work.

5.3 Job Completion Time

In this section, we validate how well DREAMS can mitigate skew. We compare DREAMS against 1) Hadoop YARN 2.4.0; 2) the speculation-based straggler mitigation approach (LATE), which launches speculative tasks for slower tasks; 3) the repartition-based skew mitigation approach (SkewTune), which repartitions the unprocessed workload of slower tasks at run-time; and 4) Hadoop 0.21.0 with slot isolation (MRv1 ISO). To the best of our knowledge, in addition to SkewTune, many other state-of-the-art solutions, such as LEEN [6] and TopCluster [9], are implemented on top of MRv1, which is slot-based and has no isolation between slots.
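The K-fold procedure of Section 5.2 can be sketched as follows. The one-parameter model fitted here is a deliberately trivial stand-in for the reduce-phase performance model, and all names are our own.

```python
# Sketch of the K-fold cross-validation described above. The "model" is a
# trivial stand-in: a single scale factor fitted by least squares on
# (input, duration) pairs; DREAMS would fit the full reduce-phase
# performance model on each training split instead.

def fit_scale(train):
    # least-squares a minimizing sum((a*x - t)^2) over (x, t) pairs
    return sum(x * t for x, t in train) / sum(x * x for x, _ in train)

def kfold_mape_percent(samples, K):
    folds = [samples[i::K] for i in range(K)]
    per_fold = []
    for i in range(K):
        held_out = folds[i]
        train = [s for j in range(K) if j != i for s in folds[j]]
        a = fit_scale(train)
        per_fold.append(100.0 / len(held_out) *
                        sum(abs(a * x - t) / t for x, t in held_out))
    return sum(per_fold) / K  # average of the K per-fold errors
```

On data the model can fit exactly, the cross-validated error is zero; in practice each held-out fold reports the error of a model trained without it.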
In order to fairly compare DREAMS against SkewTune, we implemented isolation between slots in Hadoop 0.21.0 and installed SkewTune on top of MRv1 ISO. We configure each worker with 6 map slots and 2 reduce slots when running SkewTune and MRv1 ISO. Note that tuning the number of reduce tasks of a MapReduce job can improve the job completion time [35]. To isolate this effect, we use the same number of reduce tasks in the corresponding experiments when comparing job completion times.

Figure 10 shows the comparison among YARN, LATE, SkewTune, MRv1 ISO and DREAMS with regard to job completion time. We can see from the figure that DREAMS outperforms the other skew mitigation strategies. In particular, DREAMS achieves speedups of 2.29, 1.93, 1.42, 1.34, 1.31, 1.29 and 1.26 over YARN for Pairs, RelativeFreq, Sort, DataJoin, WordCount, InvertedIndex and KMeans, respectively. Compared to the other mitigation strategies, DREAMS achieves improvements of up to 1.85 and 1.65 over LATE and SkewTune, respectively. We also observed that DREAMS cannot improve the job completion time for SelfJoin and AdjList. This is because the skewness in these jobs is low, leaving little room for DREAMS and the other mitigation strategies to improve. Since DREAMS only adjusts the resource allocation of reduce tasks, for jobs such as Classification and Histogram-movies, in which the reduce phase lasts only a few seconds, no improvement in job completion time can be observed.

Fig. 10: Comparison of job completion time: (a) Small dataset, (b) Large dataset

Fig. 11: Execution timeline for 5G Pairs: (a) YARN, (b) LATE, (c) SkewTune, (d) DREAMS

Fig. 12: Comparison of the makespan variance of reduce tasks: (a) Small dataset, (b) Large dataset
In order to understand the reason behind the improvement of DREAMS, Figure 11 shows the execution timelines of running 5G Pairs with YARN, LATE, SkewTune and DREAMS, respectively. As shown in Figure 11a, several large reduce tasks take much longer than the other reducers and dominate the completion time of the job. In comparison, LATE executes replica tasks for these large reduce tasks using free resources, which can accelerate them. However, since a replica task processes the same amount of work as the original task, the improvement is not significant. SkewTune splits the unprocessed work of stragglers at runtime, and launches new jobs (called migration jobs) to process it. As we can see in Figure 11c, three additional jobs are launched to process the long-lasting reduce tasks. Hence, overloaded tasks are processed using more cluster resources, which reduces their execution times. Note that the execution times of the stragglers are dominated by the completion times of the corresponding migration jobs. However, the overhead of repartitioning running tasks is not small. As reported in [4], approximately 30 s of overhead is incurred for reduce skew mitigation, and SkewTune does not perform skew mitigation for tasks with remaining time less than 2w, where w is on the order of 30 s. As a result, for small jobs that complete within tens of seconds, or jobs with small skewness, SkewTune cannot improve the job completion time. In contrast, DREAMS predicts the partition size of each reduce task at runtime and proactively allocates more resources to overloaded reducers. This reduces the durations of overloaded tasks, thereby accelerating job completion with negligible overhead. As shown in Figure 11d, the running times of the large reducers are significantly improved.

We also compare the makespan variance of reduce tasks in DREAMS against the other solutions. As we stated earlier, DREAMS is designed to reduce the run-time difference among reduce tasks with different loads, thereby shortening the job completion time. Figure 12 shows the comparison results with respect to the coefficient of variation (CV) of reduce task durations for our benchmarks. The graphs show that DREAMS effectively reduces the makespan variance of reduce tasks. More specifically, the highest reduction ratios reach 2.47, 1.84 and 2.23 over YARN, LATE and SkewTune, respectively.
Since the shuffle phase of the reduce stage overlaps with the entire map stage, there is no need to count the time the shuffle phase spends waiting for the output of map tasks towards the makespan. Here, we compare the durations of reduce tasks starting from the completion of the last map task. ARIA [12] likewise takes only the non-overlapping portion of the shuffle into account, and Chowdhury et al. [36] also define the beginning of the shuffle phase as the point when either the last map task finishes or the last reduce task starts.

6 DISCUSSION

The concept of dynamic container size adjustment used in DREAMS is not restricted to MapReduce; it can be applied to other large-scale programming models such as Spark [37] and Storm [38] as well. Take Spark as an example: a Spark job consists of a number of tasks organized as a DAG (Directed Acyclic Graph). These tasks are scheduled onto a number of Spark executors and executed in a distributed manner. Each Spark executor runs in a container on top of a resource management platform (e.g., YARN or Mesos [39]). If some executors have more workload to process, dynamically adjusting the container size based on the resource requirements and workload characteristics may be beneficial there as well.

Nevertheless, there are limitations in DREAMS's current design. First, DREAMS can only adjust CPU and memory for a container in our current implementation; it relies on TCP fairness and Completely Fair Queueing to fairly share the network bandwidth and disk I/O, respectively. Hence, DREAMS may not be able to give a precise estimation of the task execution time in a highly dynamic environment. Nonetheless, our performance model works well in DREAMS: it roughly estimates the execution time of a reduce task based on historical data, and in turn helps DREAMS determine how much resource should be allocated to the task. By allocating more resources to the reduce tasks with more workload, the execution of these tasks can be accelerated, and therefore the job completion time can be improved.
We would like to extend DREAMS to take network bandwidth and disk I/O into account in future work. One interesting idea is to integrate the management of containers' network and disk I/O resources into YARN using CGroups; note that CGroups currently supports isolating network and disk I/O between processes. This deserves further research. Second, there may be applications for which DREAMS is not applicable, for example applications that contain computational skew [7] in their reduce functions. Computational skew refers to the case where the task running time depends on the content of the input rather than its size. For this kind of application, DREAMS resorts to YARN in the current design. One straightforward extension is to monitor the resource usage and progress of tasks at run-time, and then adjust their allocations dynamically. In this way, skewed tasks could be accelerated in a more generic manner.

7 RELATED WORK

The partitioning skew problem in MapReduce has been extensively investigated recently. The authors of [5] and [6] define a cost model for assigning reduce keys to reduce tasks so as to balance the load among reduce tasks. However, both approaches have to wait for the completion of all the map tasks. Ramakrishnan et al. [7] and Yan et al. [19] propose to sample the partition sizes before executing the actual job to estimate the intermediate data distribution, and then partition the data to balance the load across all reducers. However, the additional sampling phase can be time-consuming. Similarly, Kolb et al. [40] propose two approaches, BlockSplit and PairRange, to handle data skew for entity resolution based on MapReduce. However, both approaches have to run an additional MapReduce job to generate the block distribution matrix (BDM). Gufler et al. [9] and Chen et al. [10] propose to aggregate selected statistics of the key-value pairs (e.g., the top k keys). Their solutions reduce the overhead of estimating the reducers' workload, but still have to wait for the completion of all the map tasks.
SkewTune [4] repartitions heavily skewed partitions at runtime to mitigate skew. However, it imposes an overhead for repartitioning data and concatenating the final outputs. Compared to SkewTune, our solution dynamically allocates resources to reduce tasks and equalizes the reduce tasks' completion times, which is simpler and incurs no repartitioning overhead. There is also related work on culling stragglers in MapReduce. LATE [41] speculatively executes a replica of a task that is progressing slowly. However, executing a redundant copy of a data-skewed task may waste resources, since the duplicate task still processes the same amount of data. Mantri [42] culls stragglers based on their causes. With respect to data skew, Mantri schedules tasks in descending order of their input sizes to mitigate skew, which is complementary to DREAMS. Wrangler [43] predicts the status of worker nodes based on their runtime resource usage statistics, and then selectively delays the execution of tasks if a node is predicted to create a straggler. However, Wrangler neglects that straggling can also be caused by the task itself; partitioning skew is one such example.

Resource-aware scheduling has received considerable attention in recent years. To address the limitation of the slot-based resource allocation scheme in the first version of Hadoop, YARN [15] represents a major endeavor towards resource-aware scheduling in MapReduce. It offers the ability to specify the size of a container. However, YARN assumes the resource consumption of each map (or reduce) task in a job is identical, which is not true for data-skewed MapReduce jobs. Sharma et al. propose MROrchestrator [16], a MapReduce resource framework that can identify resource bottlenecks and resolve them through run-time resource allocation. However, MROrchestrator neglects the workload imbalance among tasks and cannot mitigate partitioning skew. Several other proposals fall into other categories of resource scheduling policies, such as [12], [18], [44], [45]. The main focus of those approaches is on adjusting the resource allocation, in terms of the number of map and reduce slots, for jobs in order to achieve fairness, maximize resource utilization or meet job deadlines. These, however, do not address the data skew problem.

8 CONCLUSION

In this paper, we presented DREAMS, a framework for run-time partitioning skew mitigation.
Unlike previous approaches that try to balance the reducers' workload by repartitioning the workload assigned to each reduce task, in DREAMS we cope with partitioning skew by adjusting the run-time resource allocation to reduce tasks. Specifically, we first developed an online partition size prediction model that can estimate the partition size of each reduce task at runtime. We then presented a reduce task performance model that correlates run-time resource allocation and the size of the reduce task with task duration. In our experiments using a 21-node cluster running both real and synthetic workloads, we showed that both our partition size prediction model and our task performance model achieve high accuracy in most cases (with the highest prediction errors at 11.36% and 18.2%, respectively). We also demonstrated that DREAMS can effectively mitigate the negative impact of partitioning skew while incurring negligible overhead, thereby improving the job running time by up to a factor of 2.29 and 1.65 in comparison to native Hadoop YARN and the state-of-the-art solution, respectively.

ACKNOWLEDGMENTS

This work is supported in part by the National Natural Science Foundation of China (No. ), and in part by the Smart Applications on Virtual Infrastructure (SAVI) project funded under the National Sciences and Engineering Research Council of Canada (NSERC) Strategic Networks grant number NETGP.

REFERENCES

[1] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[2] Apache Hadoop YARN, current/hadoop-yarn/hadoop-yarn-site/YARN.html.
[3] N. Zacheilas and V. Kalogeraki, "Real-time scheduling of skewed MapReduce jobs in heterogeneous environments," in Proceedings of the 11th International Conference on Autonomic Computing. USENIX, 2014.
[4] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "SkewTune: mitigating skew in MapReduce applications," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012.
[5] B. Gufler, N. Augsten, A. Reiser, and A. Kemper, "Handling data skew in MapReduce," in Proceedings of the 1st International Conference on Cloud Computing and Services Science, vol. 146, 2011.
[6] S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, and S. Wu, "Handling partitioning skew in MapReduce using LEEN," Peer-to-Peer Networking and Applications, vol. 6, no. 4, 2013.
[7] S. R. Ramakrishnan, G. Swart, and A. Urmanov, "Balancing reducer skew in MapReduce workloads using progressive sampling," in Proceedings of the Third ACM Symposium on Cloud Computing. ACM, 2012, p. 16.
[8] Y. Le, J. Liu, F. Ergun, and D. Wang, "Online load balancing for MapReduce with skewed data input," in INFOCOM, 2014 Proceedings IEEE. IEEE, 2014.
[9] B. Gufler, N. Augsten, A. Reiser, and A. Kemper, "Load balancing in MapReduce based on scalable cardinality estimates," in ICDE 2012, April 2012.
[10] Q. Chen, J. Yao, and Z. Xiao, "Libra: Lightweight data skew mitigation in MapReduce," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 9, 2015.
[11] L. Cheng, Q. Zhang, and R. Boutaba, "Mitigating the negative impact of preemption on heterogeneous MapReduce workloads," in Proceedings of the 7th International Conference on Network and Services Management. International Federation for Information Processing, 2011.
[12] A. Verma, L. Cherkasova, and R. H. Campbell, "ARIA: automatic resource inference and allocation for MapReduce environments," in Proceedings of the 8th ACM International Conference on Autonomic Computing. ACM, 2011.
[13] Z. Liu, Q. Zhang, M. F. Zhani, R. Boutaba, Y. Liu, and Z. Gong, "DREAMS: Dynamic resource allocation for MapReduce with data skew," in IM TechSessions, Ottawa, Canada, May 2015.
[14] F. Ahmad, S. Lee, M. Thottethodi, and T. Vijaykumar, "PUMA: Purdue MapReduce benchmarks suite," 2012.
[15] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth et al., "Apache Hadoop YARN: Yet another resource negotiator," in Proceedings of the 4th Annual Symposium on Cloud Computing. ACM, 2013, p. 5.
[16] B. Sharma, R. Prabhakar, S. Lim, M. T. Kandemir, and C. R. Das, "MROrchestrator: A fine-grained resource orchestration framework for MapReduce clusters," in Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 2012.
[17] Fair scheduler, hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
[18] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, "Dominant resource fairness: Fair allocation of multiple resource types," in NSDI, vol. 11, 2011.
[19] W. Yan, Y. Xue, and B. Malin, "Scalable and robust key group size estimation for reducer load balancing in MapReduce," in Big Data, 2013 IEEE International Conference on. IEEE, 2013.
[20] M. Hammoud, M. S. Rehman, and M. F. Sakr, "Center-of-gravity reduce task scheduling to lower MapReduce network traffic," in Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 2012.
[21] D. Borthakur, "The Hadoop distributed file system: Architecture and design," Hadoop Project Website, vol. 11, p. 21, 2007.
[22] Z. Zhang, L. Cherkasova, and B. T. Loo, "Benchmarking approach for designing a MapReduce performance model," in Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering. ACM, 2013.
[23] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu, "Starfish: A self-tuning system for big data analytics," in CIDR, vol. 11, 2011.

[24] V. Jalaparti, H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, "Bridging the tenant-provider gap in cloud services," in Proceedings of the Third ACM Symposium on Cloud Computing. ACM, 2012, p. 10.
[25] "Smart Applications on Virtual Infrastructure (SAVI)," savinetwork.ca.
[26] "OpenStack cloud operating system," openstack.org.
[27] "Xen project," xenproject.org.
[28] "Microsoft white papers: Compression in Hadoop," technet.microsoft.com/en-us/library/dn….aspx.
[29] Y. Chen, A. Ganapathi, and R. H. Katz, "To compress or not to compress: Compute vs. I/O tradeoffs for MapReduce energy efficiency," in Proceedings of the First ACM SIGCOMM Workshop on Green Networking. ACM, 2010.
[30] J. Lin, "Cloud 9: A MapReduce library for Hadoop," 2010.
[31] J. Lin and C. Dyer, "Data-intensive text processing with MapReduce," Synthesis Lectures on Human Language Technologies, vol. 3, no. 1, 2010.
[32] R. Jain, D.-M. Chiu, and W. R. Hawe, "A quantitative measure of fairness and discrimination for resource allocation in shared computer system."
[33] Z. Tang, L. Jiang, J. Zhou, K. Li, and K. Li, "A self-adaptive scheduling algorithm for reduce start time," Future Generation Computer Systems, vol. 43, pp. 51–60, 2015.
[34] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4–37, 2000.
[35] Z. Zhang, L. Cherkasova, and B. T. Loo, "AutoTune: Optimizing execution concurrency and resource usage in MapReduce workflows," in ICAC, 2013.
[36] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, "Managing data transfers in computer clusters with Orchestra," in ACM SIGCOMM Computer Communication Review, vol. 41, no. 4. ACM, 2011.
[37] "Apache Spark," spark.apache.org.
[38] "Apache Storm," storm.apache.org.
[39] "Apache Mesos," mesos.apache.org.
[40] L. Kolb, A. Thor, and E.
Rahm, "Load balancing for MapReduce-based entity resolution," in International Conference on Data Engineering, 2012.
[41] M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments," in OSDI, vol. 8, no. 4, 2008, p. 7.
[42] G. Ananthanarayanan, S. Kandula, A. G. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, "Reining in the outliers in MapReduce clusters using Mantri," in OSDI, vol. 10, no. 1, 2010, p. 24.
[43] N. J. Yadwadkar, G. Ananthanarayanan, and R. Katz, "Wrangler: Predictable and faster jobs using fewer resources," in Proceedings of the ACM Symposium on Cloud Computing. ACM, 2014.
[44] J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley, "Performance-driven task co-scheduling for MapReduce environments," in Network Operations and Management Symposium (NOMS), 2010 IEEE. IEEE, 2010.
[45] J. Wolf, D. Rajan, K. Hildrum, R. Khandekar, V. Kumar, S. Parekh, K.-L. Wu, and A. Balmin, "FLEX: A slot allocation scheduling optimizer for MapReduce workloads," in Middleware 2010. Springer, 2010.

Qi Zhang received his B.A.Sc., M.Sc. and Ph.D. from the University of Ottawa (Canada), Queen's University (Canada) and the University of Waterloo (Canada), respectively. His current research focuses on resource management for cloud computing systems. He is currently pursuing a postdoctoral fellowship at the University of Toronto (Canada). He is also interested in related areas including big-data analytics, software-defined networking, network virtualization and management.

Reaz Ahmed received his Ph.D. in Computer Science from the University of Waterloo, Canada in 2007. His B.Sc. and M.Sc. degrees in Computer Science are from the Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh in 2000 and 2002, respectively. He is currently an Assistant Research Professor at the School of Computer Science in the University of Waterloo. His research interests include Network Virtualization, Network Function Virtualization, Software Defined Networking, Internet of Things and Future Internet Architectures.
Raouf Boutaba received the M.Sc. and Ph.D. degrees in computer science from the University Pierre and Marie Curie, Paris, France, in 1990 and 1994, respectively. He is currently a Professor of computer science with the University of Waterloo, Waterloo, ON, Canada. His research interests include control and management of networks and distributed systems. He is a fellow of the IEEE and the Engineering Institute of Canada.

Yaping Liu received the Ph.D. degree in computer science from the National University of Defense Technology, China, in 2006. She is currently a Professor in the School of Computer with the National University of Defense Technology. Her current research interests include network architecture, inter-domain routing, network virtualization and network security.

Zhihong Liu received his B.A.Sc. and M.Sc. degrees in computer science from the South China University of Technology and the National University of Defense Technology, respectively. He is a Ph.D. candidate at the National University of Defense Technology with research interests in big-data analytics and resource management in cloud computing. Currently, he is a visiting student at the University of Waterloo, Canada.

Zhenghui Gong received the B.E. degree in electronic engineering from Tsinghua University, Beijing, China, in 1970. He is currently a Professor in the School of Computer with the National University of Defense Technology, Changsha, China. His research interests include computer networks and communication, network security and datacenter networking.


More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall SP 2005-02 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 14853-7801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently. Corporate Polces & Procedures Human Resources - Document CPP216 Leave Management Frst Produced: Current Verson: Past Revsons: Revew Cycle: Apples From: 09/09/09 26/10/12 09/09/09 3 years Immedately Authorsaton:

More information

Period and Deadline Selection for Schedulability in Real-Time Systems

Period and Deadline Selection for Schedulability in Real-Time Systems Perod and Deadlne Selecton for Schedulablty n Real-Tme Systems Thdapat Chantem, Xaofeng Wang, M.D. Lemmon, and X. Sharon Hu Department of Computer Scence and Engneerng, Department of Electrcal Engneerng

More information

Multi-Resource Fair Allocation in Heterogeneous Cloud Computing Systems

Multi-Resource Fair Allocation in Heterogeneous Cloud Computing Systems 1 Mult-Resource Far Allocaton n Heterogeneous Cloud Computng Systems We Wang, Student Member, IEEE, Ben Lang, Senor Member, IEEE, Baochun L, Senor Member, IEEE Abstract We study the mult-resource allocaton

More information

The Load Balancing of Database Allocation in the Cloud

The Load Balancing of Database Allocation in the Cloud , March 3-5, 23, Hong Kong The Load Balancng of Database Allocaton n the Cloud Yu-lung Lo and Mn-Shan La Abstract Each database host n the cloud platform often has to servce more than one database applcaton

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Optimal Map Reduce Job Capacity Allocation in Cloud Systems

Optimal Map Reduce Job Capacity Allocation in Cloud Systems Optmal Map Reduce Job Capacty Allocaton n Cloud Systems Marzeh Malemajd Sharf Unversty of Technology, Iran malemajd@ce.sharf.edu Danlo Ardagna Poltecnco d Mlano, Italy danlo.ardagna@polm.t Mchele Cavotta

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

Self-Adaptive SLA-Driven Capacity Management for Internet Services

Self-Adaptive SLA-Driven Capacity Management for Internet Services Self-Adaptve SLA-Drven Capacty Management for Internet Servces Bruno Abrahao, Vrglo Almeda and Jussara Almeda Computer Scence Department Federal Unversty of Mnas Geras, Brazl Alex Zhang, Drk Beyer and

More information

An Integrated Dynamic Resource Scheduling Framework in On-Demand Clouds *

An Integrated Dynamic Resource Scheduling Framework in On-Demand Clouds * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 30, 1537-1552 (2014) An Integrated Dynamc Resource Schedulng Framework n On-Demand Clouds * College of Computer Scence and Technology Zhejang Unversty Hangzhou,

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures An ILP Formulaton for Task Mappng and Schedulng on Mult-core Archtectures Yng Y, We Han, Xn Zhao, Ahmet T. Erdogan and Tughrul Arslan Unversty of Ednburgh, The Kng's Buldngs, Mayfeld Road, Ednburgh, EH9

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and Ths artcle appeared n a journal publshed by Elsever. The attached copy s furnshed to the author for nternal non-commercal research and educaton use, ncludng for nstructon at the authors nsttuton and sharng

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

A New Quality of Service Metric for Hard/Soft Real-Time Applications

A New Quality of Service Metric for Hard/Soft Real-Time Applications A New Qualty of Servce Metrc for Hard/Soft Real-Tme Applcatons Shaoxong Hua and Gang Qu Electrcal and Computer Engneerng Department and Insttute of Advanced Computer Study Unversty of Maryland, College

More information

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines DBA-VM: Dynamc Bandwdth Allocator for Vrtual Machnes Ahmed Amamou, Manel Bourguba, Kamel Haddadou and Guy Pujolle LIP6, Perre & Mare Cure Unversty, 4 Place Jusseu 755 Pars, France Gand SAS, 65 Boulevard

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application

Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application Internatonal Journal of mart Grd and lean Energy Performance Analyss of Energy onsumpton of martphone Runnng Moble Hotspot Applcaton Yun on hung a chool of Electronc Engneerng, oongsl Unversty, 511 angdo-dong,

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information