SCIENTIFIC simulations executed on parallel computing


 Julian Walton
 2 years ago
 Views:
Transcription
1 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX Scheduling Parallel Task Grahs on (Alost) Hoogeneous Multicluster Platfors PierreFrançois Dutot, Tchiou N Také, Frédéric Suter, Meber, IEEE, and Henri Casanova Abstract Alications structured as arallel task grahs exhibit both data and task arallelis and arise in any doains. Scheduling these alications efficiently on arallel latfors has been a longstanding challenge. In the case of a single hoogeneous latfor, such as a cluster, results have been obtained both in theory, i.e., guaranteed algoriths, and, in ractice, i.e., ragatic heuristics. Due to task arallelis, these alications are well suited for execution on distributed latfors that san ultile clusters ossibly in ultile institutions. However, the only available results in this context are nonguaranteed heuristics. In this aer, we develo a scheduling algorith, MCGAS, which is alicable to ulticluster latfors that are alost hoogeneous. Such latfors are often found as large subsets of ulticluster latfors. Our novel contribution is that MCGAS coutes task allocations so that a (tunable) erforance guarantee is rovided. Since a erforance guarantee does not necessarily ily good average erforance in ractice, we also coare MCGAS with a recently roosed nonguaranteed algorith. Using siulation over a wide range of exeriental scenarios, we find that MCGAS leads to better average alication akesans than its coetitor. Index Ters Mixed arallelis, arallel task grah scheduling, erforance guarantee, ulticluster latfor. Ç 1 INTRODUCTION SCIENTIFIC siulations executed on arallel couting latfors can exloit two tyes of arallelis: task arallelis and data arallelis. A taskarallel alication is artitioned into a set of tasks with ossible recedence and counication constraints. A dataarallel alication tyically exhibits arallelis at the level of loos, i.e., iterations can be executed concetually in a Single Instruction Multile Data (SIMD) fashion. A way to exose increased arallelis, to, in turn, achieve higher scalability and erforance, is to write arallel alications that use both tyes of arallelis, using what is often called ixed arallelis. With ixed arallelis, alications are structured as arallel task grahs (PTGs), that is, task grahs of dataarallel tasks. PTGs arise naturally in any alications (see [1] for a discussion of the benefits of ixed arallelis and alication exales). One wellknown challenge for PTGs is scheduling, that is, aking decisions for aing coutation and data transfers to latfor coonents in a view to otiizing soe erforance etric. The vast ajority of works that target the scheduling of PTGs use alication execution tie, or akesan, as the erforance etric. Mixed arallelis adds another level of difficulty to the already challenging scheduling. P.F. Dutot is with the Université Pierre MendèsFrance, Grenoble 2/LIG., UMR 5217 CNRS INPG INRIA UJF, Grenoble 1 UPMF, Grenoble 2, France. Eail: T. N Také and F. Suter are with the Nancy University/LORIA. UMR 7503 CNRS INPL INRIA Nancy 2 UHP, Nancy 1. Eail: {tchiou.ntake, H. Casanova is with the Inforation and Couter Sciences Deartent at the University of Hawaii at Manoa, 1680 EastWest Rd, POST 317, Honolulu, HI Eail: Manuscrit received 29 Jan. 2008; revised 4 Set. 2008; acceted 8 Dec. 2008; ublished online 7 Jan Recoended for accetance by D. Trystra. For inforation on obtaining rerints of this article, lease send eail to: and reference IEEECS Log Nuber TPDS Digital Object Identifier no /TPDS roble for taskarallel alications because dataarallel tasks are oldable, i.e., they can be executed on various nubers of rocessors, with ore rocessors leading to faster task execution ties. This raises the question of how any rocessors should be allocated to each dataarallel task. In other words, what is the best tradeoff between running ore concurrent dataarallel tasks with each fewer rocessors, or running fewer concurrent tasks each with ore rocessors? The ost oular arallel couting latfors today are coodity clusters, which are therefore riary candidates for running PTGs. Most clusters consist of identical coute nodes (at least when they are initially ut in roduction), and thus, the question of scheduling PTGs on hoogeneous latfors has been studied by any researchers. Fro a theoretical standoint, although the scheduling roble is NPcolete, algoriths with erforance guarantees, defined as the axiu ratio between the roduced akesan and the otial akesan, have been develoed in [2], [3], [4], [5]. Fro a ore alied standoint, any nonguaranteed heuristics have been roosed and shown to lead to good average erforance in ractice [6], [7], [8], [9], [10], [11]. In site of the abundance of deloyed hoogeneous clusters, heterogeneous latfors have received a lot of attention in the last decade. The riary otivation coes fro iroveents in network and iddleware technology that have ade it ossible to aggregate several clusters over ultile institutions. These ulticluster latfors, which are a for of grid couting, hold the roise of higher levels of scale and erforance than ossible with a single cluster. This is articularly true for taskarallel alications, which are less tightly couled than urely dataarallel alications and can thus accoodate large intercluster network latencies (e.g., on widearea networks). Consequently, any PTGs are wellsuited to execution on ulticluster latfors. Multicluster latfors raise two challenges for the scheduling of arallel alications, and PTGs are no /09/$25.00 ß 2009 IEEE Published by the IEEE Couter Society
2 2 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 TABLE 1 An Alost Hoogeneous Subset of Grid 5000 TABLE 2 Intersite Network Latencies (in Milliseconds) on Grid 5000 excetion. First, these latfors are heterogeneous because they consist of clusters in different institutions. Second, they are coosite and, thus, it is inadvisable to run dataarallel tasks across clusters, which adds an additional constraint when coared to the PTG scheduling roble for noncoosite latfors. Develoing PTG scheduling algoriths with erforance guarantees on (ulticluster) heterogeneous latfors is an oen research question and revious work has instead focused on develoing ragatic heuristics [12]. In this aer, we adat theoretical results for PTG scheduling on hoogeneous latfors to ulticluster latfors. We address the two aforeentioned challenges as follows. With resect to heterogeneity, we ake the observation that there are deloyed ulticluster latfors (or significant subsets thereof) that exhibit low heterogeneity in ters of rocessor seed. Therefore, guaranteed scheduling algoriths develoed for hoogeneous latfors could lead to good results in our context. With resect to the coosite nature of ulticluster latfors, we adat recent theoretical results obtained for hierarchical clusters of Syetric Multirocessors (SMPs) [13], noting that collection of clusters hierarchies are akin to cluster of SMPs hierarchies. More secifically, we ake the following contributions.. We develo the first ractical ileentation and exeriental evaluation of a reviously described task allocation algorith that leads to a erforance guarantee.. We design a scheduling algorith with a tunable erforance guarantee for hoogeneous ulticluster latfors.. We evaluate our scheduling algorith in siulation to ut its average erforance in ersective with its erforance guarantee.. We coare our scheduling algorith with a recently ublished ragatic heuristic for scheduling PTGs on ulticluster latfors. This aer is organized as follows: Section 2 details our latfor and alication odels. Section 3 discusses related work. Section 4 resents our scheduling algorith, which we evaluate in Section 5. Section 6 concludes the aer with a suary of our findings. 2 PLATFORM AND APPLICATION MODELS 2.1 Platfor Model In this aer, we base our latfor odel on a realworld ulticluster latfor, Grid 5000 [14], [15]. The goal of Grid 5000 is to build a highly reconfigurable, controllable, and onitorable exeriental latfor to allow exeriental arallel and distributed couting research. The latfor consists of nine geograhically distributed sites, aggregating a total of 5,000 CPUs, and is funded by the French ACI Grid incentive of the French Ministry of Research and Education. Each of the nine sites hosts at least one coodity cluster, and the nuber of rocessors er cluster ranges fro around 100 to around 1,000. The architectures of these rocessors are AMD Oteron, Intel Xeon, Intel Itaniu 2, or PowerPC. Although the Grid 5000 latfor is heterogeneous, it was established as a concerted effort with a goal of avoiding wide heterogeneity of rocessor erforance. This ay not be the case for other roduction grid latfors, in which it ay not be ossible to find a significant subset of the resources with low heterogeneity. By contrast, we can easily identify an alost hoogeneous subset of Grid This subset corises 545 rocessors distributed aong six clusters. Table 1 suarizes the nuber of rocessors er cluster and the couting seed of the rocessors in each cluster, in GFlo/sec. These values were obtained with the High Perforance Linack benchark over the AMD Core Math Library (ACML) either with the original clock rates (on the Lyon, Nancy, Orsay, Rennes, and Sohia sites) or by underclocking the rocessors, which is done on the Lille site secifically to reduce the heterogeneity of Grid Five hundred fortyfive rocessors, although aounting to only a little over 10 ercent of the overall latfor, still reresent a large hoogeneous coute latfor. There is, therefore, a strong otivation for atteting to use this alost hoogeneous subset to the best of its otential for running PTGs, for instance, by using sohisticated task allocation algoriths with erforance guarantees. Each cluster uses a Gigabit interconnect (GigaEthernet or Myrinet) internally, and all clusters are interconnected together by the widearea RENATER Education and Research Network, a 10 Gigabit/sec network. Table 2 shows the intersite latencies as easured on Grid 5000, in seconds. The siulations in this aer are based on the values given in Tables 1 and Alication Model A PTG alication is odeled as a Directed Acyclic Grah (DAG) G¼ðV; EÞ, where V¼fv i j i ¼ 1;...;Vg is a set of vertices reresenting dataarallel tasks, or tasks for short, and E¼fe i;j jði; jþ 2f1;...;Vgf1;...;Vgg is a set of edges between vertices, reresenting counication between tasks. Each edge e i;j has a weight, which is the aount of data (in bytes) that task v i ust send to task v j (we call v j a successor of v i and v i a redecessor of v j ). Note that in addition to data counication itself, there ay be an overhead for data redistribution, e.g., when task v i is executed on a different nuber of rocessors than task v j.
3 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 3 Without loss of generality, we assue that G has a single entry task and a single exit task. Since dataarallel tasks can be executed on various nubers of rocessors, we denote by T k ðv; Þ the execution tie of task v if it were to be executed on rocessors of cluster C k. In ractice, T k ðv; Þ can be easured via bencharking on each cluster for several values of, or it can be calculated via a erforance odel. In this work, we assue that T k ðv; Þ does not increases as increases, i.e., using ore rocessors for a task does not lengthen its execution tie. The overall execution tie of G, orakesan, is defined as the tie between the beginning of G s entry task and the coletion of G s exit task. We take a sile aroach for odeling dataarallel tasks. We assue that a task oerates on ffiffiffi a data ffiffiffi set of d double recision eleents (for instance, a d d square atrix). We arbitrarily assue that rocessors have at ost 1 GByte of eory, and thus d 121 M. We also assue that d is above 4 M (if d is too sall, the dataarallel task should ost likely be fused with its redecessor or successor). The volue of data counicated between two tasks is equal to 8 d bytes. We odel the coutational colexity of a task, in nuber of oerations, with one of the three following exressions, which are reresentative of coon ffiffiffi ffiffiffi alications: a d (e.g., a stencil coutation on a d d doain), a d log d (e.g., sorting ffiffiffi an ffiffiffi array of d eleents), d 3=2 (e.g., ultilication of d d atrices). For the first two tyes of colexity, a is icked randoly between 2 6 and 2 9 to cature the fact that soe of these tasks often erfor ultile iterations. We consider four scenarios: three in which all tasks have one of the three coutational colexities above, and one in which task coutational colexities are chosen randoly aong the three. The above odel leads to a range of counicationcoutation ratios that corresond to any tyical coutational tasks and realworld alications. More secifically, assue a 1Gbit/sec network (as the internal switches in our clusters) and GFlo/sec rocessors (as the fastest rocessors in our latfor). Our synthetic PTGs corresond to situations in which the total (sequential) tie for erforing all coutations is between 1.1 and 42.8 ties larger than the total (sequential) tie for erforing all data counications. Therefore, our exerients san the range fro counicationintensive to coutationintensive alications. While the above rovides a odel for sequential task execution, we also need to account for arallel executions, i.e., for how task execution tie varies with the nuber of rocessors. We use a sile odel that is used extensively in the literature, thus allowing our results to be coared with reviously ublished results consistently. This odel is based on Adahl s law [16] and secifies that a fraction of a task s sequential execution tie is nonarallelizable. We ick rando values uniforly between 0 and 25 ercent. With this Adahl odel, an alication task exhibits different execution ties for different nubers of rocessors. We denote by! i the work of task v i, i.e., the roduct of its execution tie and the nuber of rocessors allocated to it. We consider alications that consist of 10, 20, or 30 dataarallel tasks. We use four oular araeters to define the shae of the DAG: width, regularity, density, and jus. The width deterines the axiu arallelis in the DAG, that is, the nuber of tasks in the largest level. A sall value leads to chain grahs and a large value leads to forkjoin grahs. The regularity denotes the unifority of the nuber of tasks in each level. A low value eans that levels contain very dissiilar nubers of tasks, while a high value eans that all levels contain siilar nubers of tasks. The density denotes the nuber of edges between two levels of the DAG, with a low value leading to few edges and a large value leading to any edges. These three araeters take values between 0 and 1. In our exerients, we use values 0.2, 0.5, and 0.8 for width, and 0.2 and 0.8 for regularity and density. Finally, we add rando jus edges that go fro level l to level l þ ju, for ju ¼ 1; 2; 4 (the case ju ¼ 1 corresonds to layered DAGs [6]). We refer the reader to our DAG generation rogra and its docuentation for ore details [17]. Note that our DAG generation rocedure is siilar to ones used reviously in the literature, for instance, in [18]. It was also used to evaluate the HCPA scheduling heuristic [12], to which we coare the algorith roosed in this aer. Overall, we have ¼ 432 different DAG tyes. Since soe DAG characteristics are rando, for each DAG tye, we generate three sale DAGs, for a total of 1;296 DAGs. 3 RELATED WORK In this section, we review related work, categorizing it with resect to the underlying latfor odel, and referring both to theoretical results, such as guaranteed algoriths, and to ragatic nonguaranteed heuristics, whenever alicable. The objective of all these algoriths is to iniize alication akesan. This is the erforance etric we use in this work as well. 3.1 Single Hoogeneous Cluster Early work in the area [19], [20] rooses an algorith to coute otial PTG schedules under strong assutions, naely that all dataarallel tasks exhibit the sae articular arallel erforance behavior and rocessor allocations can be fractional rather than integral. In the general case, a seinal result in the area of PTG scheduling fro a theoretical standoint is the guaranteed twoste algorith roosed in [4]. In a first ste, the algorith decides how any rocessors should be allocated to each task, which is done via a relaxed linear rogra iniization and which also results in fractional rocessor allocations. A rounding rocedure is then used to obtain integral allocations [21]. In ste two, the algorith uses a sile list scheduling aroach to a tasks to sets of rocessors. The guaranteed erforance ratio is defined as the axiu ratio between the roduced akesan and the otial akesan. In [4], it is shown that the guaranteed erforance ratio of this algorith is 2:62 in the secific case of treeshaed PTGs and 5:24 in the general case. This result was iroved in [5], leading to a 4:73 erforance ratio in the general case. One of our contributions is that to
4 4 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 the best of our knowledge, we rovide the first exeriental evaluation of the aroach in [4]. Several ractical PTG scheduling algoriths based on heuristics have been roosed in the literature [6], [8], [9], [10], [11]. Like the guaranteed algoriths discussed earlier, the algoriths in [6], [8], [9], [10] roceed in two hases. A roinent algorith is Critical Path and Area (CPA)based scheduling [8], which ais at finding the best coroise between two quantities. The first quantity is the length of the critical ath, i.e., the ath in the PTG on which the su of the edge and vertex weights is axial. We denote the length of the critical ath by C ax. The second quantity is the ratio of the total work, i.e., W ¼ P V i¼0! i, and of the total nuber of rocessors,. This ratio is then the average work er rocessor. The rincile of the CPA algorith is to start by allocating only one rocessor to each task. Therefore, initially C ax is larger than W. Then, at each iteration, CPA adds one ore rocessor to the task belonging to the critical ath that benefits the ost fro this onerocessor allocation increase. The allocation rocess stos when C ax becoes saller than W. Indeed, the case C ax ¼ W corresonds to an otial tradeoff because both these quantities are lower bounds of the alication akesan. Deending on alication and latfor characteristics, CPA ay lead to excessively large allocations that can revent the concurrent execution of indeendent tasks. Two algoriths address this liitation. MCPA [6], which is only alicable to layered PTGs, liits rocessor allocations to ensure that all the tasks in a level of the PTG can be executed concurrently. HCPA [12] eloys a odified definition of the average work er rocessor to reove the bias induced by a large nuber of available rocessors and is alicable to any PTG. These last three algoriths all use a list schedulingbased task aing hase by which tasks are aed to rocessors in order of decreasing botto level (i.e., distance to the PTG s exit task), accounting for data counication and data redistribution costs. The icaslb oneste algorith in [11] was shown to lead to better erforance than soe twoste algoriths, including CPA, while aintaining reasonable colexity. This algorith erfors allocation and aing siultaneously by iteratively increasing the allocations of tasks on the critical ath, with a lookahead echanis to avoid being traed in local inia, and a backfilling aroach to irove the schedule. 3.2 Multile Hoogeneous Clusters Scheduling algoriths with erforance guarantees have been studied for a hierarchy of hoogeneous clusters, which is, in fact, a single cluster with identical nodes, where each node is an SMP and thus is a cluster of rocessors, where each cluster has the sae nuber of rocessors. Using the work in [22] as a basis, Dutot [13] has roosed extensions to the aroach in [4] to accoodate ultile clusters. The key difficulty is that the execution tie of a dataarallel task on a set of rocessors deends on the reartition of these rocessors aong clusters. This difficulty is alleviated by enforcing a laceent rule, which works as follows. Let be the nuber of rocessors to be allocated to a dataarallel task, s be the nuber of rocessors er cluster, and q and r be the quotient and the reainder of the integer division of by s ( ¼ q s þ r). An allowable laceent ust use q full clusters and r rocessors in a single cluster. This rule sily iniizes the nuber of different clusters used to run a dataarallel task. The algorith in [13] uses an allocation ste siilar to that in [4] for a single cluster and a secialized list scheduling in the second ste. However, the transition between the two stes is based on a differentiation between sall and large tasks (deending on the nuber of allocated rocessors) and on constraints on the nuber of clusters that can run sall tasks siultaneously. With the otial choice for this differentiation and this constraint, the algorith has an overall guarantee no worse than 5:64. This result holds for PTGs structured as trees, and a guarantee twice as large can be easily obtained in the general case. Note, however, that a lower guarantee in the general case could be achieved by leveraging the techniques roosed in [5]. To the best of our knowledge, no (nonguaranteed) scheduling heuristics were develoed secifically for the case of ultile hoogeneous clusters. However, the heuristics for ultile heterogeneous clusters reviewed in the next section are certainly alicable to ultile hoogeneous clusters. 3.3 Multile Heterogeneous Clusters To the best of our knowledge, no PTG scheduling algorith with erforance guarantees has been develoed for heterogeneous (ulticluster) latfors. Two heuristics have been recently roosed: Heterogeneous CPA (HCPA) [23] and MHEFT [24]. HCPA extends the CPA algorith [8] to heterogeneous latfors by using the concet of a reference cluster. Allocations on the reference cluster are translated into allocations on clusters containing rocessors with various seeds. MHEFT extends the wellknown HEFT algorith for scheduling taskarallel DAGs [25]. MHEFT erfors list scheduling by reasoning on average dataarallel task execution ties for onerocessor allocations on all ossible clusters. Weaknesses in both HCPA and MHEFT were identified and reedied in [12]. In that aer, the authors erfor a thorough coarison of both iroved algoriths and find that although no algorith is overwhelingly better than the other, HCPA would ost likely lead to schedules that would be referred by the ajority of users. HCPA was shown to achieve a good tradeoff between alication akesan and arallel efficiency (i.e., how well resources are utilized). In this work, we coare our aroach to this iroved HCPA version. 4 A GUARANTEED ALGORITHM FOR HOMOGENEOUS MULTIcLUSTER PLATFORMS In Section 2.1, we noted that there are grid latfors, such as ulticluster grids, in which significant subsets of the resources are alost hoogeneous in ters of coute seed. This rovides the otivation for this aer, naely, the investigation of a guaranteed algorith for a hoogeneous ulticluster latfor. Our work draws insiration fro the work in [13]. Recall that in that work, the target latfor is a hoogeneous cluster of SMP nodes, while we consider a hoogeneous collection of clusters. There are
5 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 5 thus two key differences between the latfor odel in that aer and ours. 1. In [13], all nodes have the sae nuber of rocessors (because they are hoogeneous SMP nodes), but in this aer, clusters can have different nubers of nodes (there are sall clusters and large clusters). 2. In [13], dataarallel tasks are allowed to run over ultile nodes. By contrast, in this work, we restrict a dataarallel task to run within a single cluster. Disallowing dataarallel tasks running over ultile clusters is sensible because of the cost of intercluster counication and this restriction is enforced in all revious work on the toic of PTG scheduling on ulticluster latfors. While the first difference above causes difficulties, the second one silifies the scheduling roble. Indeed, with tasks running over a single cluster, the laceent rule defined in [13] and outlined in Section 3 is no longer necessary. 4.1 Fundaental Previous Results Before resenting the details of our algorith, we recall two fundaental results that rovide the basis of our aroach. The first result, by Skutella [21], gives a linear rogra to find the task allocations that lead to the best ossible tradeoff between the length of the critical ath and the average work er rocessor. The second result, by Leère et al. [4], introduces the notion of bounding the nuber of rocessors allocated to each task to irove the erforance ratio of the list scheduling The TieCost TradeOff Proble The tiecost tradeoff roble (see [26]) is very siilar to the rocessor allocation roble we face when scheduling PTGs. As in the tiecost tradeoff roble, we have two lower bounds on the etric to be otiized: the length of the critical ath and the average work er rocessor. Reducing the nuber of rocessors allocated to any task affects this tradeoff in favor of a saller average work, while it ay increase the critical ath. Conversely, increasing the nuber of rocessors used to coute a task is likely to shorten the critical ath and increase the average work er rocessor. The goal is to achieve the best tradeoff between the two, i.e., iniizing their axiu. As a result, rovided we are in a onecluster scenario, we can reuse the aroach in [21] for solving the tiecost tradeoff roble directly for scheduling a PTG. We defined earlier T k ðv; Þ as the tie it takes to colete task v when using rocessors of cluster k. Since for now, we consider only one cluster, we shorten the notation to Tðv; Þ in this section. The work used for task v on rocessors is then T ðv; Þ. There are ossible allocations for each task: execution tie Tðv; Þ and a work of Tðv; Þ, for ¼ 1;...;. We thus have to solve a discrete otiization roble, i.e., finding the otial tradeoff by icking for each task a articular allocation aong a finite set of ossible allocations. As with any robles, solving the discrete roble is strongly NPhard, while solving a continuous version of roble is easy. The idea here is then to first solve a larger, but continuous roble, in which each task v is relaced by a set of 1 activities. Each activity has a continuous linear cost function defined based on execution tie. To each activity v i corresonds a variable x v;i verifying the following inequalities: Tðv; Þ x v;i Tðv; iþ. The cost of activity v i, denoted by! v;i, is set to! v;i ¼ Tðv; iþ x v;i ðði þ 1ÞTðv; i þ 1Þ itðv; iþþ: Tðv; iþ Tðv; Þ Since Tðv; Þ decreases as increases, for all i saller than, Tðv; iþ is no saller than Tðv; Þ. To kee track of the recedence constraints, we introduce variables s v to ensure that no task starts before all its redecessors end. In short, for all ðu; vþ 2E, s v ax i ðs u þ x u;i Þ. To suarize, for any given critical ath length C ax, the inial cost necessary to achieve it is found by solving the rational linear rogra defined by the following constraints: 8v; i; Tðv; Þ x v;i ; 8v; i; x v;i Tðv; iþ; 8ðu; vþ 2E; ax i ðs u þ x u;i Þs v ; 8v; i; s v þ x v;i C ax ; and iniizing the cost function X v2v X 1 i¼1! v;i : Note that a fixed cost corresonding to the sallest ossible total work, achieved when each task is allocated only one rocessor, has to be added to the above cost function in order to coute the true total work, which we denote by W. We refer the reader to [21] for all details and justifications about the construction of the above linear rogra. At this oint, we have a way to ick any C ax value and coute the corresonding W value, with the goal of finding the otial tradeoff. The C ax and W values are W continuous and not necessarily integers. Since and C ax have oosite behavior, there are two ossible scenarios. If one is always larger than the other, one can use straightfor W ward extree allocations (each task uses one rocessor if is always larger, or all rocessors if C ax is always larger). Otherwise, an otial tradeoff can be aroached by binary search. In the latter scenario, the values obtained are lower bounds of the otial discrete tradeoff, that is, of the discrete values of the critical ath length and of the total work, so that the axiu of the critical ath length and the average work er rocessor is iniized. We denote these discrete values by Cax and W, resectively. The work in [21] uses a rounding technique to turn the continuous solution ðc W ax ; Þ into a solution of the discrete roble with tie and cost values ðc ax ; W Þ, which are, resectively, lower than 1 1 C ax and 1 W, where is a araeter that can be chosen arbitrarily between 0 and 1. Choosing an allocation with a rocessors for task v in the original roble aounts to setting all the x v;i to Tðv; Þ with i lower than a and x v;i equals to Tðv; iþ for i larger than or equal to a. The cost incurred in the linear rogra for task i with these x v;i values is then X i<a ðði þ 1ÞTðv; i þ 1Þ itðv; iþþ ¼ at ðv; aþ Tðv; 1Þ;
6 6 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 which is the exected cost inus the aforeentioned fixed cost corresonding to the sallest total work. When rounding the linear rogra s otial solution, one ust choose which x v;i will be set to Tðv; Þ and which ones will be set to Tðv; iþ. This is done with resect to a threshold in T ðv;iþ x v;i Tðv;iÞ T ðv;þ the following way. If is lower than or equal to, then x v;i is set to Tðv; iþ, reducing the work contribution! v;i of the corresonding activity to 0, while increasing the tie dedicated to activity i Tðv; iþ x v;i ðtðv; iþ Tðv; ÞÞ Tðv; iþ: Therefore, x v;i ¼ Tðv; iþ 1 1 x v;i and the tie used for any 1 activity is not increased by ore than a factor 1 ; hence, C ax 1 1 C ax. Conversely, if is greater than, then x v;i is T ðv;iþ x v;i Tðv;iÞ T ðv;þ set to Tðv; Þ, reducing the tie needed for the activity, while increasing! v;i to ði þ 1ÞTðv; i þ 1Þ itðv; iþ. Since Tðv; iþ x v;i >ðtðv; iþ Tðv; ÞÞ, straightforwardly! v;i >! v;i, which eans that the total work is not increased by a factor of ore than 1 ; hence, W 1 W. We have thus couted discrete C ax and W values that 1 are at ost a factor 1 and 1 larger than the otial discrete values, resectively Scheduling Moldable Tasks on a Single Cluster Based on the linear rograing aroach in [21], the work in [4] focuses on how to schedule the tasks efficiently while reserving ost of the allocations so that one can obtain a erforance ratio derived fro the lower bounds on the critical ath and the average work er rocessor. The difficulty coes fro the fact that the allocations are couted in a setting where an infinite nuber of rocessors can be used at the sae tie since there are no constraints in the linear rogra on siultaneous execution of dataarallel tasks. With tasks with different execution ties, there is no sile geoetrical transforation to transfor a schedule for an unbounded nuber of rocessors into one for a fixed nuber of rocessors. The schedule has to be reconstructed fro scratch, only keeing the allocation inforation. The algorith roosed in [4] is derived fro the classical list scheduling algorith. However, list scheduling cannot be used directly as it can be arbitrarily far fro the otial schedule. Consider, for exale, an instance with airs of tasks where the first task has to be executed on all rocessors for a very short aount of tie, while the second task has to be scheduled afterwards on a single rocessor for a long tie. The worst case for list scheduling is to schedule all airs one after the other, while the otial is to schedule all the first tasks and then all the second tasks in arallel, resulting in a schedule without idle tie. To avoid the roble of having lots of ready tasks requiring too any rocessors, the solution roosed in [4] is to enforce a axiu nuber of rocessors er task. In the original article, this liit was noted ðþ, where is the nuber of rocessors of the cluster; however, to avoid confusions with the threshold of the linear rogra, this liit is noted b in the rest of this aer. Sily ut, the algorith inserts a bounding ste between allocation (derived fro the tiecost linear rogra with araeter set to 1 2 ) and laceent (according to a list scheduling algorith). This bounding ste ensures the desired erforance ratio of 3 þ ffiffi 5. The analysis can be suarized as follows. There are three different kinds of tie intervals in the outut schedule:. T 1 : intervals where at ost b 1 rocessors are used;. T 2 : intervals where at least b and at ost b rocessors are used; and. T 3 : intervals where at least b þ 1 rocessors are used. As in Graha s classical roof for the 2 1 erforance ratio [27], a bounding of the critical ath and of total work can be ade with resect to the length of these three intervals, as. during the first kind of interval, no tasks has seen its allocation reduced to exactly b rocessors (otherwise at least b rocessors would be used);. during the first and second tie interval, no task is ready to be scheduled since there are at least b idle rocessors and the laceent algorith is a list scheduling algorith;. we can give for each kind of tie interval a lower bound on the nuber of rocessors used, naely, 1 for T 1, b for T 2, and b þ 1 for T 3. A straightforward calculation yields the otial b deending on the total nuber of rocessors, and the erforance ratio. See [4] for all details. 4.2 The MCGAS Algorith We call our new algorith MultiCluster Guaranteed Allocation Scheduling (MCGAS). MCGAS, like the algorith in [13] for scheduling PTGs on clusters of SMPs, relies heavily on the works in [21] and [4]. The first ste of the algorith is the allocation hase fro [21], which rounds off the solution of a rational linear rogra corresonding to a tiecost tradeoff roble, as exlained in Section Let us denote by C ax and W the values of C ax and W that corresond to the otial tradeoff. The allocations roduced in this first hase ensure that C ax and W are at ost 1= and 1= as large as C ax and W, resectively, where is a araeter between 0 and 1. Note that this aroach has been reeatedly resented in the theoretical literature over the last decade. One of our contributions in this aer is that to the best of our knowledge, we resent the first ractical ileentation of the tiecost tradeoff linear rogra. Therefore, for the first tie, we are able to evaluate its efficacy for alication scheduling in ractice. Once the initial rocessor allocation is deterined, the schedule is roduced via a odified list scheduling algorith as in [4]. As exlained in Section 4.1.2, we erfor a bounding of task allocations so that these allocations are at ost b, where the value of b is to be defined. A large value favors data arallelis, while a sall value favors task arallelis. The goal for setting b to a value lower than, say, the nuber of rocessors of the largest cluster is to avoid illadvised stalling of the critical ath. Indeed, the allocation hase of MCGAS, albeit leading to a erforance guarantee, does not attet to balance data and task arallelis, and ay thus lengthen the critical ath in ways that could be avoided. Once all allocations have been bounded, tasks can be
7 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 7 Fig. 1. Main stes of the MCGAS algorith. then aed to rocessors using a list scheduling algorith. Fig. 1 suarizes the stes of the MCGAS algorith. The efficacy of the scheduling algorith and the erforance guarantee are both contingent uon a good choice for the values of the and b araeters, as seen in the next section. It is iortant to note that the erforance guarantee in MCGAS, which is detailed in the next section, does not account for counication between tasks. This is also the case for reviously roosed guaranteed algoriths for hoogeneous noncoosite latfors [2], [3], [4], [5]. We leave the investigation of a guarantee that takes counication into account outside the scoe of this aer. 4.3 Finding the Best Paraeters for MCGAS In this section, we coute MCGAS s erforance guarantee as a function of and b. We then show how to set the values of these two araeters so that the erforance guarantee is as tight as ossible. We consider a ulticluster hoogeneous latfor. Let n be the nuber of clusters in the latfor, and i, i ¼ 1;...; n, the nubers of rocessors in these clusters. We refer to i as the size of cluster i. Without loss of generality, we assue that the clusters are sorted by nonincreasing sizes ( 1 2 n ), so that 1 is the size of the largest cluster. Given the clusters sizes, we define S ¼ Xn i¼1 axð0; i b þ 1Þ; which is a quantity that we will use in what follows. Intuitively, S reresents the iniu nuber of allocated rocessors so that no cluster has b idle rocessors. For a given PTG, we can categorize each tie ste in the resulting MCGAS schedule into three kinds of tie intervals according to the following rules:. T 1 : intervals where at ost b 1 rocessors are used;. T 2 : intervals where at least b and at ost S 1 rocessors are used; and. T 3 : intervals where at least S rocessors are used. The definitions of these intervals are adated fro those used in [4] (see Section 4.1.2), and use our newly defined constant S. For the sake of silicity, we use t i to denote the su of the lengths of all intervals of tye T i. The goal of this classification is to bound the contribution of each tie ste to C ax and to W. We know fro [21] that after the rounding hase, C ax is at ost 1= larger than Cax and that W is at ost 1= larger than W. After the allocation bounding ste, during which tasks that were allocated ore than b rocessors are reduced to exactly b rocessors, W does not increase. Indeed, a saller rocessor allocation for a task does not increase the task s work because we assue that tasks have arallel efficiencies lower than 1. For the sae reason, the reduction to b rocessors causes C ax to increase by at ost a factor 1 =b. During intervals of tye T 1, no task has seen its allocation reduced to exactly b rocessors (since fewer than b rocessors are used). Therefore, for each interval T 1, there is a task that is on the critical ath and whose allocation has not been reduced during the allocation bounding ste. During intervals of tye T 2, there is at least one cluster where b rocessors are idle, which eans that no task is ready to be scheduled, which eans again that there is a task in each of these intervals that belongs to the critical ath. However, in this case, the task ay have seen its allocation reduced fro 1 rocessors to b rocessors. With this reduced allocation, the task s contribution to C ax is at least b= 1. For intervals of tye T 3, there is no cluster with at least b idle rocessors, which eans that there ight be an unscheduled ready task that is on the critical ath. Consequently, on the one hand, C ax is not saller than t 1 þ b= 1 t 2, and on the other hand, W is larger than t 1 þ b t 2 þ S t 3. Since the schedule length C ax is the su t 1 þ t 2 þ t 3, we can now write a colete set of inequalities leading to the erforance guarantee for our algorith, using Cax to denote the otial schedule length C ax ¼ t 1 þ t 2 þ t 3 ; Cax 1 C ax t 1 þ b t 2 ; 1 Cax X n i i¼1 W t 1 þ bt 2 þ St 3 : Let us define ¼ P n i¼1 i and introduce a new araeter 2½0; 1Š. This araeter does not have any concrete interretation, but is used as an algebraic device to cobine the two above inequalities. More secifically, ultilying the first inequality by, the second by, and adding the together, we obtain Cax Þ þð1 t 1 þ b þ 1 t 2 þ S t 3 : Let us now define as the iniu of the three following quantities: 1 ð; ; bþ ¼þ ; 2 ð; ; bþ ¼b 3 ð; ; bþ ¼ þ 1 S : ;
8 8 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 Using the fact that C ax ¼ t 1 þ t 2 þ t 3, we obtain C ax C ax: The guaranteed erforance ratio is thus equal to 1=, which is iniized when is axiized. Finding a closed for for the , b, and values that axiize given the ð 1 ;...; n Þ values sees very challenging. But it turns out that it is ossible to deterine a good aroxiation of the solution. The three quantities 1, 2, and 3 are of the for AX þ BY with A equal to, B equal to, and with both X and Y greater than or equal to zero. Lea 1. Given two nubers X and Y greater than or equal to zero, and two araeters and with values in ½0; 1Š, the function fð; Þ ¼X þy reaches its axiu when ¼ 1. Proof 1. Let t ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi. We rove that ð1 tþ 2 is larger than or equal to ð1 tþ 2 ¼ 1 þ t 2 2t ¼ 1 þ 2 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ 1 þ 2 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ 1 þ þ 2 ffiffiffi ffiffiffiffiffiffiffiffiffiffiffi ¼ þð 1 Þ 2 : Since t 2 ¼ and X and Y are greater than or equal to zero, we have fðt; 1 tþ fð; Þ. Therefore, for any coule ð; Þ, we can find another coule ð 0 ; 0 Þ that verifies 0 ¼ 1 0 and fð 0 ; 0 Þfð; Þ, which coletes the roof. tu Using Lea 1, we can reove the araeter fro the equations ¼ inð 1 ð; bþ; 2 ð; bþ; 3 ð; bþþ ¼ in 2 þ 2 ;b 2 þ 2 1! ; 2 S! : Let us coute the values of b and that axiize this quantity (recall that and 1 are characteristics of the latfor and are fixed, while S is a iecewise linear function of b). Note that 1 ð; bþ deends only on. Also, 1 decreases fro 1 to 1= when increases fro 0 to 1, and 3 increases fro 0 to S= when increases fro 0 to 1. Therefore, the largest iniu of 1 and 3 is achieved when 1 ð; bþ ¼ 3 ð; bþ ¼ 1;3, that is, when 2 ¼ 2 ðs 1Þ=. is then axiized when 2 ð; bþ is equal to 1;3, that is, when bðs 1 þ 1 Þ¼S 1. We obtain the best value for b as an integer aroxiation of the noninteger solution of this sile equation and can then coute the best value of. Recall that by best values we ean the values that lead to the tightest erforance guarantee. With our articular target latfor, described in Section 2.1, the above coutation yields that the best is reached when b ¼ 87. This value is the sallest integer such that bðs þq 1 ffiffiffiffiffiffiffiffi 1Þ >S 1. In this case, the best value for S 1 is ¼ 1=ð1 þ Þ 0:662. The value of is then ffiffiffiffiffiffiffiffiffiffiffi S=ð S 1 þ ffiffiffiffi Þ 2 0:115, which corresonds to a erforance ratio 1= close to Fig. 2. 2D rojections of all ð; b; 1=Þ trilets with erforance ratio 1= lower than 10. For given values of ð 1 ;...; n Þ, in our case, the values corresonding to the subset of the Grid 5000 latfor described in Table 1, we can easily lot the different guaranteed erforance ratios 1=, each for given values of b and. For our latfor configuration, 1= takes values in the interval ½8:696; þ1½ deending on b and. Fig. 2 shows the two rojections of all trilets ðb; ; 1=Þ along the  and the baxes for erforance ratios at ost 10. The to grah shows the rojection along the baxis. The botto grah shows the rojection along the axis. In both grahs, we see that the erforance ratio increases ore sharly as b or becoe larger than their otial values, and ore oderately when they becoe saller than their otial values. Fig. 3 shows the doain of the and b values in which one is guaranteed that the erforance ratio is lower than 10. These grahs, or siilar grahs obtained for other latfor configurations, rovide good guidance for tuning the values of and b. Indeed, the values of and b that lead to the tightest erforance guarantee ay not lead to the best average alication erforance in ractice. Therefore, one ay wish to tune the and b values to ensure a reasonable erforance guarantee while leading to good average observed erforance over a range of relevant alication configurations. 5 SIMULATION RESULTS We use siulation for evaluating our roosed algorith and coaring it to reviously roosed heuristics. Siulation allows us to erfor a statistically significant nuber of exerients for a wide range of alication configurations (in a reasonable aount of tie). We use the
9 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 9 Fig. 3. Doain of b and values for which MCGAS s erforance ratio is lower than 10, for our articular latfor configuration. SIMGRID toolkit [28], [29] as the basis for our siulator. SIMGRID rovides the required fundaental abstractions for the discreteevent siulation of arallel alications in distributed environents and was secifically designed for the evaluation of scheduling algoriths. We use SIMGRID v3.3r5,668. Our siulations are for the Grid 5000 latfor, as described in Section 2.1. They account for tie taken by coutation, data counication, and data redistribution oerations (even though soe of our algoriths ay ignore data counication and data redistribution overheads when aking scheduling decisions). 5.1 Processor Allocation The ain strength of MCGAS, as discussed at length in Section 4, is that it coutes a sound allocation of rocessors to dataarallel tasks, which is the basis for its erforance guarantee. In this section, we evaluate the quality of this allocation when coared to that of the allocation rocedure used by the HCPA algorith [12], which iroves uon that used by the seinal CPA algorith [8]. We coare the quality of allocations as follows. For our 1,296 alication configurations (see Section 2.2), we coute a rocessor allocation using MCGAS and HCPA. For each allocation, we coute the length of its critical ath (lower values ean better erforance) and its total work (lower values ean lower resource consution). For each alication configuration, Fig. 4 shows the length of the critical ath achieved by the MCGAS allocation relative to that achieved by the HCPA allocation. We use the values b ¼ 87 and ¼ 0:66 for MCGAS, which lead to the best erforance guarantee. We show three curves, with a curve for each set of alication configurations with a given nuber of tasks (10, 20, and 30). For each curve, the data oints are sorted by increasing value of the relative akesan. We see that across our alication configurations, the length of the critical ath of the MCGAS allocation is at ost ercent of that achieved by HCPA. As the nuber of tasks increases, the relative critical ath increases: MCGAS leads to critical aths 12 ercent, 9 ercent, and 8 ercent shorter than HCPA, on average, across alication configurations with 10, 20, and 30 tasks, resectively. Fig. 5 is siilar to Fig. 4, but lots the total work of the allocations couted by MCGAS relative to that of allocations couted by HCPA. We see that overall Fig. 4. Relative critical ath length of MCGAS ( ¼ 0:66 and b ¼ 87) coared to HCPA, for alications with 10, 20, and 30 tasks. MCGAS leads to allocations that consue ore resources than HCPA. This is because HCPA was designed to liit resource consution exlicitly, as exlained in [12]. Therefore, unlike MCGAS, HCPA tends to trade off shorter critical ath length for lower resource consution (which is the reason for the trends in Fig. 4). However, we can see that MCGAS consues fewer resources relatively to HCPA as the nuber of alication tasks increases: MCGAS consues 110 ercent, 57 ercent, and 35 ercent ore resources than HCPA, on average, across alication configurations with 10, 20, and 30 tasks, resectively. Siilar lots for alication araeters other than the nuber of tasks (i.e., width, density, regularity, jus, and colexity, which iacts the counicationcoutation ratio), not included here, show that the critical ath length and total work of MCGAS, relative to that of HCPA, do not deend significantly on these araeters. Ultiately, we wish to assess whether allocations roduced by MCGAS are inherently better than those roduced by HCPA in a view to iniizing alication akesan. The length of the critical ath and the total work divided by the nuber of rocessors are two lower bounds of alication akesan. Therefore, for a given alication configuration, we coute the axiu of these two lower bounds, which we ter M, as couted by MCGAS and by HCPA. A lower value of M indicates a better oortunity to achieve a lower akesan. We have seen that the akesan for MCGAS is lower than that of HCPA, and Fig. 5. Relative total work of MCGAS ( ¼ 0:66 and b ¼ 87) coared to HCPA, for alications with 10, 20, and 30 tasks.
10 10 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 the total work for MCGAS is higher than that of HCPA. Therefore, it is not clear what the trend would be for the axiu of the two. It turns out that MCGAS achieves a lower M value in all cases across our exerients (9.63 ercent lower, on average, and at ost ercent lower). This eans that when MCGAS exhibits higher total work than HCPA, then the akesan is larger than the average work er rocessor anyway, and thus deterines the value of M. We also coared the allocations roduced by MCGAS and HCPA for PTGs fro real ixed arallel alications. We used PTGs fro the Strassen atrix ultilication and fro the Fast Fourier Transfor (FFT) alication. Both are classical test cases for PTG scheduling algoriths [19], [30] and we refer the reader, for instance, to [31] for details on their PTGs. These PTGs are ore regular than our synthetic PTGs, which are ore reresentative of workflow alications with the coosition of arbitrary oerators in arbitrary ways. For each alication, we considered 10 different PTGs configurations. We found that, on average, MCGAS roduces M values that are 39 ercent and 21 ercent shorter than HCPA for the Strassen atrix ultilication and the FFT alication, resectively. The advantage of MCGAS is thus even larger than with our synthetic PTGs. 5.2 Task Maing The task allocation couted by MCGAS rovides a erforance guarantee as long as aing tasks to rocessors is done using list scheduling. In the theoretical literature, the default is to use a naïve First Fit (FF) strategy: for each ready task, sily a the task to one of the clusters that can start executing the task the earliest. In our case, these candidate clusters are deterined accounting for revious task aing decisions but ignoring the (slight) heterogeneity of the latfor and the cost of data counication/redistribution between tasks. In ractice, however, one is better advised to use a ore sohisticated aing strategy which, while not roviding a tighter erforance guarantee, should lead to better average erforance. Coaring the average erforance of FF and of this better strategy with a set of exerients is interesting. Let x denote the erforance guarantee (i.e., the roduced schedule is at ost a factor x worse than otial). Then if one observes that the better strategy is, on average, a factor y better than the FF strategy, then it can be concluded that it is, on average, at ost a factor x=y fro otial. The worstcase erforance is the sae for both aing strategies. We exeriented with a oular task aing heuristic, Earliest Finish Tie (EFT): for each ready task, a the task to one of the clusters that can colete the task the earliest. This coutation accounts for revious task aing decisions, for the (slight) heterogeneity of the latfor, and for the cost of data counication and redistribution between tasks. We conducted exerients over all our alication configurations for b ¼ 87 and ¼ 0:66, values which lead to the best erforance guarantee, i.e., As exected, we find that the average erforance of EFT is better than that of FF for all our exeriental scenarios. The advantage of EFT increases for wider PTGs as the FF algorith akes allocation decisions that end u hindering task arallelis, and we discuss average results based on PTG width. Secifically, we found that for PTGs with width 0.2, 0.5, and 0.8, EFT outerfors FF, on average, by 3.7 ercent, 20.5 ercent, and 34.2 ercent, resectively. Therefore, we can conclude that, on average, EFT is at ost a factor 8:695 0:963 ¼ 8:373, 8:695 0:795 ¼ 6:913, and 8:695 0:658 ¼ 5:721 away fro otial for PTGs with width 0.2, 0.5, and 0.8, resectively. In the rest of the aer, all results are resented using the EFT task aing heuristic. As in the revious section, we evaluated MCGAS and HCPA on PTGs fro real alications, naely, the Strassen atrix ultilication and the FFT alication. For the Strassen atrix ultilication, we found that MCGAS roduces akesans that are, on average, 22 ercent shorter than those roduced by HCPA. But for the FFT alication, HCPA roduces better results than MCGAS by 13 ercent on average. This is in site of the MCGASroduced allocation being inherently better than that roduced by HCPA, as seen in the revious section. This henoenon highlights a weakness of the EFT task aing algorith for the articular FFT PTGs. These PTGs are well balanced in ters of counication and coutation. An iortant characteristic is that soe tasks roduce large outut data. However, EFT reasons only about task coletion tie, without looking ahead to see what aount of data will need to be counicated by a task after it has coleted. Therefore, it can a tasks on different clusters, thereby forcing the to engage later in costly intercluster counications for their outut data. When using the HCPAcouted allocations this weakness of EFT is not as noticeable because tasks are allocated fewer rocessors. Therefore, ore tasks can fit on fewer clusters, thus saving counications. This observation is iortant because it otivates the develoent of ore sohisticated task aing heuristics to irove average alication akesan, even when starting with good allocation decisions. 5.3 Trading off Guarantee for Perforance In this section, we describe how the behavior of MCGAS can be tuned to trade off its erforance guarantee (i.e., obtain a looser guarantee) for average erforance (i.e., obtain a lower average akesan over our range of PTG configurations). Fig. 6 lots the average akesan for different values of as b varies. A striking trend seen in Fig. 6 is the iroveent in average akesan as increases beyond 0.66, value for which the erforance guarantee of the allocation is inial. The best value of leads to the tightest guarantee in ters of worstcase erforance, while Fig. 6 shows average erforance over a articular oulation of PTG configurations. Nevertheless, one ay have exected that values of that are far fro its best value would roduce allocations that are detriental to the average akesan. One ossible exlanation is that our PTG configuration oulations, although reresentative of realworld alications, is biased. In ters of DAG structure, our PTGs do san a wide range of characteristics. However, all our dataarallel tasks have reasonable arallel efficiencies ( <0:25), which are tyical in realworld alications, but which could lead to the aforeentioned bias. For this reason, we
11 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 11 Fig. 6. Evolution of the average akesan obtained with the MCGAS algorith as b varies. Each curve is for a different value. also ran siulations for PTGs whose dataarallel tasks have oor arallel efficiencies ( >0:5), but we obtained siilar results. We conclude that indeed, large values irove the average akesan, but one ust be careful to consider the behavior of the average total work as well. We find that the average total work increases with. More secifically, in our exerients, we find that each tie increases by 0.15, the average total work increases by 10 ercent to 20 ercent. Therefore, while at first glance it sees that the best aroach is to ick a value as large as ossible, it leads to a less efficient use of the latfor (ilying higher cost in ractice). Therefore, one should ick the largest value so that the erforance guarantee is under soe desirable value, thus atteting to strike a coroise between average akesan and efficiency while bounding the worstcase erforance. In all that follows we ick the largest value so that the erforance guarantee is lower than 10. Based on Fig. 3, this value is ¼ 0:81. Unlike for, icking values of b that are resectively larger or lower than the value that leads to the tightest erforance guarantee is not desirable. Indeed, we see in Fig. 6 that the best value of b (b ¼ 87) leads to the best average akesan in ractice. Using saller values lengthens the critical ath, while using larger values forces the allocation of ost tasks to the largest cluster, reventing fruitful use of the other clusters. 5.4 Coarison of MCGAS and HCPA In this section, we coare the average akesan achieved by MCGAS and HCPA. Fig. 7 shows average akesans relative to the HCPA algorith, over our range of alication configurations. To the best of our knowledge, HCPA is the best reviously ublished ragatic algorith for scheduling PTGs in ulticluster latfors. The figure shows individual averages for PTGs with width 0.2, 0.5, and 0.8, and the overall average. The ain observation is that the MCGAS algorith outerfors HCPA across the board. Grouing the PTGs by width, we see that this advantage is, on average, at ost 17 ercent, and 13 ercent when averaged over all PTGs. We conclude that although a erforance guarantee does not necessarily ily good average erforance in ractice, in this case, configuring the algorith with its best b value of 87 and with a high value so that the Fig. 7. MCGAS versus HCPA using ¼ 0:81, b ¼ 87. erforance guarantee is under a articular bound (see revious section) leads to better average erforance than that achieved by the best reviously ublished nonguaranteed algorith. 5.5 Coarison of Scheduling Ties While the results in the revious section deonstrate the sueriority of MCGAS over HCPA, they are to be ut in ersective with the tie to coute the schedule. MCGAS involves solving a linear rogra, which can be tieconsuing. In this section, we coare the execution tie of MCGAS to that of HCPA. We ran both algoriths on an Intel Xeon 1.86GHz rocessor, using the GNU Linear Prograing Kit (GLPK) library for solving the linear rogra. Table 3 shows the average execution ties, in seconds, for different PTG sizes (i.e., nubers of tasks), each averaged over 10 runs for 10 different PTG configurations. The table also shows the average sequential alication akesan and the average akesan roduced by both algoriths. Fro the table, we can see that, exectedly, the execution ties of both algoriths increase with PTG size. More striking is the fact that MCGAS is uch ore exensive than HCPA. For instance, for PTGs with 30 tasks, MCGAS takes about 1,600 ties longer to coute a schedule than HCPA. Most of the MCGAS execution tie is due to solving a rational linear rogra. The linear rogra ust be solved several ties to ileent the binary search described in Section The nuber of stes of the binary search is constant for a given recision. Polynoialtie algoriths are known for solving rational linear rogras (e.g., Kararkar s algorith [32]) and, thus, the colexity of MCGAS is also olynoial. Note that in this work, we use the GLPK toolkit to solve linear rogras. GLPK uses the silex ethod. While this ethod is known to exhibit TABLE 3 Average Execution Ties and Resulting Alication Makesans of the MCGAS and the HCPA Algoriths, in Seconds, for Different PTG Sizes
12 12 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009 low olynoialtie colexity in ractice, in site of a theoretical exonentialtie colexity [33]. And indeed, in our case, at least u to 30 dataarallel tasks, the execution tie aears roughly quadratic. While the results in Table 3 see to indicate that MCGAS ay not be alicable to large PTGs, there are three sile reasons why its relatively high execution tie can be reduced or tolerated. First, coercial solvers, such as CPLEX [34], are known to coute results in ties u to several orders of agnitude shorter than GLPK, and their use would thus greatly reduce the MCGAS execution tie. Second, the results in the revious section and those in the table show that running HCPA instead would lead to a erforance loss 13 ercent, on average, deending on the PTG s width. The scheduling tie is thus to be ut in ersective with the exected order of agnitude of the akesan. Iortantly, the tie to coute the schedule does not deend on the duration of the alication tasks (only on the structure of the PTG and the nuber of tasks). Therefore, the higher MCGAS execution tie ay be well worth the additional exense if the alication akesan is large due to long tasks. In this case, the tie to coute the schedule is negligible coared to the alication akesan, and the akesan iroveent when using MCGAS instead of HCPA can be significant. For instance, consider the last row of Table 3 and assue that the alication tasks were 100 ties ore tieconsuing. The average alication turnaround tie, i.e., the su of the tie to coute the schedule and the alication akesan, would be aroxiately 237:05 þ 139: ;199 for MCGAS and 0:15 þ 156: ;626 for HCPA. Third, it is iortant to note that in any roduction scenarios, a (good) schedule is reused for any executions of an alication, thereby aortizing the cost of couting the schedule in the first lace. A sile technique to reduce the execution tie of MCGAS is to liit the nuber of ossible task allocations that are considered in the tiecost tradeoff roble. This technique reduces the flexibility of the allocation rocedure and degrades the initial erforance guarantee of MCGAS. As an exale, Table 4 shows average akesans roduced by and execution tie of MCGAS for PTGs with 50 and 100 tasks, for the original algorith and a odified version of it that considers only 20 ossible configurations for each task (corresonding to 1, 2, 3, 4, 5, 7, 9, 11, 14, 17, 23, 29, 38, 48, 62, 79, 102, 130, 168, or 215 rocessors). We see on the table that the odified algorith achieves an average akesan that is less than 2 ercent larger than that achieved by the original algorith, while exhibiting an average execution tie roughly 117 faster for 50task PTGs, and 52 faster for 100task PTGs. We conclude that this odification akes MCGAS usable, in ractice, for large PTGs at the exense of the erforance guarantee. In this aer, we have focused on the original MCGAS algorith, which rovides a better guarantee that the odified algoriths. Consequently, we have only resented results for PTGs with at ost 30 tasks, for which MCGAS runs in under 4 inutes on an Intel Xen 1.86GHz rocessor. TABLE 4 Evaluation of MCGAS When Liiting the Nuber of Possible Allocations for Each Task 6 CONCLUSION Guaranteed algoriths for scheduling alications structured as arallel task grahs (PTGs) have been develoed in the case of a single hoogeneous arallel couting latfor, such as a cluster [2], [3], [4], [5]. However, PTGs are articularly well suited to execution on ulticluster latfors. In this context, to the best of our knowledge, the only reviously roosed algorith is the nonguaranteed, but ragatic, HCPA algorith [12]. In this aer, we set out to develo MCGAS, a PTG scheduling algorith that rovides a erforance guarantee for ulticluster latfors. These latfors ose two challenges: they are heterogeneous and coosite. Indeed, they consist of rocessors with different seeds located in clusters of difference sizes. We have for now sidesteed the first challenge by noting that there are realworld ulticlusters latfors that corise hoogeneous or aroxiately hoogeneous subsets. Therefore, schedules couted assuing that a hoogeneous latfor can be effective on such subsets. To address the second challenge, we have develoed a guaranteed task allocation rocedure that is alicable to a latfor that consists of clusters with different nubers of rocessors. This guarantee is tunable via two araeters. We have deterined the values of these araeters that lead to the tightest erforance guarantee both analytically and in ractice. While having a erforance guarantee is always desirable, it does not ean that the algorith leads to good average erforance in ractice. Our key finding, however, is that MCGAS outerfors the nonguaranteed HCPA algorith, on average, over a large range of alication configurations. ACKNOWLEDGMENTS The work of H. Casanova is suorted in art by the US National Science Foundation under award nuber REFERENCES [1] S. Chakrabarti, J. Deel, and K. Yelick, Modeling the Benefits of Mixed Data and Task Parallelis, Proc. Sy. Parallel Algoriths and Architectures (SPAA 95), , [2] J. Turek, J. Wolf, and P. Yu, Aroxiate Algoriths for Scheduling Parallelizable Tasks, Proc. Sy. Parallel Algoriths and Architectures (SPAA 92), , [3] W. Ludwig and P. Tiwari, Scheduling Malleable and Nonalleable Tasks, Proc. Sy. Discrete Algoriths, , [4] R. Leere, D. Trystra, and G. Woeginger, Aroxiation Algoriths for Scheduling Malleable Tasks under Precedence Constraints, Proc. Ninth Ann. Euroean Sy. Algoriths ESA 2001, , [5] K. Jansen and H. Zhang, An Aroxiation Algorith for Scheduling Malleable Tasks under General Precedence Constraints, ACM Trans. Algoriths, vol. 2, no. 3, , 2006.
13 DUTOT ET AL.: SCHEDULING PARALLEL TASK GRAPHS ON (ALMOST) HOMOGENEOUS MULTICLUSTER PLATFORMS 13 [6] S. Bansal, P. Kuar, and K. Singh, An Iroved TwoSte Algorith for Task and Data Parallel Scheduling in Distributed Meory Machines, Parallel Couting, vol. 32, no. 10, , [7] V. Boudet, F. Desrez, and F. Suter, OneSte Algorith for Mixed Data and Task Parallel Scheduling without Data Relication, Proc. 17th Int l Parallel and Distributed Processing Sy., [8] A. Radulescu and A. van Geund, A LowCost Aroach Toward Mixed Task and Data Parallel Scheduling, Proc. 15th Int l Conf. Parallel Processing (ICPP 01), Set [9] S. Raaswany, Siultaneous Exloitation of Task and Data Parallelis in Regular Scientific Alications, PhD dissertation, Univ. of Illinois, [10] T. Rauber and G. Rünger, Coiler Suort for Task Scheduling in Hierarchical Execution Models, J. Systes Architecture, vol. 45, , [11] N. Vydyanathan, S. Krishnaoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayaan, and J. Saltz, An Integrated Aroach for Processor Allocation and Scheduling of MixedParallel Alications, Proc. 35th Int l Conf. Parallel Processing (ICPP), [12] T. N také, F. Suter, and H. Casanova, A Coarison of Scheduling Aroaches for MixedParallel Alications on Heterogeneous Platfors, Proc. Sixth Int l Sy. Parallel and Distributed Couting, July [13] P.F. Dutot, Hierarchical Scheduling for Moldable Tasks, Proc. 11th Int l EuroPar Conf., , [14] R. Bolze, F. Caello, E. Caron, M. Daydé, F. Desrez, E. Jeannot, Y. Jégou, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Nayst, P. Priet, B. Quetier, O. Richard, E.G. Talbi, and T. Irena, Grid 5000: A Large Scale and Highly Reconfigurable Exeriental Grid Testbed, Int l J. High Perforance Couting Alications, vol. 20, no. 4, , Nov [15] Grid5000, htt://www.grid5000.org, [16] G. Adahl, Validity of the Single Processor Aroach to Achieving Large Scale Couting Caabilities, Proc. A. Federation of Inforation Processing Soc. (AFIPS) 1967 Sring Joint Couter Conf., vol. 30, , Ar [17] F. Suter, DAG Generation Progra, htt://www.loria.fr/ suter/dags.htl, [18] Y.K. Kwok and I. Ahad, Bencharking and Coarison of the Task Grah Scheduling Algoriths, J. Parallel and Distributed Couting, vol. 59, no. 3, , [19] G.N.S. Prasanna and B.R. Musicus, The Otial Control Aroach to Generalized Multirocessor Scheduling, Algorithica, vol. 15, no. 1, , [20] G.N.S. Prasanna and B.R. Musicus, Generalized Multirocessor Scheduling and Alications to Matrix Coutations, IEEE Trans. Parallel and Distributed Systes, vol. 7, no. 6, , June [21] M. Skutella, Aroxiation Algoriths for the Discrete Tie Cost Tradeoff Proble, Math. of Oerations Research, vol. 23, no. 4, , [22] P.F. Dutot and D. Trystra, Scheduling on Hierarchical Clusters Using Malleable Tasks, Proc. Sy. Parallel Algoriths and Architectures (SPAA 01), , [23] T. N Také and F. Suter, Critical Path and Area Based Scheduling of Parallel Task Grahs on Heterogeneous Platfors, Proc. 12th Int l Conf. Parallel and Distributed Systes (ICPADS 06),. 310, July [24] H. Casanova, F. Desrez, and F. Suter, Fro Heterogeneous Task Scheduling to Heterogeneous Mixed Parallel Scheduling, Proc. 10th Int l EuroPar Conf., , Aug [25] H. Tocuoglu, S. Hariri, and M.Y. Wu, PerforanceEffective and LowColexity Task Scheduling for Heterogeneous Couting, IEEE Trans. Parallel and Distributed Systes, vol. 13, no. 3, , Mar [26] M. Vanhoucke and D. Debels, The Discrete Tie/Cost Trade off Proble: Extensions and Heuristic Procedures, J. Scheduling, vol. 10, nos. 4/5, [27] L.R. Graha, Bounds on Multirocessing Tiing Anoalies, SIAM J. Alied Math., vol. 2, , [28] H. Casanova, A. Legrand, and M. Quinson, SiGrid: A Generic Fraework for LargeScale Distributed Exerients, Proc. 10th Int l Conf. Couter Modeling and Siulation, Mar [29] SiGrid, htt://sigrid.gforge.inria.fr, [30] H. Zhao and R. Sakellariou, Scheduling Multile DAGs onto Heterogeneous Systes, Proc. 15th Heterogeneous Couting Worksho (HCW 06), Ar [31] T.H. Coren, C.E. Leiserson, and R.L. Rivest, Introduction to Algoriths. MIT Press/McGrawHill, [32] N. Kararkar, A New Polynoial Tie Algorith for Linear Prograing, Cobinatorica, vol. 4, no. 4, , [33] D.G. Luenberger and Y. Ye, Linear and Nonlinear Prograing, third ed. Sringer, [34] CPLEX, htt://www.ilog.co/roducts/clex/, PierreFrançois Dutot received the MS degree fro the Ecole Norale Suérieure de Lyon, France, in 2000, and the PhD degree fro the Institut National Polytechnique of Grenoble in He is an assistant rofessor at the University Pierre Mendès France, Grenoble, France. His research interests include theoretical asects of scheduling. Tchiou N Také received the MS degree in engineering fro Ecole Nationale d Electricité et de Mécanique, Nancy, France, in He is currently working toward the PhD degree at Nancy University, France, within the AlGorille INRIA roject tea. His current research interest is scheduling ixed arallel alications. Frédéric Suter received the MS degree fro the Université Jules Verne, Aiens, France, in 1999, and the PhD degree fro the Ecole Norale Suérieure de Lyon, France, in He is a CNRS junior researcher at the IN2P3 Couting Center in Lyon, France, since October His research interests include scheduling, grid couting, and latfor and alication siulation. Prior to joining the CNRS, he was an assistant rofessor at Nancy University and a eber of the AlGorille INRIA roject tea. He is a eber of the IEEE. Henri Casanova received the BS degree fro the Ecole Nationale Suérieure d Electronique, d Electrotechnique, d Inforatique et d Hydraulique de Toulouse, France, in 1993, the MS degree fro the Universite Paul Sabatier, Toulouse, France, in 1994, and the PhD degree fro the University of Tennessee, Knoxville, in He is an associate rofessor in the Inforation and Couter Science Deartent, University of Hawaii, Manoa. His research interests are in the area of arallel and distributed couting. In articular, his research ehasizes the odeling and the siulation of latfors and alications, as well as both the theoretical and ractical asects of scheduling robles.. For ore inforation on this or any other couting toic, lease visit our Digital Library at
GRADUAL OPTIMIZATION OF URBAN FIRE STATION LOCATIONS BASED ON GEOGRAPHICAL NETWORK MODEL
GRADUAL OPTIMIZATION OF URBAN FIRE STATION LOCATIONS BASED ON GEOGRAPHICAL NETWORK MODEL Yu Yan a, b, Guo Qingsheng a, Tang Xining c a School of Resource and Environent Science, Wuhan University, 29 Luoyu
More information1 Adaptive Control. 1.1 Indirect case:
Adative Control Adative control is the attet to redesign the controller while online, by looking at its erforance and changing its dynaic in an autoatic way. Adative control is that feedback law that looks
More informationA multi objective virtual machine placement method for reduce operational costs in cloud computing by genetic
International Journal of Couter Networs and Counications Security VOL. 2, NO. 8, AUGUST 2014, 250 259 Available online at: www.icncs.org ISSN 23089830 C N C S A ulti obective virtual achine laceent ethod
More informationCLOUD computing is quickly becoming an effective and
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 6, YEAR 203 Otial Multiserver Configuration for Profit Maxiization in Cloud Couting Junwei Cao, Senior Meber, IEEE, Kai Hwang, Fellow,
More informationLAB VIII. LOW FREQUENCY CHARACTERISTICS OF JUNCTION FIELD EFFECT TRANSISTORS
LAB V. LOW FREQUENCY CHARACTERSTCS OF JUNCTON FELD EFFECT TRANSSTORS 1. OBJECTVE n this lab, you will study the V characteristics and sallsignal odel of Junction Field Effect Transistors (JFET).. OVERVEW
More informationDynamic Price Quotation in a Responsive Supply Chain for OneofaK ind Production
Dynaic Price Quotation in a Resonsive Suly Chain for OneofaK ind Production Jian (Ray) Zhang Deartent of Mechanical and Manufacturing Engineering, University of Calgary, 5 University Drive NW, Calgary,
More informationAnalysis of a Secure Software Upload Technique in Advanced Vehicles using Wireless Links
Proceedings of the 007 IEEE Intelligent Transortation Systes Conference Seattle, WA, USA, Set. 30  Oct. 3, 007 WeC4.3 Analysis of a Secure Uload Technique in Advanced Vehicles using Wireless Links Irina
More informationTransient Performance of PacketScore for blocking DDoS attacks
Transient Perforance of PacketScore for blocking DDoS attacks Mooi Choo Chuah 1 Wing Cheong Lau Yoohwan Ki H. Jonathan Chao CSE Deartent Bell Laboratories EECS Deartent ECE Deartent Lehigh Univ. Lucent
More informationTOPIC T3: DIMENSIONAL ANALYSIS AUTUMN 2013
TOPIC T3: DIMENSIONAL ANALYSIS AUTUMN 013 Objectives (1 Be able to deterine the diensions of hysical quantities in ters of fundaental diensions. ( Understand the Princile of Diensional Hoogeneity and its
More informationarxiv:0805.1434v1 [math.pr] 9 May 2008
Degreedistribution stability of scalefree networs Zhenting Hou, Xiangxing Kong, Dinghua Shi,2, and Guanrong Chen 3 School of Matheatics, Central South University, Changsha 40083, China 2 Departent of
More informationESTIMATION OF THE DEMAND FOR RESIDENTIAL WATER IN A STONE GEARY FORM AND THE CHOICE OF THE PRICE VARIABLE MARIEESTELLE BINET. Associate Professor
ESTIMATION OF THE DEMAND FOR RESIDENTIAL WATER IN A STONE GEARY FORM AND THE CHOICE OF THE PRICE VARIABLE MARIEESTELLE BINET Associate Professor arieestelle.binet@univrennes1.fr CREM (UMR 611 CNRS),
More informationThe Online Freezetag Problem
The Online Freezetag Problem Mikael Hammar, Bengt J. Nilsson, and Mia Persson Atus Technologies AB, IDEON, SE3 70 Lund, Sweden mikael.hammar@atus.com School of Technology and Society, Malmö University,
More informationDynamic Placement for Clustered Web Applications
Dynaic laceent for Clustered Web Applications A. Karve, T. Kibrel, G. acifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi IBM T.J. Watson Research Center {karve,kibrel,giovanni,spreitz,steinder,sviri,tantawi}@us.ib.co
More informationReliability Constrained Packetsizing for Linear Multihop Wireless Networks
Reliability Constrained acketsizing for inear Multihop Wireless Networks Ning Wen, and Randall A. Berry Departent of Electrical Engineering and Coputer Science Northwestern University, Evanston, Illinois
More informationReal Time Target Tracking with Binary Sensor Networks and Parallel Computing
Real Tie Target Tracking with Binary Sensor Networks and Parallel Coputing Hong Lin, John Rushing, Sara J. Graves, Steve Tanner, and Evans Criswell Abstract A parallel real tie data fusion and target tracking
More informationPoint Location. Preprocess a planar, polygonal subdivision for point location queries. p = (18, 11)
Point Location Prerocess a lanar, olygonal subdivision for oint location ueries. = (18, 11) Inut is a subdivision S of comlexity n, say, number of edges. uild a data structure on S so that for a uery oint
More informationAn Innovate Dynamic Load Balancing Algorithm Based on Task
An Innovate Dynaic Load Balancing Algorith Based on Task Classification Hongbin Wang,,a, Zhiyi Fang, b, Guannan Qu,*,c, Xiaodan Ren,d College of Coputer Science and Technology, Jilin University, Changchun
More informationNear Light Correction for Image Relighting and 3D Shape Recovery
Near Light Correction for Iage Relighting and 3D Shae Recovery Xiang Huang, Marc Walton, Greg Bearan and Oliver Cossairt Northwestern University, Evanston, IL, USA ANE Iage, Pasadena, CA, USA xianghuang@gailco
More informationFrom Simulation to Experiment: A Case Study on Multiprocessor Task Scheduling
From to Exeriment: A Case Study on Multirocessor Task Scheduling Sascha Hunold CNRS / LIG Laboratory Grenoble, France sascha.hunold@imag.fr Henri Casanova Det. of Information and Comuter Sciences University
More informationENFORCING SAFETY PROPERTIES IN WEB APPLICATIONS USING PETRI NETS
ENFORCING SAFETY PROPERTIES IN WEB APPLICATIONS USING PETRI NETS Liviu Grigore Comuter Science Deartment University of Illinois at Chicago Chicago, IL, 60607 lgrigore@cs.uic.edu Ugo Buy Comuter Science
More informationApplying Multiple Neural Networks on Large Scale Data
0 International Conference on Inforation and Electronics Engineering IPCSIT vol6 (0) (0) IACSIT Press, Singapore Applying Multiple Neural Networks on Large Scale Data Kritsanatt Boonkiatpong and Sukree
More informationExploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2
Exploiting Hardware Heterogeneity within the Sae Instance Type of Aazon EC2 Zhonghong Ou, Hao Zhuang, Jukka K. Nurinen, Antti YläJääski, Pan Hui Aalto University, Finland; Deutsch Teleko Laboratories,
More informationReconnect 04 Solving Integer Programs with Branch and Bound (and Branch and Cut)
Sandia is a ultiprogra laboratory operated by Sandia Corporation, a Lockheed Martin Copany, Reconnect 04 Solving Integer Progras with Branch and Bound (and Branch and Cut) Cynthia Phillips (Sandia National
More informationAn Integrated Approach for Monitoring Service Level Parameters of SoftwareDefined Networking
International Journal of Future Generation Counication and Networking Vol. 8, No. 6 (15), pp. 1974 http://d.doi.org/1.1457/ijfgcn.15.8.6.19 An Integrated Approach for Monitoring Service Level Paraeters
More informationSemiinvariants IMOTC 2013 Simple semiinvariants
Seiinvariants IMOTC 2013 Siple seiinvariants These are soe notes (written by Tejaswi Navilarekallu) used at the International Matheatical Olypiad Training Cap (IMOTC) 2013 held in Mubai during AprilMay,
More informationA Modified Measure of Covert Network Performance
A Modified Measure of Covert Network Performance LYNNE L DOTY Marist College Deartment of Mathematics Poughkeesie, NY UNITED STATES lynnedoty@maristedu Abstract: In a covert network the need for secrecy
More informationSynopsys RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Development FRANCE
RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Develoment FRANCE Synosys There is no doubt left about the benefit of electrication and subsequently
More informationAlgorithmica 2001 SpringerVerlag New York Inc.
Algorithica 2001) 30: 101 139 DOI: 101007/s0045300100030 Algorithica 2001 SpringerVerlag New York Inc Optial Search and OneWay Trading Online Algoriths R ElYaniv, 1 A Fiat, 2 R M Karp, 3 and G Turpin
More informationSQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWODIMENSIONAL GRID
International Journal of Comuter Science & Information Technology (IJCSIT) Vol 6, No 4, August 014 SQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWODIMENSIONAL GRID
More informationEnergy Efficient VM Scheduling for Cloud Data Centers: Exact allocation and migration algorithms
Energy Efficient VM Scheduling for Cloud Data Centers: Exact allocation and igration algoriths Chaia Ghribi, Makhlouf Hadji and Djaal Zeghlache Institut MinesTéléco, Téléco SudParis UMR CNRS 5157 9, Rue
More informationON SELFROUTING IN CLOS CONNECTION NETWORKS. BARRY G. DOUGLASS Electrical Engineering Department Texas A&M University College Station, TX 778433128
ON SELFROUTING IN CLOS CONNECTION NETWORKS BARRY G. DOUGLASS Electrical Engineering Departent Texas A&M University College Station, TX 7788 A. YAVUZ ORUÇ Electrical Engineering Departent and Institute
More informationMeasuring Bottleneck Bandwidth of Targeted Path Segments
Measuring Bottleneck Bandwidth of Targeted Path Segents Khaled Harfoush Deartent of Couter Science North Carolina State University Raleigh, NC 27695 harfoush@cs.ncsu.edu Azer Bestavros Couter Science Deartent
More informationImplementation of Active Queue Management in a Combined Input and Output Queued Switch
pleentation of Active Queue Manageent in a obined nput and Output Queued Switch Bartek Wydrowski and Moshe Zukeran AR Special Research entre for UltraBroadband nforation Networks, EEE Departent, The University
More informationModeling Parallel Applications Performance on Heterogeneous Systems
Modeling Parallel Applications Perforance on Heterogeneous Systes Jaeela AlJaroodi, Nader Mohaed, Hong Jiang and David Swanson Departent of Coputer Science and Engineering University of Nebraska Lincoln
More informationData Set Generation for Rectangular Placement Problems
Data Set Generation for Rectangular Placeent Probles Christine L. Valenzuela (Muford) Pearl Y. Wang School of Coputer Science & Inforatics Departent of Coputer Science MS 4A5 Cardiff University George
More informationInformation Processing Letters
Inforation Processing Letters 111 2011) 178 183 Contents lists available at ScienceDirect Inforation Processing Letters www.elsevier.co/locate/ipl Offline file assignents for online load balancing Paul
More informationIntroduction to NPCompleteness Written and copyright c by Jie Wang 1
91.502 Foundations of Comuter Science 1 Introduction to Written and coyright c by Jie Wang 1 We use timebounded (deterministic and nondeterministic) Turing machines to study comutational comlexity of
More informationLargeScale IP Traceback in HighSpeed Internet: Practical Techniques and Theoretical Foundation
LargeScale IP Traceback in HighSeed Internet: Practical Techniques and Theoretical Foundation Jun Li Minho Sung Jun (Jim) Xu College of Comuting Georgia Institute of Technology {junli,mhsung,jx}@cc.gatech.edu
More informationPartitioned EliasFano Indexes
Partitioned Eliasano Indexes Giuseppe Ottaviano ISTICNR, Pisa giuseppe.ottaviano@isti.cnr.it Rossano Venturini Dept. of Coputer Science, University of Pisa rossano@di.unipi.it ABSTRACT The Eliasano
More informationAn Efficient Method for Improving Backfill Job Scheduling Algorithm in Cluster Computing Systems
The International ournal of Soft Comuting and Software Engineering [SCSE], Vol., No., Secial Issue: The Proceeding of International Conference on Soft Comuting and Software Engineering 0 [SCSE ], San Francisco
More informationOnline Bagging and Boosting
Abstract Bagging and boosting are two of the ost wellknown enseble learning ethods due to their theoretical perforance guarantees and strong experiental results. However, these algoriths have been used
More informationThe Benefit of SMT in the MultiCore Era: Flexibility towards Degrees of ThreadLevel Parallelism
The enefit of SMT in the MultiCore Era: Flexibility towards Degrees of ThreadLevel Parallelis Stijn Eyeran Lieven Eeckhout Ghent University, elgiu Stijn.Eyeran@elis.UGent.be, Lieven.Eeckhout@elis.UGent.be
More informationEnergy Proportionality for Disk Storage Using Replication
Energy Proportionality for Disk Storage Using Replication Jinoh Ki and Doron Rote Lawrence Berkeley National Laboratory University of California, Berkeley, CA 94720 {jinohki,d rote}@lbl.gov Abstract Energy
More informationThe fast Fourier transform method for the valuation of European style options inthemoney (ITM), atthemoney (ATM) and outofthemoney (OTM)
Comutational and Alied Mathematics Journal 15; 1(1: 16 Published online January, 15 (htt://www.aascit.org/ournal/cam he fast Fourier transform method for the valuation of Euroean style otions inthemoney
More informationExtendedHorizon Analysis of Pressure Sensitivities for Leak Detection in Water Distribution Networks: Application to the Barcelona Network
2013 European Control Conference (ECC) July 1719, 2013, Zürich, Switzerland. ExtendedHorizon Analysis of Pressure Sensitivities for Leak Detection in Water Distribution Networks: Application to the Barcelona
More informationImpact of Processing Costs on Service Chain Placement in Network Functions Virtualization
Ipact of Processing Costs on Service Chain Placeent in Network Functions Virtualization Marco Savi, Massio Tornatore, Giacoo Verticale Dipartiento di Elettronica, Inforazione e Bioingegneria, Politecnico
More informationWeb Application Scalability: A ModelBased Approach
Coyright 24, Software Engineering Research and Performance Engineering Services. All rights reserved. Web Alication Scalability: A ModelBased Aroach Lloyd G. Williams, Ph.D. Software Engineering Research
More informationSoftware Quality Characteristics Tested For Mobile Application Development
Thesis no: MGSE201502 Software Quality Characteristics Tested For Mobile Application Developent Literature Review and Epirical Survey WALEED ANWAR Faculty of Coputing Blekinge Institute of Technology
More informationA Scalable Application Placement Controller for Enterprise Data Centers
W WWW 7 / Track: Perforance and Scalability A Scalable Application Placeent Controller for Enterprise Data Centers Chunqiang Tang, Malgorzata Steinder, Michael Spreitzer, and Giovanni Pacifici IBM T.J.
More informationSearching strategy for multitarget discovery in wireless networks
Searching strategy for ultitarget discovery in wireless networks Zhao Cheng, Wendi B. Heinzelan Departent of Electrical and Coputer Engineering University of Rochester Rochester, NY 467 (585) 75{878,
More informationAn Approach to Combating Freeriding in PeertoPeer Networks
An Approach to Cobating Freeriding in PeertoPeer Networks Victor Ponce, Jie Wu, and Xiuqi Li Departent of Coputer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 April 7, 2008
More informationEfficient Key Management for Secure Group Communications with Bursty Behavior
Efficient Key Manageent for Secure Group Counications with Bursty Behavior Xukai Zou, Byrav Raaurthy Departent of Coputer Science and Engineering University of NebraskaLincoln Lincoln, NE68588, USA Eail:
More informationEffect Sizes Based on Means
CHAPTER 4 Effect Sizes Based on Means Introduction Raw (unstardized) mean difference D Stardized mean difference, d g Resonse ratios INTRODUCTION When the studies reort means stard deviations, the referred
More informationAnalyzing Spatiotemporal Characteristics of Education Network Traffic with Flexible Multiscale Entropy
Vol. 9, No. 5 (2016), pp.303312 http://dx.doi.org/10.14257/ijgdc.2016.9.5.26 Analyzing Spatioteporal Characteristics of Education Network Traffic with Flexible Multiscale Entropy Chen Yang, Renjie Zhou
More informationThe Application of Bandwidth Optimization Technique in SLA Negotiation Process
The Application of Bandwidth Optiization Technique in SLA egotiation Process Srecko Krile University of Dubrovnik Departent of Electrical Engineering and Coputing Cira Carica 4, 20000 Dubrovnik, Croatia
More informationCooperative Caching for Adaptive Bit Rate Streaming in Content Delivery Networks
Cooperative Caching for Adaptive Bit Rate Streaing in Content Delivery Networs Phuong Luu Vo Departent of Coputer Science and Engineering, International University  VNUHCM, Vietna vtlphuong@hciu.edu.vn
More informationFDA CFR PART 11 ELECTRONIC RECORDS, ELECTRONIC SIGNATURES
Document: MRM1004GAPCFR11 (0005) Page: 1 / 18 FDA CFR PART 11 ELECTRONIC RECORDS, ELECTRONIC SIGNATURES AUDIT TRAIL ECO # Version Change Descrition MATRIX 449 A Ga Analysis after adding controlled documents
More informationMedia Adaptation Framework in Biofeedback System for Stroke Patient Rehabilitation
Media Adaptation Fraework in Biofeedback Syste for Stroke Patient Rehabilitation Yinpeng Chen, Weiwei Xu, Hari Sundara, Thanassis Rikakis, ShengMin Liu Arts, Media and Engineering Progra Arizona State
More informationResearch Article Performance Evaluation of Human Resource Outsourcing in Food Processing Enterprises
Advance Journal of Food Science and Technology 9(2): 964969, 205 ISSN: 20424868; eissn: 20424876 205 Maxwell Scientific Publication Corp. Subitted: August 0, 205 Accepted: Septeber 3, 205 Published:
More informationMethod of supply chain optimization in Ecommerce
MPRA Munich Personal RePEc Archive Method of supply chain optiization in Ecoerce Petr Suchánek and Robert Bucki Silesian University  School of Business Adinistration, The College of Inforatics and Manageent
More informationUse of extrapolation to forecast the working capital in the mechanical engineering companies
ECONTECHMOD. AN INTERNATIONAL QUARTERLY JOURNAL 2014. Vol. 1. No. 1. 23 28 Use of extrapolation to forecast the working capital in the echanical engineering copanies A. Cherep, Y. Shvets Departent of finance
More information6.042/18.062J Mathematics for Computer Science December 12, 2006 Tom Leighton and Ronitt Rubinfeld. Random Walks
6.042/8.062J Mathematics for Comuter Science December 2, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Random Walks Gambler s Ruin Today we re going to talk about onedimensional random walks. In
More informationAn Optimal Task Allocation Model for System Cost Analysis in Heterogeneous Distributed Computing Systems: A Heuristic Approach
An Optial Tas Allocation Model for Syste Cost Analysis in Heterogeneous Distributed Coputing Systes: A Heuristic Approach P. K. Yadav Central Building Research Institute, Rooree 247667, Uttarahand (INDIA)
More informationFactor Model. Arbitrage Pricing Theory. Systematic Versus NonSystematic Risk. Intuitive Argument
Ross [1],[]) presents the aritrage pricing theory. The idea is that the structure of asset returns leads naturally to a odel of risk preia, for otherwise there would exist an opportunity for aritrage profit.
More informationCRM FACTORS ASSESSMENT USING ANALYTIC HIERARCHY PROCESS
641 CRM FACTORS ASSESSMENT USING ANALYTIC HIERARCHY PROCESS Marketa Zajarosova 1* *Ph.D. VSB  Technical University of Ostrava, THE CZECH REPUBLIC arketa.zajarosova@vsb.cz Abstract Custoer relationship
More informationRECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION. Henrik Kure
RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION Henrik Kure Dina, Danish Inforatics Network In the Agricultural Sciences Royal Veterinary and Agricultural University
More informationAN ALGORITHM FOR REDUCING THE DIMENSION AND SIZE OF A SAMPLE FOR DATA EXPLORATION PROCEDURES
Int. J. Appl. Math. Coput. Sci., 2014, Vol. 24, No. 1, 133 149 DOI: 10.2478/acs20140011 AN ALGORITHM FOR REDUCING THE DIMENSION AND SIZE OF A SAMPLE FOR DATA EXPLORATION PROCEDURES PIOTR KULCZYCKI,,
More informationASIC Design Project Management Supported by Multi Agent Simulation
ASIC Design Project Manageent Supported by Multi Agent Siulation Jana Blaschke, Christian Sebeke, Wolfgang Rosenstiel Abstract The coplexity of Application Specific Integrated Circuits (ASICs) is continuously
More informationApproximatelyPerfect Hashing: Improving Network Throughput through Efficient Offchip Routing Table Lookup
ApproxiatelyPerfect ing: Iproving Network Throughput through Efficient Offchip Routing Table Lookup Zhuo Huang, JihKwon Peir, Shigang Chen Departent of Coputer & Inforation Science & Engineering, University
More informationForensic Science International
Forensic Science International 214 (2012) 33 43 Contents lists available at ScienceDirect Forensic Science International jou r nal h o me age: w ww.els evier.co m/lo c ate/fo r sc iin t A robust detection
More informationA MOST PROBABLE POINTBASED METHOD FOR RELIABILITY ANALYSIS, SENSITIVITY ANALYSIS AND DESIGN OPTIMIZATION
9 th ASCE Secialty Conference on Probabilistic Mechanics and Structural Reliability PMC2004 Abstract A MOST PROBABLE POINTBASED METHOD FOR RELIABILITY ANALYSIS, SENSITIVITY ANALYSIS AND DESIGN OPTIMIZATION
More informationResource Allocation in Wireless Networks with Multiple Relays
Resource Allocation in Wireless Networks with Multiple Relays Kağan Bakanoğlu, Stefano Toasin, Elza Erkip Departent of Electrical and Coputer Engineering, Polytechnic Institute of NYU, Brooklyn, NY, 0
More informationAn important observation in supply chain management, known as the bullwhip effect,
Quantifying the Bullwhi Effect in a Simle Suly Chain: The Imact of Forecasting, Lead Times, and Information Frank Chen Zvi Drezner Jennifer K. Ryan David SimchiLevi Decision Sciences Deartment, National
More informationPREDICTION OF POSSIBLE CONGESTIONS IN SLA CREATION PROCESS
PREDICTIO OF POSSIBLE COGESTIOS I SLA CREATIO PROCESS Srećko Krile University of Dubrovnik Departent of Electrical Engineering and Coputing Cira Carica 4, 20000 Dubrovnik, Croatia Tel +385 20 445739,
More informationEvaluating the Effectiveness of Task Overlapping as a Risk Response Strategy in Engineering Projects
Evaluating the Effectiveness of Task Overlapping as a Risk Response Strategy in Engineering Projects Lucas Grèze Robert Pellerin Nathalie Perrier Patrice Leclaire February 2011 CIRRELT201111 Bureaux
More informationCLOSEDLOOP SUPPLY CHAIN NETWORK OPTIMIZATION FOR HONG KONG CARTRIDGE RECYCLING INDUSTRY
CLOSEDLOOP SUPPLY CHAIN NETWORK OPTIMIZATION FOR HONG KONG CARTRIDGE RECYCLING INDUSTRY Y. T. Chen Departent of Industrial and Systes Engineering Hong Kong Polytechnic University, Hong Kong yongtong.chen@connect.polyu.hk
More informationLocal Connectivity Tests to Identify Wormholes in Wireless Networks
Local Connectivity Tests to Identify Wormholes in Wireless Networks Xiaomeng Ban Comuter Science Stony Brook University xban@cs.sunysb.edu Rik Sarkar Comuter Science Freie Universität Berlin sarkar@inf.fuberlin.de
More informationThe Velocities of Gas Molecules
he Velocities of Gas Molecules by Flick Colean Departent of Cheistry Wellesley College Wellesley MA 8 Copyright Flick Colean 996 All rights reserved You are welcoe to use this docuent in your own classes
More informationGenerating Certification Authority Authenticated Public Keys in Ad Hoc Networks
SECURITY AND COMMUNICATION NETWORKS Published online in Wiley InterScience (www.interscience.wiley.co). Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks G. Kounga 1, C. J.
More informationConcurrent Program Synthesis Based on Supervisory Control
010 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 30July 0, 010 ThB07.5 Concurrent Program Synthesis Based on Suervisory Control Marian V. Iordache and Panos J. Antsaklis Abstract
More informationPERFORMANCE METRICS FOR THE IT SERVICES PORTFOLIO
Bulletin of the Transilvania University of Braşov Series I: Engineering Sciences Vol. 4 (53) No.  0 PERFORMANCE METRICS FOR THE IT SERVICES PORTFOLIO V. CAZACU I. SZÉKELY F. SANDU 3 T. BĂLAN Abstract:
More informationCOMMONWEALTH OF VIRGINIA STATE CORPORATION COMMISSION AT RICHMOND, OCTOBER 30, 2015 ~ ORDER FOR NOTICE AND HEARING
APPLICATION OF COMMONWEALTH OF VIRGINIA STATE CORPORATION COMMISSION AT RICHMOND, OCTOBER 30, 2015 ~ (a i/i A. CO VIRGINIA ELECTRIC AND POWER COMPANY CASE NO. PUE201500104 For aroval and certification
More informationThe Research of Measuring Approach and Energy Efficiency for Hadoop Periodic Jobs
Send Orders for Reprints to reprints@benthascience.ae 206 The Open Fuels & Energy Science Journal, 2015, 8, 206210 Open Access The Research of Measuring Approach and Energy Efficiency for Hadoop Periodic
More informationTimeCost TradeOffs in ResourceConstraint Project Scheduling Problems with Overlapping Modes
TimeCost TradeOffs in ResourceConstraint Proect Scheduling Problems with Overlaing Modes François Berthaut Robert Pellerin Nathalie Perrier Adnène Hai February 2011 CIRRELT201110 Bureaux de Montréal
More informationHW 2. Q v. kt Step 1: Calculate N using one of two equivalent methods. Problem 4.2. a. To Find:
HW 2 Proble 4.2 a. To Find: Nuber of vacancies per cubic eter at a given teperature. b. Given: T 850 degrees C 1123 K Q v 1.08 ev/ato Density of Fe ( ρ ) 7.65 g/cc Fe toic weight of iron ( c. ssuptions:
More information2. FINDING A SOLUTION
The 7 th Balan Conference on Operational Research BACOR 5 Constanta, May 5, Roania OPTIMAL TIME AND SPACE COMPLEXITY ALGORITHM FOR CONSTRUCTION OF ALL BINARY TREES FROM PREORDER AND POSTORDER TRAVERSALS
More informationConsiderations on Distributed Load Balancing for Fully Heterogeneous Machines: Two Particular Cases
Considerations on Distributed Load Balancing for Fully Heterogeneous Machines: Two Particular Cases Nathanaël Cheriere Departent of Coputer Science ENS Rennes Rennes, France nathanael.cheriere@ensrennes.fr
More informationBeoSound 9000. Reference book
BeoSound 9000 Reference book CAUTION: To reduce the risk of electric shock, do not reove cover (or back). No Userserviceable arts inside. Refer servicing to qualified service ersonnel. WARNING: To revent
More informationThis paper studies a rental firm that offers reusable products to price and qualityofservice sensitive
MANUFACTURING & SERVICE OPERATIONS MANAGEMENT Vol., No. 3, Suer 28, pp. 429 447 issn 523464 eissn 5265498 8 3 429 infors doi.287/so.7.8 28 INFORMS INFORMS holds copyright to this article and distributed
More informationA framework for performance monitoring, load balancing, adaptive timeouts and quality of service in digital libraries
Int J Digit Libr (2000) 3: 9 35 INTERNATIONAL JOURNAL ON Digital Libraries SpringerVerlag 2000 A fraework for perforance onitoring, load balancing, adaptive tieouts and quality of service in digital libraries
More information( C) CLASS 10. TEMPERATURE AND ATOMS
CLASS 10. EMPERAURE AND AOMS 10.1. INRODUCION Boyle s understanding of the pressurevolue relationship for gases occurred in the late 1600 s. he relationships between volue and teperature, and between
More informationAn Improved Decisionmaking Model of Human Resource Outsourcing Based on Internet Collaboration
International Journal of Hybrid Inforation Technology, pp. 339350 http://dx.doi.org/10.14257/hit.2016.9.4.28 An Iproved Decisionaking Model of Huan Resource Outsourcing Based on Internet Collaboration
More informationMINIMUM VERTEX DEGREE THRESHOLD FOR LOOSE HAMILTON CYCLES IN 3UNIFORM HYPERGRAPHS
MINIMUM VERTEX DEGREE THRESHOLD FOR LOOSE HAMILTON CYCLES IN 3UNIFORM HYPERGRAPHS JIE HAN AND YI ZHAO Abstract. We show that for sufficiently large n, every 3unifor hypergraph on n vertices with iniu
More informationLoad balancing over redundant wireless sensor networks based on diffluent
Load balancing over redundant wireless sensor networks based on diffluent Abstract Xikui Gao Yan ai Yun Ju School of Control and Coputer Engineering North China Electric ower University 02206 China Received
More informationA Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm
International Journal of Comuter Theory and Engineering, Vol. 7, No. 4, August 2015 A Virtual Machine Dynamic Migration Scheduling Model Based on MBFD Algorithm Xin Lu and Zhuanzhuan Zhang Abstract This
More informationService Network Design with Asset Management: Formulations and Comparative Analyzes
Service Network Design with Asset Management: Formulations and Comarative Analyzes Jardar Andersen Teodor Gabriel Crainic Marielle Christiansen October 2007 CIRRELT200740 Service Network Design with
More informationA quantum secret ballot. Abstract
A quantu secret ballot Shahar Dolev and Itaar Pitowsky The Edelstein Center, Levi Building, The Hebrerw University, Givat Ra, Jerusale, Israel Boaz Tair arxiv:quantph/060087v 8 Mar 006 Departent of Philosophy
More informationThe impact of metadata implementation on webpage visibility in search engine results (Part II) q
Information Processing and Management 41 (2005) 691 715 www.elsevier.com/locate/inforoman The imact of metadata imlementation on webage visibility in search engine results (Part II) q Jin Zhang *, Alexandra
More informationBranchandPrice for Service Network Design with Asset Management Constraints
BranchandPrice for Servicee Network Design with Asset Management Constraints Jardar Andersen Roar Grønhaug Mariellee Christiansen Teodor Gabriel Crainic December 2007 CIRRELT200755 BranchandPrice
More informationA magnetic Rotor to convert vacuumenergy into mechanical energy
A agnetic Rotor to convert vacuuenergy into echanical energy Claus W. Turtur, University of Applied Sciences BraunschweigWolfenbüttel Abstract Wolfenbüttel, Mai 21 2008 In previous work it was deonstrated,
More information