How To Make A Co-Ocaton Work For Free

Size: px
Start display at page:

Download "How To Make A Co-Ocaton Work For Free"

Transcription

1 Enabng Far Prcng on HPC Systems wth Node Sharng Aex D. Bresow Ananta Twar Martn Schuz Laura Carrngton Lngja Tang Jason Mars Unversty of Caforna, San Dego, CA, USA, San Dego Supercomputer Center, La Joa, CA, USA, {twar, Lawrence Lvermore Natona Laboratory, Lvermore, CA, USA, Unversty of Mchgan, Ann Arbor, MI, USA, {ngja, ABSTRACT Co-ocaton, where mutpe jobs share compute nodes n arge-scae HPC systems, has been shown to ncrease aggregate throughput and energy effcency by 1 to 2%. However, system operators dsaow co-ocaton due to far-prcng concerns,.e., a prcng mechansm that consders performance nterference from co-runnng jobs. In the current prcng mode, appcaton executon tme determnes the prce, whch resuts n unfar prces pad by the mnorty of users whose jobs suffer from co-ocaton. Ths paper presents POPPA, a runtme system that enabes far prcng by deverng precse onne nterference detecton and factates the adopton of supercomputers wth co-ocatons. POPPA everages a nove shutter mechansm a cycc, fne-graned nterference sampng mechansm to accuratey deduce the nterference between co-runners to provde unbased prcng of jobs that share nodes. POPPA s abe to quantfy nter-appcaton nterference wthn 4% mean absoute error on a varety of co-ocated benchmark and rea scentfc workoads. Keywords Onne Prcng, Supercomputer Accountng, Resource Sharng, Chp Mutprocessor, Contenton 1. INTRODUCTION Supercomputers typcay have hundreds to thousands of users and consst of tens to thousands of ndvdua servers connected over a hgh-speed optca nterconnect. At any one tme, many users concurrenty utze the system. The current approach has been to gve each user a non-overappng set of compute nodes on whch to run hs or her appcaton. Whe ths approach prevents jobs from dfferent users from cobberng one another, t eads to a mssed performance opportunty. In fact, recent work has shown that co-ocaton, where a set of jobs from dfferent users runs on a shared set of compute nodes, can ncrease mean appcaton perfor- Permsson to make dgta or hard copes of a or part of ths work for persona or cassroom use s granted wthout fee provded that copes are not made or dstrbuted for proft or commerca advantage and that copes bear ths notce and the fu ctaton on the frst page. Copyrghts for components of ths work owned by others than the author(s) must be honored. Abstractng wth credt s permtted. To copy otherwse, or repubsh, to post on servers or to redstrbute to sts, requres pror specfc permsson and/or a fee. Request permssons from Permssons@acm.org. SC13 November 17-21, 213, Denver, CO, USA Copyrght 213 ACM /13/11 Scaed to Runnng Aone amg gems ammps Co-runners mc Prce SOP Prce Far Prce Far Mean namd taubench Fgure 1: Performance of GTC, a pasma physcs code, when co-ocated wth the appcatons on the x-axs. The current prcng mechansm penazes the user for co-ocatng ther job by chargng them more when ther job degrades more. mance and system energy effcency by 2% by reducng contenton for shared resources n the memory subsystem and nter-node network [38, 33, 2]. In addton, current archtectura trends and exascae computng studes suggest that the beneft of co-ocaton s key to ncrease. The studes project that compute nodes w have hundreds to thousands of cores [16]. For some appcatons, t may not be possbe to use a of these cores effcenty. In partcuar, 8% of a XSEDE jobs use ess than 512 cores [11, 45], whch means co-ocaton w key be necessary to utze a of a node s cores. Co-ocaton seems nevtabe for arger jobs as we. Projected scang trends suggest an ncrease n the number of cores per node that outpaces ncreases n memory bandwdth and cache capacty, whch w reduce the resources avaabe per core [16]. To mtgate contenton, resource-hungry jobs w have to be spread out over more compute nodes and pared wth resource-ght jobs to mantan hgh system utzaton [2]. Athough co-ocaton s benefca to performance and energy effcency, t aso creates a new set of chaenges, one of whch s far prcng. Far prcng s a concern because athough there s a net beneft from co-ocaton, some parngs can cause one of the appcatons to sow down. When ths happens, we argue that the user shoud be dscounted. However, f we appy the current state-of-practce (SOP) n HPC nfrastructures, where users are bed proportonay to the

2 tme to execute ther job, we fnd there s gross nequty users whose jobs beneft from co-ocaton pay comparatvey ess whe users whose jobs do not beneft pay more. Fgure 1 ustrates the chaenge. Under the current stateof-practce, a user runnng GTC[41], a pasma physcs code, pays 6% more when co-ocated wth LAMMPS[3], a moecuar dynamcs code, versus AMG[15, 1], a parae agebrac mutgrd sover. To remedy ths probem, we suggest dscountng a user based on the nterference caused by the other co-runnng appcatons. The greater the nterference, the greater the dscount. The green bars show one such scheme. Because co-ocaton ncreases machne throughput per unt tme, these dscounts can be vewed as passng the effcency savngs from co-ocaton back to the end user when ther expectaton of servce s voated. Athough the concept of progressve dscounts s smpe, the reazaton of such a pocy on rea systems poses a number of practca chaenges. In partcuar, a far prcng mode of ths nature requres precsey quantfyng the nterference due to shared resource contenton. Whe there has been sgnfcant research nto predctng cross-core nterference, many of the technques make heavy use of statc profng or have been taored to specfc machnes or appcatons [42, 25, 26]. Even though ths work has yeded consderabe nsght nto the probem of shared resource contenton, we argue that n practce, t s not practca for precse prcng on a rea HPC custer. In ths doman, statc profng and machne- or appcaton-specfc approaches are not sutabe as jobs may run very shorty after submsson and ther characterzatons may not be known a pror. Athough appcaton profng may enrch the souton space, we note that aterng even a snge nput parameter for an appcaton can vasty change ts characterstcs. For exampe, doubng a snge array dmenson can often radcay transform an appcaton s senstvty to and aggressveness on the memory subsystem. Thus, an nstantaneous and dynamc mechansm s needed to contnuousy montor and quantfy the nterference jobs suffer to drve precse prcng. In addton to beng dynamc and precse, the fundamenta prcng mechansm must aso be ghtweght. The underyng prcng agent has to be mosty nvsbe to the appcaton and therefore must have a neggbe overhead, beow the system nose threshod. These objectves ead us to the two key nsghts of the work ony a software system that uses emprca, onne tests s sutabe for ths probem doman, and such an approach must be agnostc to the underyng software and hardware. In ths paper, we present such a souton: the Persstent Onne Precse Prcng Agent (POPPA). POPPA s a ghtweght runtme system that utzes a cycc, fne-gran, nterference sampng mechansm to accuratey deduce the nterference between co-runners. The key desgn feature of POPPA s a dynamc contenton detecton technque we ca shutterng. For bref perods of executon, POPPA pauses a appcatons but one and measures how the seected appcaton s performance changes versus runnng co-ocated. From the dsparty between the appcaton s rate of forward progress made whe runnng co-ocated versus shuttered, POPPA s abe to precsey determne the mpact of nterference resutng from co-ocaton and use these measurements to drve far prcng for a users jobs. The contrbutons of ths work are as foows: We ntroduce POPPA, a ghtweght, workoad and machne agnostc runtme system that enabes far prcng for HPC custers. POPPA functons entrey n software, requres no changes to the system stack n current HPC custers, and s ready depoyabe. We present the desgn of precse shutterng, a mechansm for the precse onne measurement of the performance mpact of cross-core nterference. Our precse shutterng approach functons dynamcay and requres no a pror knowedge or profng of the appcatons. We present a new prcng mode for HPC custers based on POPPA to provde far prcng to users. We provde a thorough evauaton of POPPA s effcacy and robustness as the centra accountng mechansm on HPC custers wth a mx of MPI benchmarks and rea workoads. POPPA predcts co-ocated appcaton run tme wth 4% mean absoute error and ncurs ess than 1% overhead. Usng POPPA, we are abe to dscount the average user by 7.4% and dever a prcng dstrbuton that cosey resembes that of an omnscent orace. 2. BACKGROUND AND MOTIVATION In order to better understand why far prcng s of such mportance, we must frst expore the current state-of-practce n accountng on supercomputers. We start by examnng the accountng and aocaton mode found n the Unted States Department of Energy Offce of Scence INCITE program [12] and the Natona Scence Foundaton XSEDE program [11], two of the argest U.S. programs that provde resources to the genera HPC research communty. Each of these programs factates access to a number of arge scae computng nfrastructures. To successfuy obtan an aocaton, researchers submt grant proposas and, after revews, are awarded tme on those systems as a fnte number of servce unts (SUs). When a user runs a job on a system, they depete ther bank of SUs at a rate proportona to the ength of ther programs executon and the number of compute nodes that they request. In ths mode, users need strong guarantees that the vaue of an SU w not be negatvey affected by other users jobs runnng on the same computng resources. Smary, supercomputer admnstrators care about user satsfacton and are ncentvzed to provde users wth the best possbe experence because ndvdua supercomputng centers are awarded funds argey based on the success and popuarty of ther factes. Consequenty, we observe that throughout a eves of the fundng adder, far prcng and accountng are cruca concerns. Regardess of what mechansms are mpemented to mprove supercomputer performance, energy effcency or faut toerance, they must not pervert the farness of the prcng scheme. 2.1 MPI Programmng Mode Most arge scae scentfc appcatons utze the Message Passng Interface (MPI) as the core abstracton to factate workoad dstrbuton across a custer. Two man characterstcs of MPI programs are as foows: 1) Snge Program Mutpe Data (SPMD): MPI processes execute the same statc program bnary and use unque dentfers caed ranks to dctate communcaton patterns as we as whch bocks of code get executed by dfferent processes. Whe ths aows for a arge amount of potenta

3 dversty between processes, n practce most MPI programs are Snge Program Mutpe Data (SPMD): a processes execute the same core agorthm on dfferent data. Thus wthn an MPI program, a the processes have hgh smarty, e.g., they a compete for the same resources. 2) Tghty couped communcaton synchronzaton: The vast majorty of MPI programs exhbt tghty couped communcaton synchronzaton. Because of ths tght synchronzaton, processes must execute n reatve ock-step. If a process reaches an expct or mpct barrer before the other necessary partes, t must wat unt a others make smar progress before proceedng. 2.2 Co-ocaton of MPI programs When we reason about the nature of MPI programs, t qucky becomes evdent that executng a snge MPI program across a prvate set of compute nodes s an neffcent use of system resources. The homogenety between MPI processes and the fact that they are tghty couped mean that many processes w execute the same program regons wth hgh concurrency. When ths happens, there s hgh rsk for resource contenton and performance degradaton homogeneous processes have hgh propensty to evct one another s data n the shared ast eve cache (LLC), contend for the memory controer, saturate off-chp bandwdth to man memory, and cause a backog of messages for nternode communcaton. Prevous research shows that homogeneous MPI processes can degrade one another s performance by more than 2x [2, 38]. In addton, these works show that ntroducng heterogenety n workoads by co-ocatng mutpe MPI programs on dsjont cores can drastcay mprove performance and energy effcency. In fact, both studes fnd that aggregate throughput ncreases by 12 to 23% on average over the current state of practce, and [2] shows that system energy effcency ncreases by 11 to 22%. In concuson, gven the hgh cost of arge supercomputers and the great performance and effcency beneft of coocaton, t s essenta that we provde far prcng mechansms to make co-ocaton practca. 3. POPPA OVERVIEW In ths secton, we present the overvew of the Persstent Onne Precse Prcng Agent (POPPA) framework. Our prmary desgn objectve for POPPA s to provde accurate performance nterference estmates for parae appcatons wth neggbe overhead. As shown n Fgure 2, POPPA conssts of a man montorng agent caed the Controer and a seres of Executon Managers. Executon Manager: Each Executon Manager s responsbe for aunchng and overseeng the entre executon of a parae appcaton on a gven machne. The Executon Managers read from the centra job queue and seect the next job to run accordng to the job prorty and ts resource needs. An Executon Manager aunches the seected job and attaches a performance montorng context (PMC) to the job. The PMC montors the job performance by readng and evauatng approprate hardware performance counters. Durng executon, the Executon Manager updates and reports the current status and performance data of the job to the Controer. Controer: The Controer s the man component of POPPA. Its prncpe responsbty s to conduct shutter- Job Queue Pop Job Persstent Onne Precse Prcng Agent Prcer Controer Executon Managers PMC Send Performance Data Spawn, Manage, K Start Appcaton Send Prce Log Read Account Manager Fgure 2: Interacton between POPPA components and other enttes ng, a mechansm to measure and quantfy the performance nterference among the co-runnng appcatons. In essence, the Controer perodcay pauses each appcaton but one for a very short perod and montors the performance mpact on the one runnng appcaton. To measure ths mpact, the Controer probes the PMCs of each actve job to acqure the performance data and ogs t. We present more detas of the shutterng mechansm ncudng our agorthms and poces n Secton 5 and evauate ts accuracy and overhead n Secton 8. Fgure 2 presents how POPPA can be used for prcng. After executon of a job has competed, the Prcer thread anayzes the raw performance data ogged by the Controer and quantfes the performance nterference and degradaton. More detas of the anayss and prcng are presented n Sectons 4 and 6. Based on the quantfcaton, the Prcer produces the prce to be charged and propagates t to the Account Manager, whch then deducts the prce from the user s bank of SUs. 4. PRICING MODEL In ths secton, we dscuss the key ssues reated to prcng and accountng on current supercomputers and extend those notons to a supercomputer wth job co-ocatons. 4.1 Prcng Wthout Co-ocaton For purposes of ths dscusson, assume that a user wants to run a job on a supercomputer and that P denotes the prce that the user s charged for runnng. In present day systems, P s gven by Equaton 1, where L s a rate constant n terms of servce unts per core per tme quanta, C s the number of cores that a job uses n whoe compute node ncrements, and T s the run tme of the program. P = L C T (1) From ths equaton, we can see that the prce varabe P s neary proportona to both the cores varabe C and the tme varabe T. 4.2 Prcng Wth Co-ocaton In ths secton, we propose how one coud modfy the exstng prcng mode to more fary prce appcatons when co-ocatons are present. In partcuar, f we have a job that s co-ocated wth a set of jobs J, we want a formua

4 that w produce a reasonabe prce P co(j), whch takes nto account the net nterference from a appcatons n J. To ths end, we repace L wth a rate functon F, yedng Equaton 2, where F : R R R. T soo s the run tme when the job gets a compute nodes to tsef and T co(j) s the run tme of the job when s co-ocated wth the set J of other jobs. Executon Tme KEY Pared Executon Pared + IPC Measure P co(j) = F (T soo, T co(j) ) C T soo (2) Ideay, F s monotoncay non-ncreasng so that the more degradaton an appcaton suffers from co-ocaton, the more the user s dscounted. For the purposes of ths paper, we assume utty s proportona to 1 mnus the ratona degradaton. Therefore f we equate utty to farness, then we seect F such that users are dscounted at a rate proportona to the degradaton that each of ther jobs experences due to contenton from co-runners. degradaton, then we want P co(j) Consequenty we defne F as foows: F (T soo, T co(j) ) = L T soo T co(j) Thus f D co(j) = (1 D co(j) s the ) P soo. = L (1 D co(j) ) (3) By substtutng Equaton 3 nto Equaton 2 we see that we acheve the specfc prcng mode shown n Equaton 4. P co(j) = L T soo C T co(j) T soo (4) Whe Equaton 4 s good for the user, we acknowedge that t s an deastc mode. Its smpcty makes t easy for end users to understand; however, we note other factors such as resource manager queue wat tmes, job prorty, workoad composton, the rato of each shared resource a job consumes, machne archtecture, and schedung pocy,.e. capabty versus capacty are aso mportant factors when determnng a far prce. Thus supercomputng factes w have to decde what F makes sense for each of ther systems. 5. PRECISE SHUTTER MECHANISM As prevousy mentoned, POPPA s chef desgn objectve s to produce far prces wth hgh precson, ow overhead, and wthout the need for a pror knowedge. To acheve these goas we have desgned precse shutterng, an onne co-runner nterference maskng approach. Essentay, the precse shutterng mechansm functons by aternatng an appcaton s executon envronment between one where corunners are executng and another where they are effectvey absent. Fgure 3 shows shutterng n acton on two appcatons A and B that are co-ocated. The shutterng agorthm aternates between executon regons where A and B co-execute, A executes whe B seeps, A and B co-execute, and B executes whe A seeps. We repeat ths pattern throughout the executon of the programs. To gan nsght from shutterng, we must measure the performance of each appcaton before, durng, and after shutter regons. Durng each shutter of duraton S, we everage hardware performance montors va bpfm4 [7, 27] to measure the nstructons per cyce of the soe non-seepng appcaton. To nfer the degradaton due to co-runners, we aso measure the nstructons per cyce (IPC) of a actve appcatons S mcroseconds before the shutter and S mcroseconds drecty after t. Program A Program B Soo + IPC Measure Shutter Perod Fgure 3: Shown here s shutterng n acton on two separate jobs. Durng a shutter, one job executes whe a others seep. Snce we are prmary concerned by how performance changes wth the presence or absence of contenton, we ony need to montor the performance durng sma wndows around shutters. We aso perform each shutter nfrequenty to mnmze the perturbaton of appcaton executon and parameterze the rate of shutter sampes to contro POPPA s overhead. As we show n ths work, frequent shutters are not requred to produce an accurate predctve mode. Agorthm 1 Measure(, S, K) 1: Intaze array perfvaue of ength A[] 2: for k = to K 1 do 3: for each thread t that s part of A[] do 4: perfvaue[t] = ReadCounters(t) 5: end for 6: Seep for S µs 7: for each thread t that s part of A[] do 8: perfdct[t].append(readcounters(t)-perfvaue[t]) 9: end for 1: end for Agorthm 2 Shutter Core(j, S, K) 1: for = to A 1, where j do 2: for each thread t that s part of A[] do 3: Pause t 4: end for 5: end for 6: Measure(j, S, K) 7: for = to A 1, where j do 8: for each thread t that s part of A[] do 9: Resume t 1: perfdct[t].append(thread ASLEEP) 11: end for 12: end for 5.1 Agorthms In ths secton, we present the ogc of the shutter mechansm, whose core parts are shown n Agorthms 1, 2 and 3. Beow we defne a st of common data structures and constants used by the agorthms: A, an array of co-ocated appcatons perfdct, a ookup tabe that stores the measured IPC vaues of each appcaton

5 Agorthm 3 POPPA Core 1: j = 2: whe true do 3: for = to A 1 n parae do 4: Measure(, S, K) 5: end for 6: Shutter Core(j, S, K) 7: for = to A 1 n parae do 8: Measure(, S, K) 9: end for 1: j = (j + 1) mod A 11: Seep P µs 12: end whe K, the number of IPC measurements to make n a row n a specfc regon 1 S, the ength of the each measurement n µs P, the ength of tme between groups of measurements,.e. the norma executon perod, n µs S, the ength of a shutter, approxmatey K S The core routne s Agorthm 3. At each teraton, we frst measure the IPC of each appcaton whe co-ocated (nes 3-5). We then shutter appcaton j by cang Shutter Core (ne 6), whch subsequenty cas Measure to measure the IPC whe j s runnng aone. After that, we measure the IPC of a appcatons and ncrement j (nes 7-1). Then the shutter component of POPPA goes to seep for P µs of norma executon (ne 11). Snce POPPA s persstent, ths process repeats contnuay as appcatons end and new appcatons enter the appcaton poo. 5.2 Tunng the Shutter Mechansm The shutter mpementaton presents a number of chaenges. In partcuar, seectng the correct granuarty to shutter at s key to accuratey quantfyng nterference wthout notceaby addng to t. The frst parameter s the gap between shutters P. As P s decreased, the amount of tme that POPPA s actve ncreases, consequenty aso ncreasng overhead. Snce utzaton n supercomputers s often above 95%, we assume that each core has an appcaton thread assgned to t. Due to ths fact, POPPA must tme sce wth appcaton threads. If POPPA s actve for x% of a snge core s executon tme, then assumng POPPA threads do not mgrate, one of the co-runnng appcatons s key to suffer at east an x% ht to performance due to synchronzaton between processes. Snce the POPPA runtme nevtaby has overhead, we expermented wth conductng round-robn mgraton of the POPPA threads to dstrbute the performance mpact of tme scng across a appcaton threads; however, we determned that a better souton was to seect vaues for K, P and S that make POPPA s CPU utzaton very ow, as mgraton s not guaranteed to be fne-gran enough to mtgate the effect of tme scng. Another mportant parameter s S the duraton of a shutter. In our mpementaton, ths quantty s equa to the base cost of dong a shutter on 8 MPI processes, approxmatey 12 to 2µs (see Fgure 4 n Secton 8.1), pus K S, where K S s the product of the number of consecutve measurements and the ength of each such measurement. Durng 1 We fx K = 1 for experments and anayses n Secton 8. a shutter, the paused appcaton makes no progress, thus keepng shutter duraton very short reatve to P s a prmary concern. An unexpected fnd reatng to the shutter mechansm s that n certan cases, POPPA actuay sghty mproves the performance of co-ocated appcatons. Durng shutters, appcatons that seep sacrfce a sma amount of forward progress and the one runner receves a performance boost from reduced contenton. When the net performance boost from runnng n soaton offsets the net performance oss from seepng, appcatons speed up reatve to the basene co-schedue performance. For pars of two appcatons, speedup occurs when a co-schedue ncreases one appcaton s run tme by more than 2x reatve to runnng wth haf the cores de per socket. Ths phenomenon s demonstrated emprcay n Secton ESTIMATING DEGRADATION In ths secton, we present our method for nkng the raw data that POPPA produces to the actua prces we charge. 6.1 Ideazed Mode for Degradaton Our prcng mode assumes that for an appcaton, we know the degradaton D co(j) that suffers as a resut of coocaton wth a set J of appcatons. In our prcng mode dscusson, we formuated 1 D co(j) as T soo. Whe ths T co(j) gves us a precse way to cacuate degradaton, POPPA cannot drecty measure T soo. Thus, we modfy the formuaton such that t s amenabe to the IPC data that POPPA produces. On modern chp mutprocessors, f we are gven an executon tme n seconds, we can convert ths to a vaue n cock cyces. Thus f we know the cock tcks per second, we can wrte the performance of normazed to runnng aone as the rato of cock cyces C soo P erf norm = 1 D co(j) and C co(j) = Csoo C co(j) (see beow). Addtonay, f we assume to be a truy sera program, then t s the case that s dynamc nstructons I do not change. Thus I soo = I co(j), and consequenty we can transform Equaton 5 nto a rato of IPCs by mutpyng by I co(j), yedng the foowng: I soo P erf norm = IP Cco(J) IP C soo 6.2 Known Chaenges wth Parae Programs For parae programs, however, t turns out that Equaton 6 s often mprecse. Many parae programs contan mutexes, semaphores, and other ockng mechansms to enforce program correctness by preventng data races. When a oad mbaance occurs, that s, one parae process advances faster than ts sbngs, these ockng mechansms can dstort both dynamc nstructon count and CPU cock cyces. Wth MPI, ths ssue s qute prevaent. If a communcaton routne s mpemented as bockng, then t s common practce to have the thread that ntated the routne to po for a certan number of cyces and then seep. Durng ths pong perod, the thread executes a whe oop where t contnuay tests whether the communcaton operaton has (5) (6)

6 competed. If the thread fas to fnsh the communcaton operaton wthn a certan nterva, t s put to seep and sgnaed to wake up when the operaton has competed. Because contenton and background nose on the system can cause ths pong perod to change n duraton, the number of dynamc nstructons attrbuted to these communcaton regons s varabe. Wth MVAPICH2, the MPI-2 mpementaton, the maxmum pong perod can be adjusted [52]. Whe we were tempted to dsabe pong, we knew that dong so woud be dsadvantageous. In partcuar, pong greaty ncreases ndvdua appcaton performance because the bockng thread avods the performance ht assocated wth gong to seep and wakng back up, as t can proceed as soon as communcaton has fnshed. Thus, we decded to keep the parameters that maxmzed performance even though t made precse predcton more chaengng. 6.3 Fterng Even though Equaton 6 s mprecse n the presence of varabe executon, we fnd that n practce, t s st suffcent for producng reasonabe degradaton estmates. We aso assume that the average over the N IPC sampes that we coect s roughy equvaent to the actua average IPC durng shutters (IP C soo ) and durng norma pared executon (IP C co ). These assumptons are presented beow n Equatons 7 and 8. IP C soo P erf norm N soo j= IP C,j soo N soo IP Cco(J) IP C soo and IP C co (7) N co j= IP Cco,j N co (8) POPPA gves us data n the form of a stream of bocks of IPC measurements, each consstng of K IPC measurements just before a shutter, K measurements durng a shutter, and K afterward. We denote ths stream of bocks as B and the th such bock as B ; wthn each bock B, the K IPC vaues n B before the shutter are denoted as IP C before, the K IPC vaues durng a shutter as IP C durng, and the K IPC vaues after a shutter as IP C after (IP C before. Thus B =, IP C durng, IP C after ). We denote the arth-, IP C durng metc means of each of these vaues as IP C before and IP C after. Usng ths notaton, we present the fterng agorthm (Agorthm 4) that aows us to ncrease the precson of the performance estmate. Agorthm 4 Ftered Predcton(IPC Tupes B) 1: Intaze IP C co and IP C soo to 2: for each (IP C before, IP C durng, IP C after ) n B do 3: f IP C before IP C after < δ and IP C before IP C durng and IP C after 4: IP C co = +.5(IP C before 5: IP C soo = + IP C durng 6: end f 7: end for 8: Return ( IP Csoo IP C co ) IP C soo < IP C durng then + IP C after ) < Agorthm 4 ams to reduce nose from sampng IPC. It removes groups of IPC vaues where the IPC durng a shutter s not greater than the IPC drecty before and after. Snce a shutter can ony reeve shared resource contenton, the IPC durng a shutter shoud aways exceed the IPC before and after a shutter f a measurements occur durng the same computatona phase. The second mechansm, whch states that the absoute dfference n IPC before and after cannot exceed δ works to ensure that custers that cross phase boundares are removed. We emprcay determned δ =.5 to be a reasonabe vaue. 7. EXPERIMENTAL SETUP Ths secton descrbes our methodoogy. We ran our experments on the Gordon Supercomputer [32, 49]. Each node s dua-socket. For each socket, there s an 8-core Inte EM64T Xeon E5 (Sandy Brdge) processor. Smutaneous mutthreadng s dsabed [61]. The CPU frequency s 2.6Ghz, and each core has prvate 32KB nstructon and data L1 caches, a prvate 256KB L2 cache, and each socket has 2MB of L3. There are 64GB of DRAM. Compute nodes run CentOS nux wth kerne verson The nterconnect s QDR InfnBand wth 8GB/s of bdrectona bandwdth, and the topoogy s a 3D torus of swtches [1, 57]. Our appcatons and benchmarks are shown n the tabe that foows. These benchmarks and appcatons encompass a wde varety of scentfc domans such as subatomc partce physcs [5], pasma physcs [41], moecuar dynamcs [3], ocean modeng [2], computatona fud dynamcs [6, 8], shock hydrodynamcs [36], fnte eement methods [4] aong wth varous other numerca methods that are of hgh nterest to the HPC communty. We aso note that GTC and MILC, n partcuar, use a substanta number of dedcated aocaton hours on many eadershp cass machnes. Benchmarks, Mnapps and Appcatons Swm [9], ADVECT3D [51], pcubed [39] NAS Parae Benchmarks: CG, FT, LU, MG [14, 47] Luesh [36], MnGhost [4], MnFE [4], NekBone [6, 8] GTC [41], LAMMPS [3], MILC [5], POP [2] We compe GTC, LAMMPS, MILC, POP, CG, FT, LU and MG wth GNU compers verson 4.7 and MVAPICH2 verson 1.7. LULESH, MnGhost, MnFE, and NekBone are comped wth PGI compers verson 11.9 and OpenMPI verson 1.6. In our experments, we co-ocate two 8 process MPI appcatons together on the same set of sockets. Each socket has haf ts cores run one appcaton and the other haf run the other. Appcatons co-run together for a mnmum of 5 teratons of both appcatons. As soon as one appcaton ends, we mmedatey restart t. Data coecton stops once both appcatons have competed 5 teratons. For the shutter mechansm, we fx K = 1 and P = 2ms. 8. EVALUATION In ths secton, we evauate the accuracy, overhead, and the prcng farness of POPPA. 8.1 Quantfyng POPPA s Base Overhead In ths secton, we quantfy the mnmum tme to execute components wthn the man oop of the POPPA daemon. The man oop conssts of the three core operatons of Agorthm 3 measurng the IPC of the appcaton just pror to the shutter, ssung the shutter and measurng the IPC of

7 Tme (us) Man Loop Resdua Read Counters Pre/Post Shutter Shutter Resdua Read Counters Shutter Program Executon Tme (s) CG-NULL FT-NULL LU-NULL MG-NULL CG-FIT FT-FIT LU-FIT MG-FIT Number of Threads per Job Fgure 4: Breakdown of base overhead to execute a snge teraton of POPPA s core agorthm, where readng PMC vaues domnates tota tme the appcaton durng that wndow, and measurng the IPC of the appcaton mmedatey foowng the shutter. For these experments, we co-ocate two MPI benchmarks, an auto-generated oop from the pcubed benchmark sute and a busy oop, caed the NULL co-runner, that runs for the duraton of the pcubed oop. In POPPA, we set a of the seep parameters to, so we can measure the mnmum executon tme for a subcomponents of the oop. Durng each teraton of the man oop, we measure ts tota executon tme, the tme to measure the IPC both before and after the shutter, the tota executon tme of the shutter, the tme to send the SIGSTOP and SIGCONT sgnas, and the tme to make the IPC measurements durng the shutter. Fgure 4 presents the resuts. On the x-axs we vary the number of threads n each job. So 4 corresponds to four pcubed tasks bound to cores, 2, 4, and 6 and four busy oop tasks bound to cores 1, 3, 5, and 7. The y-axs shows the tota tme n µs to execute the man oop. When studyng ths fgure, severa nterestng trends emerge. Not surprsngy, addng more threads ncreases the mnmum oop executon tme. Executon tme s domnated by IPC measurement n the form of cas to bpfm, partcuary those outsde the shutter regon. In fact, we spend about 4x as much tme measurng the IPC outsde of shutter regons compared to wthn them. Ths dfference n overhead resuts from 1) we ony measure actve threads wthn a shutter, whch s an optmzaton decson that we made, so the overhead to read the performance counters doubes outsde of a shutter, and 2) we make two sets of IPC measurements outsde of a shutter (before and after) versus a snge set of measurements durng one. We see that the mean tme to shutter does not exceed 13µs and the mean tme to execute the man oop does not exceed 5µs. Thus, our mechansm s fne graned enough to measure the IPC at sub-msecond ntervas for thread counts that are representatve of contemporary mut-socket systems. In addton to the mnmum deays ncurred by shutterng, we quantfy the effect of enargng the amount of tme spent n a shutter. For ths experment, we fx the seep tme at the end of the man oop, P (see Secton 5.1), to 2,µs and ncrease the shutter duraton, S (see Secton 5.1), mutpcatvey by factors of 2 from 2µs to 49,6µs. We separatey co-run each of the NAS Parae Benchmarks (NPB) wth the busy oop NULL. Snce NULL generates no nterference, any daton n run tme s a drect resut of ncreasng Shutter Duraton (us) Fgure 5: The reatve overhead of expandng the duraton of a shutter, where ponts correspond to measurements and nes correspond to nstantatons of the mode the shutter wndow. Fgure 5 presents the resuts. A four benchmarks exhbt a smar trend. When S s sma reatve to P, the overhead s sma, but as the rato S : P ncreases, so does the overhead. However, the overhead begns to fatten out as S approaches and exceeds the vaue of P. We need to formuate an anaytca mode for the overhead that a prcng shutter creates for an arbtrary co-ocated poo of n jobs. To do so, we examne the overhead from n consecutve shutters. Over the course of n shutters, each job w run n soaton once and seep n 1 tmes whe a snge other job enjoys the prvege. Each such shutter has duraton S. Thus each job w seep for (n 1) S seconds. The tota tme for n teratons of the man oop of the daemon s aso mportant for the anayss. Measurng the IPC before, durng and after a shutter s 3S, as each takes S tme. After ths, the daemon seeps P seconds. Ths pattern s cycc, so the combned tme s n (3S + P ). Equaton 9 shows rato of seep tme to tota tme. seep tme Z(S, P ) = tota tme = (n 1) S (9) n (3S + P ) The mode for the executon tme of the jobs n Fgure 5 s shown beow: 1 n (3S + P ) T (S, P ) = T = T (1) 1 Z(S, P ) 2nS + np + S Here T s the run tme of appcaton when co-ocated wth the NULL co-runner. When we examne the mode ft to the data n Fgure 5, we observe that CG-FIT, FT-FIT, LU-FIT, MG-FIT amost exacty predct the actua overhead of the shutter for a S n {1 2 k µs 1 <= k <= 12} and a fxed P of 2ms. Ths mode ncorporates S, P, and T; f we know any two of these quanttes, we can sove for the thrd. Thus admnstrators can decde on a system by system bass what s exacty an acceptabe amount of degradaton due to the prcng shutter and choose vaues of S and P accordngy. 8.2 Determnng the Sampng Rate In ths secton, we evauate the precson and overhead of the POPPA daemon for dfferent shutter engths (S vaues) whe keepng P fxed to 2ms. We saw n the prevous secton, that the overhead due to the shutterng mechansm has an anaytca upper bound gven by Equaton 1. Usng ths equaton, we seected vaues of S wth ess than 5% overhead: 2, 4, 8, 16, 32, 64, 128, and 256µs.

8 Percent Degradaton Percent Degradaton Measured Deg. Measured Deg. Predcted Deg. Predcted Deg. Co-ocaton Ony Co-ocaton and POPPA No Fterng Fterng Percent Degradaton Percent Degradaton Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) (a) CG (b) FT (c) LU (d) MG Fgure 6: Effect of shutter duraton on accuracy and overhead for each NPB co-run wth ADVECT3D-256 Percent Degradaton Percent Degradaton Measured Deg. Measured Deg. Predcted Deg. Predcted Deg. Co-ocaton Ony Co-ocaton and POPPA No Fterng Fterng Percent Degradaton Percent Degradaton Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) Duraton of Each IPC Measurement (us) (a) CG (b) FT (c) LU (d) MG Fgure 7: Effect of shutter duraton on accuracy and overhead for each NPB co-run wth Swm-15 We ran two sets of parwse experments. In the frst, we co-ocated the NPBs wth a contentous co-runner (AD- VECT3D wth a grd sze of ), and n the other we co-schedued the NPBs wth a moderatey contentous corunner (Swm wth a grd dmenson of 15 3 ). Fgures 6a, 6b, 6c, and 6d show the performance predcton accuracy of the POPPA daemon for CG, FT, LU, and MG when they are co-ocated wth ADVECT3D. Both the accuraces of the unftered and ftered predctors are shown. For carty, we opt not to present the resuts for 4, 16 and 64µs. In ths set of experments, we are abe to very accuratey predct the contenton wth neggbe overhead. Fterng mproves predcton performance. Our predctors have the argest error for FT. S = 2µs gves the hghest accuracy, but as S ncreases, so does the error. Ths error resuts from FT s very fne gran phases, whch coarser granuarty shutters have troube capturng. Fgures 7a, 7b, 7c, and 7d show the predcton accuracy for the NPBs pared wth Swm. Agan, our predcton accuracy s very precse. In ths case, we note that the ftered predcton s sometmes overy zeaous when predctng contenton. However, ths resut s unsurprsng gven that fterng removes custers of IPC measurements where the IPC measured durng a shutter does not exceed the IPC drecty before and after. A contrastng fndng between the experments wth AD- VECT3D and Swm concerns daemon overhead as a functon of S. In the experments wth ADVECT3D, overhead s fat regardess of S whereas t sharpy ncreases wth Swm. Ths dvergence s caused by the fact that ADVECT3D s confgured to be contentous whereas Swm s not. Durng a shutter, the one runnng appcaton receves a respte from the contenton generated by the other appcaton. In the case of the NPBs wth ADVECT3D, ths causes each NPB to speed up by approxmatey 2x, whch offsets the ost throughput from seepng durng aternate shutters. By contrast, Swm degrades each NPB by at most 15%, so the tme spent seepng cannot be masked. Overhead as a Percentage of Executon Tme Duraton of Each IPC Measurement (us) Fgure 8: Overhead of POPPA on NAS benchmarks co-ocated wth ADVECT-256 These experments show that the shutter duraton S s argey rreevant for accuracy. Thus when seectng S, t makes sense to seect a vaue that nduces mnma overhead and run tme varaton. Fgures 8 and 9 present both the daemon s overhead and ts dstrbuton for the surveyed vaues of S. In Fgure 8, regardess of the vaue of S, overhead due to the prcng shutter never exceeds 2%. However, n Fgure 9, ths vaue exceeds 4%, whch s ceary too costy. S = 32µs devers an overhead of ess than 1% and wth the smaest varaton. For ths reason, we use S = 32µs for the remander of our experments. 8.3 Parwse Evauaton In ths secton, we evauate the precson of POPPA on parwse co-ocatons. Snce our ftered predcton was better n aggregate n our prevous experments, we appy that predcton mechansm rather than the smpe one. We run co-schedues of a possbe combnatons of our 12 benchmarks and rea appcatons. Fgure 1 shows the accuracy of our ftered predctor at quantfyng degradaton. The x-axs sts the names of the benchmarks, and the y-axs sts the co-runners. Indvd-

9 mean mean mg mg u ft u ft cg nekproxy cg nekproxy mnghost mnghost mnfe uesh mnfe uesh pop mc pop mc ammps ammps gtc gtc gtc ammps mc pop uesh mnfe mnghost nekproxy cg ft u mg mean gtc ammps mc pop uesh mnfe mnghost nekproxy cg ft u mg mean Fgure 1: Run tme predcton accuracy (%) for jobs on the x-axs co-ocated wth jobs on the y-axs Overhead as a Percentage of Executon Tme Duraton of Each IPC Measurement (us) Fgure 9: Overhead of POPPA on NAS benchmarks co-ocated wth Swm-15 ua ces present the percentage dfference n predcted run tme versus actua, where negatve vaues represent underpredcton and postve vaues represent overpredcton. The top row mean presents the mean absoute error across the apps, and the rght most coumn mean presents the mean absoute error that an appcaton creates n the predcton accuracy for the other codes. Fgure 11 presents the degradaton of each appcaton as a percentage of run tme reatve to runnng wth the NULL co-runner,.e haf the cores vacant on each socket. The top row presents the mean degradaton of each scentfc code on the x-axs and the rght most coumn presents the mean degradaton each appcaton on the y-axs causes to ts corunners. If we study Fgures 1 and 11 n concert, a number of nterestng trends emerge. POPPA does we at quantfyng degradaton for a parngs consstng excusvey of our rea appcatons, GTC, LAMMPS, MILC, and POP. Our mean absoute error s 2.5% and absoute error never exceeds 5.8%. We accuratey characterze both ends of the spectrum. We predct hgh degradaton for MILC pared wth tsef and we nether sgnfcanty underpredct or overpredct for parngs wth ow mutua contenton such as GTC-LAMMPS and LAMMPS-POP. For parngs of rea apps wth benchmarks, Fgure 11: Performance degradaton (%) for jobs on the x-axs co-ocated wth jobs on the y-axs the predcton accuracy s generay qute good except for when MILC s co-ocated wth MnFE and FT. For our proxy apps LULESH, MnFE, MnGhost and NekProxy (NekBone), the resuts are more mxed. We are abe to predct ther performance wth a mean absoute error of 3.8%. MnFE s a partcuary nterestng because n each case we overpredct the degradaton for ts co-runner (mean of 7.5%). Ths overpredcton s an artfact of the fterng agorthm. When we use our unftered predctor, we overpredct by at most 1.5% for MnFE s co-runners. MnGhost, by contrast causes us to underpredct contenton for some of ts co-runners. On the NPBs, our predcton error s sghty hgher. If we excude FT, our mean absoute predcton error s wthn 5.3%. FT however, poses chaenges both for ts predcton and appcatons t s co-ocated wth. In both cases, we underpredct the actua degradaton. Ths underpredcton s due to the duraton S of the shutter. If we reexamne Fgure 6b, we observe that S = 2µs yeds the hghest accuracy when FT s co-ocated wth a contentous co-runner. We aso observe n Fgure 7b that out of the possbe vaues for S, S = 32µs prognostcates the owest contenton. On the whoe, our system s generous and tends towards modesty underpredctng contenton. Our mean absoute error across a parngs s 4.%. 8.4 Prcng Farness In ths secton, we show POPPA s prcng farness versus the state-of-practce and the orace. Fgure 12 shows the dstrbuton of reatve SUs charged for each appcaton usng the dfferent prcng schemes. On average, the state-ofpractce woud charge users 14% more as resut of co-ocatng ther jobs. Jobs that degrade more, pay more. POPPA on the other hand dscounts users by an average of 7.4%, whch s cose to the 11.5% dscount that the orace woud offer. When we examne the mnmum and maxmum reatve SUs charged, we aso see favorabe resuts for POPPA. The maxmum dscount gven by POPPA s 4.8%, whch s cose to the orace s 38.3%. The max normazed prce pad by a user usng POPPA s counse s 13.8% of the spread basene versus the orace s 99.8%. In the mnorty of cases where

10 Prce Normazed to Runnng Soo SOP POPPA Orace.6 cg ft gtc ammps u uesh mg Appcatons Fgure 12: The dstrbuton of prces a user woud pay for a gven appcaton when usng ether the state-of-practce (SOP), POPPA, or the maxmay far Orace mc mnfe mnghost nekproxy pop POPPA charges more than the spread basene (23/144), t s usuay smaer than run-to-run varaton, wth a mean surcharge of 1.3%. In addton, the mean prce pad for each appcaton never exceeds 99.2% of the basene, and thus over tme, a users w receve a dscount. Contrast ths wth the state-of-practce, where a user runnng MILC n the worst case can pay up to 62.1% more and on average woud expect to pay 24.9% more as a resut of cross-appcaton nterference. If we consder the mpact of POPPA s dscounts, we fnd they are entrey tenabe. Reca that the job strpng study [2] found that co-ocatng MPI benchmarks and fu-scae appcatons at scae ncreased mean system throughput by 12 to 23%. Thus dscountng users by a mean 7.4% does not nfate the purchasng power of SUs, and so SU aocaton need not be changed. 9. RELATED WORK There are a number of works that nvestgate prcng or dentfy prcng as a key ssue for arge scae grd and coud nfrastructures [13, 63, 48, 54]. Our work dffers from these works n that we address the prcng ssue n supercomputers wth co-ocatons. To the best of our knowedge, our work s the frst to expore ths probem space. Athough ths work addresses chaenges reated to far prcng, t shares smartes wth research that addresses dentfyng and mtgatng contenton n mutcore systems. Eary work on smutaneous mut-threadng processors nvestgated co-schedung of heterogeneous threads [55, 56, 21] as a way to ncrease throughput by reducng contenton. Cross core contenton has aso been extensvey studed [22, 65, 44, 43]. A mechansm smar to the prcng shutter s expored n [44] but dffers n that t s n the commerca data center space and n that t focuses on L3 mss rates wth and wthout the presence of contenton. Another souton to mtgatng contenton has been cache parttonng both n software and n hardware [46, 58, 53, 23]. Core fuson s an archtectura desgn that heps reduce the cross core contenton probem by dynamcay combnng smper cores nto arger cores [34, 59]. Others have examned usng schedung to mtgate contenton [64, 29, 28, 18, 17] and [5, 6] nvestgate schedung consderatons n mapreduce envronments. There are aso studes that evauate the effectveness of anaytca and statstca modes to sove probems reated to contenton [4, 62, 24, 31]. The computatona compexty, heurstcs and approxmaton agorthms for optma mutprocessor schedung are expored n [3, 19, 37, 35]. 1. CONCLUSION We have provded a mechansm to enabe far prcng on HPC systems, one of the fundamenta roadbocks to enabe node sharng on HPC systems. By empoyng POPPA, we can accuratey measure performance degradaton across a range of MPI appcatons. Usng ths data, we prce users n a fashon that approaches the optma farness provded by the orace, and our mean absoute predcton error s 4% across a combnatons of 12 appcaton codes. POPPA s not a defntve souton to the prcng probem but a key part of a more hostc souton. Gong forward, the deveopment of addtona, ght-weght technques for appcaton ntrospecton w become essenta. By harnessng ths dynamc nformaton, further optmzaton opportuntes w arse. Through combnng these soutons, the road to exascae supercomputers ooks brght. 11. ACKNOWLEDGEMENTS The authors thank the anonymous revewers for ther tme and feedback. In addton, they thank Professor Mke Norman of UCSD, Professor Leo Porter of Skdmore Coege, and Terr Qunn of LLNL. Some of the deas n ths paper were nspred by dscussons wth the ate Aan Snavey. Part of ths work was performed under the auspces of the U.S. Department of Energy by Lawrence Lvermore Natona Laboratory under Contract DE-AC52-7NA27344 (LLNL- CONF ). Ths work was supported n part by the DOE Offce of Scence, Advanced Scentfc Computng Research, under award number Beyond the Standard Mode Towards an Integrated Modeng Methodoogy for the Performance and Power ; PNNL ead nsttuton; Program Manager Sona Sachs. The authors acknowedge Natona Scence Foundaton support under CCF

11 12. REFERENCES [1] Asc sequoa benchmark codes. [2] Cesm1.: Parae ocean program (pop2). [3] Large-scae Atomc/Moecuar Massvey Parae Smuator. [4] Mantevo sute. [5] MIMD Lattce Computaton (MILC) Coaboraton. [6] Nek5 project. https: //nek5.mcs.an.gov/ndex.php/man_page. [7] perfmon 2: mprovng performance montorng on nux. [8] Proxy-apps for therma hydraucs. therma_hydraucs. [9] Spec cpu 2 benchmark sute. [1] Gordon user gude [11] Extreme Scence and Engneerng Dscovery Envronment [12] Innovatve and Nove Computatona Impact on Theory and Experment. doeeadershpcomputng.org/ncte-program/, 213. [13] M. Armbrust, A. Fox, R. Grffth, A. D. Joseph, R. Katz, A. Konwnsk, G. Lee, D. Patterson, A. Rabkn, I. Stoca, et a. A vew of coud computng. Communcatons of the ACM, 53(4), 21. [14] D. H. Baey, E. Barszcz, J. T. Barton, D. S. Brownng, R. L. Carter, L. Dagum, R. A. Fatooh, P. O. Frederckson, T. A. Lasnsk, R. S. Schreber, H. D. Smon, V. Venkatakrshnan, and S. K. Weeratunga. The nas parae benchmarks summary and premnary resuts. In Proceedngs of the 1991 ACM/IEEE conference on Supercomputng, Supercomputng 91, New York, NY, USA, ACM. [15] A. H. Baker, T. Gambn, M. Schuz, and U. M. Yang. Chaenges of scang agebrac mutgrd across modern mutcore archtectures. In Parae & Dstrbuted Processng Symposum (IPDPS), 211 IEEE Internatona. IEEE, 211. [16] K. Bergman, S. Borkar, D. Campbe, W. Carson, W. Day, M. Denneau, P. Franzon, W. Harrod, J. Her, S. Karp, S. Kecker, D. Ken, R. Lucas, M. Rchards, A. Scarpe, S. Scott, A. Snavey, T. Sterng, R. S. Wams, and K. Yeck. Exascae computng study: Technoogy chaenges n achevng exascae systems [17] S. Bagodurov and A. Fedorova. Towards the contenton aware schedung n hpc custer envronment. In Journa of Physcs: Conference Seres, voume 385. IOP Pubshng, 212. [18] S. Bagodurov, S. Zhuravev, and A. Fedorova. Contenton-aware schedung on mutcore systems. ACM Transactons on Computer Systems, 28, 21. [19] J. Bazewcz, J. K. Lenstra, and A. Kan. Schedung subject to resource constrants: cassfcaton and compexty. Dscrete Apped Mathematcs, 5(1), [2] A. D. Bresow, L. Porter, A. Twar, M. Laurenzano, L. Carrngton, D. M. Tusen, and A. E. Snavey. The case for coocaton of hpc workoads. Concurrency and Computaton: Practce and Experence, 213. [21] F. J. Cazora, P. M. Knjnenburg, R. Sakearou, E. Fernández, A. Ramrez, and M. Vaero. Predctabe performance n SMT processors. In 1st Conference on Computng Fronters, 24. [22] D. Chandra, F. Guo, S. Km, and Y. Sohn. Predctng nter-thread cache contenton on a chp mut-processor archtecture. In 11th Internatona Symposum on Hgh-Performance Computer Archtecture, 25. [23] J. Chang and G. S. Soh. Cooperatve cache parttonng for chp mutprocessors. In Proceedngs of the 21st annua nternatona conference on Supercomputng. ACM, 27. [24] T. Dwyer, A. Fedorova, S. Bagodurov, M. Roth, F. Gaud, and J. Pe. A practca method for estmatng performance degradaton on mutcore processors, and ts appcaton to hpc workoads. In Proceedngs of the Internatona Conference on Hgh Performance Computng, Networkng, Storage and Anayss. IEEE Computer Socety Press, 212. [25] D. Ekov, N. Nkoers, D. Back-Schaffer, and E. Hagersten. Cache pratng: Measurng the curse of the shared cache. In Parae Processng (ICPP), 211 Internatona Conference on. IEEE, 211. [26] D. Ekov, N. Nkoers, D. Back-Schaffer, and E. Hagersten. Bandwdth bandt: Understandng memory contenton. In Performance Anayss of Systems and Software (ISPASS), 212 IEEE Internatona Symposum on. IEEE, 212. [27] S. Eranan. Perfmon: Lnux performance montorng for a-64. Downoadabe software wth documentaton, hp. hp. com/research/nux/perfmon, 23. [28] A. Fedorova, M. Setzer, and M. D. Smth. Cache-far thread schedung for mutcore processors. Dvson of Engneerng and Apped Scences, Harvard Unversty, Tech. Rep. TR-17-6, 26. [29] A. Fedorova, M. Setzer, and M. D. Smth. Improvng performance soaton on chp mutprocessors va an operatng system scheduer. In 16th Internatona Conference on Parae Archtecture and Compaton Technques, 27. [3] M. R. Garey and D. S. Johnson. Compexty resuts for mutprocessor schedung under resource constrants. SIAM Journa on Computng, 4(4), [31] S. Govndan, J. Lu, A. Kansa, and A. Svasubramanam. Cuanta: quantfyng effects of shared on-chp resource nterference for consodated vrtua machnes. In Proceedngs of the 2nd ACM Symposum on Coud Computng. ACM, 211. [32] J. He, A. Jagatheesan, S. Gupta, J. Bennett, and A. Snavey. Dash: a recpe for a fash-based data ntensve supercomputer. In 21 ACM/IEEE Internatona Conference for Hgh Performance Computng, Networkng, Storage and Anayss, 21. [33] C. Iancu, S. Hofmeyr, F. Bagojevc, and Y. Zheng. Oversubscrpton on mutcore processors. In Parae Dstrbuted Processng (IPDPS), 21 IEEE Internatona Symposum on, apr 21. [34] E. Ipek, M. Krman, N. Krman, and J. F. Martnez. Core fuson: accommodatng software dversty n chp mutprocessors. ACM SIGARCH Computer Archtecture News, 35(2), 27. [35] Y. Jang, X. Shen, J. Chen, and R. Trpath. Anayss and approxmaton of optma co-schedung on chp mutprocessors. In Proceedngs of the 17th nternatona conference on Parae archtectures and compaton technques. ACM, 28. [36] I. Karn, A. Bhatee, J. Keaser, B. L. Chamberan, J. Cohen, Z. DeVto, R. Haque, D. Laney, E. Luke, F. Wang, et a. Exporng tradtona and emergng parae programmng modes usng a proxy appcaton. 27th IEEE Internatona Parae & Dstrbuted Processng Symposum (IEEE IPDPS 213), Boston, USA, 213. [37] H. Kasahara and S. Narta. Practca mutprocessor

An Efficient Job Scheduling for MapReduce Clusters

An Efficient Job Scheduling for MapReduce Clusters Internatona Journa of Future Generaton ommuncaton and Networkng, pp. 391-398 http://dx.do.org/10.14257/jfgcn.2015.8.2.32 An Effcent Job Schedung for MapReduce usters Jun Lu 1, Tanshu Wu 1, and Mng We Ln

More information

Approximation Algorithms for Data Distribution with Load Balancing of Web Servers

Approximation Algorithms for Data Distribution with Load Balancing of Web Servers Approxmaton Agorthms for Data Dstrbuton wth Load Baancng of Web Servers L-Chuan Chen Networkng and Communcatons Department The MITRE Corporaton McLean, VA 22102 chen@mtreorg Hyeong-Ah Cho Department of

More information

An Ensemble Classification Framework to Evolving Data Streams

An Ensemble Classification Framework to Evolving Data Streams Internatona Journa of Scence and Research (IJSR) ISSN (Onne): 39-7064 An Ensembe Cassfcaton Framework to Evovng Data Streams Naga Chthra Dev. R MCA, (M.Ph), Sr Jayendra Saraswathy Maha Vdyaaya, Coege of

More information

Multi-agent System for Custom Relationship Management with SVMs Tool

Multi-agent System for Custom Relationship Management with SVMs Tool Mut-agent System for Custom Reatonshp Management wth SVMs oo Yanshan Xao, Bo Lu, 3, Dan Luo, and Longbng Cao Guangzhou Asan Games Organzng Commttee, Guangzhou 5063, P.R. Chna Facuty of Informaton echnoogy,

More information

Asymptotically Optimal Inventory Control for Assemble-to-Order Systems with Identical Lead Times

Asymptotically Optimal Inventory Control for Assemble-to-Order Systems with Identical Lead Times Asymptotcay Optma Inventory Contro for Assembe-to-Order Systems wth Identca ead Tmes Martn I. Reman Acate-ucent Be abs, Murray H, NJ 07974, marty@research.be-abs.com Qong Wang Industra and Enterprse Systems

More information

Off-line and on-line scheduling on heterogeneous master-slave platforms

Off-line and on-line scheduling on heterogeneous master-slave platforms Laboratore de Informatque du Paraésme Écoe Normae Supéreure de Lyon Unté Mxte de Recherche CNRS-INRIA-ENS LYON-UCBL n o 5668 Off-ne and on-ne schedung on heterogeneous master-save patforms Jean-Franços

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Clustering based Two-Stage Text Classification Requiring Minimal Training Data

Clustering based Two-Stage Text Classification Requiring Minimal Training Data OI: 10.2298/CSIS120130044Z Custerng based Two-Stage Text Cassfcaton Requrng Mnma Tranng ata Xue Zhang 1,2 and Wangxn Xao 3,4 1 Key Laboratory of Hgh Confdence Software Technooges, Mnstry of Educaton, Pekng

More information

Predictive Control of a Smart Grid: A Distributed Optimization Algorithm with Centralized Performance Properties*

Predictive Control of a Smart Grid: A Distributed Optimization Algorithm with Centralized Performance Properties* Predctve Contro of a Smart Grd: A Dstrbuted Optmzaton Agorthm wth Centrazed Performance Propertes* Phpp Braun, Lars Grüne, Chrstopher M. Keett 2, Steven R. Weer 2, and Kar Worthmann 3 Abstract The authors

More information

SIMPLIFYING NDA PROGRAMMING WITH PROt SQL

SIMPLIFYING NDA PROGRAMMING WITH PROt SQL SIMPLIFYING NDA PROGRAMMING WITH PROt SQL Aeen L. Yam, Besseaar Assocates, Prnceton, NJ ABSRACf The programmng of New Drug Appcaton (NDA) Integrated Summary of Safety (ISS) usuay nvoves obtanng patent

More information

TCP/IP Interaction Based on Congestion Price: Stability and Optimality

TCP/IP Interaction Based on Congestion Price: Stability and Optimality TCP/IP Interacton Based on Congeston Prce: Stabty and Optmaty Jayue He Eectrca Engneerng Prnceton Unversty Ema: jhe@prncetonedu Mung Chang Eectrca Engneerng Prnceton Unversty Ema: changm@prncetonedu Jennfer

More information

Predicting Advertiser Bidding Behaviors in Sponsored Search by Rationality Modeling

Predicting Advertiser Bidding Behaviors in Sponsored Search by Rationality Modeling Predctng Advertser Bddng Behavors n Sponsored Search by Ratonaty Modeng Hafeng Xu Centre for Computatona Mathematcs n Industry and Commerce Unversty of Wateroo Wateroo, ON, Canada hafeng.ustc@gma.com Dy

More information

A Simple Congestion-Aware Algorithm for Load Balancing in Datacenter Networks

A Simple Congestion-Aware Algorithm for Load Balancing in Datacenter Networks A Smpe Congeston-Aware Agorthm for Load Baancng n Datacenter Networs Mehrnoosh Shafee, and Javad Ghader, Coumba Unversty Abstract We study the probem of oad baancng n datacenter networs, namey, assgnng

More information

Dynamic Virtual Network Allocation for OpenFlow Based Cloud Resident Data Center

Dynamic Virtual Network Allocation for OpenFlow Based Cloud Resident Data Center 56 IEICE TRANS. COMMUN., VOL.E96 B, NO. JANUARY 203 PAPER Speca Secton on Networ Vrtuazaton, and Fuson Patform of Computng and Networng Dynamc Vrtua Networ Aocaton for OpenFow Based Coud Resdent Data Center

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

Increasing Supported VoIP Flows in WMNs through Link-Based Aggregation

Increasing Supported VoIP Flows in WMNs through Link-Based Aggregation Increasng Supported VoIP Fows n WMNs through n-based Aggregaton J. Oech, Y. Hamam, A. Kuren F SATIE TUT Pretora, South Afrca oechr@gma.com T. Owa Meraa Insttute Counc of Scentfc and Industra Research (CSIR)

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

A Resources Allocation Model for Multi-Project Management

A Resources Allocation Model for Multi-Project Management A Resources Aocaton Mode for Mut-Proect Management Hamdatou Kane, Aban Tsser To cte ths verson: Hamdatou Kane, Aban Tsser. A Resources Aocaton Mode for Mut-Proect Management. 9th Internatona Conference

More information

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop IWFMS: An Internal Workflow Management System/Optmzer for Hadoop Lan Lu, Yao Shen Department of Computer Scence and Engneerng Shangha JaoTong Unversty Shangha, Chna lustrve@gmal.com, yshen@cs.sjtu.edu.cn

More information

Comparison of workflow software products

Comparison of workflow software products Internatona Conference on Computer Systems and Technooges - CompSysTech 2006 Comparson of worfow software products Krasmra Stoova,Todor Stoov Abstract: Ths research addresses probems, reated to the assessment

More information

XAC08-6 Professional Project Management

XAC08-6 Professional Project Management 1 XAC08-6 Professona Project anagement Ths Lecture: Tte s so manager shoud ncude a s management pan a document that gudes any experts agree Some faure project to Ba, ba, ba, ba Communcaton anagement Wee

More information

Cardiovascular Event Risk Assessment Fusion of Individual Risk Assessment Tools Applied to the Portuguese Population

Cardiovascular Event Risk Assessment Fusion of Individual Risk Assessment Tools Applied to the Portuguese Population Cardovascuar Event Rsk Assessment Fuson of Indvdua Rsk Assessment Toos Apped to the Portuguese Popuaton S. Paredes, T. Rocha, P. de Carvaho, J. Henrques, J. Moras*, J. Ferrera, M. Mendes Abstract Cardovascuar

More information

A Programming Model for the Cloud Platform

A Programming Model for the Cloud Platform Internatonal Journal of Advanced Scence and Technology A Programmng Model for the Cloud Platform Xaodong Lu School of Computer Engneerng and Scence Shangha Unversty, Shangha 200072, Chna luxaodongxht@qq.com

More information

Simple Interest Loans (Section 5.1) :

Simple Interest Loans (Section 5.1) : Chapter 5 Fnance The frst part of ths revew wll explan the dfferent nterest and nvestment equatons you learned n secton 5.1 through 5.4 of your textbook and go through several examples. The second part

More information

Efficient Bandwidth Management in Broadband Wireless Access Systems Using CAC-based Dynamic Pricing

Efficient Bandwidth Management in Broadband Wireless Access Systems Using CAC-based Dynamic Pricing Effcent Bandwdth Management n Broadband Wreless Access Systems Usng CAC-based Dynamc Prcng Bader Al-Manthar, Ndal Nasser 2, Najah Abu Al 3, Hossam Hassanen Telecommuncatons Research Laboratory School of

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Expressive Negotiation over Donations to Charities

Expressive Negotiation over Donations to Charities Expressve Negotaton over Donatons to Chartes Vncent Contzer Carnege Meon Unversty 5000 Forbes Avenue Pttsburgh, PA 523, USA contzer@cs.cmu.edu Tuomas Sandhom Carnege Meon Unversty 5000 Forbes Avenue Pttsburgh,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

HP Mission-Critical Services

HP Mission-Critical Services HP Msson-Crtcal Servces Delverng busness value to IT Jelena Bratc Zarko Subotc TS Support tm Mart 2012, Podgorca 2010 Hewlett-Packard Development Company, L.P. The nformaton contaned heren s subject to

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

Financial Mathemetics

Financial Mathemetics Fnancal Mathemetcs 15 Mathematcs Grade 12 Teacher Gude Fnancal Maths Seres Overvew In ths seres we am to show how Mathematcs can be used to support personal fnancal decsons. In ths seres we jon Tebogo,

More information

Prediction of Success or Fail of Students on Different Educational Majors at the End of the High School with Artificial Neural Networks Methods

Prediction of Success or Fail of Students on Different Educational Majors at the End of the High School with Artificial Neural Networks Methods Predcton of Success or Fa of on Dfferent Educatona Maors at the End of the Hgh Schoo th Artfca Neura Netors Methods Sayyed Mad Maznan, Member, IACSIT, and Sayyede Azam Aboghasempur Abstract The man obectve

More information

Branch-and-Price and Heuristic Column Generation for the Generalized Truck-and-Trailer Routing Problem

Branch-and-Price and Heuristic Column Generation for the Generalized Truck-and-Trailer Routing Problem REVISTA DE MÉTODOS CUANTITATIVOS PARA LA ECONOMÍA Y LA EMPRESA (12) Págnas 5 38 Dcembre de 2011 ISSN: 1886-516X DL: SE-2927-06 URL: http://wwwupoes/revmetcuant/artphp?d=51 Branch-and-Prce and Heurstc Coumn

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Lecture 3: Force of Interest, Real Interest Rate, Annuity

Lecture 3: Force of Interest, Real Interest Rate, Annuity Lecture 3: Force of Interest, Real Interest Rate, Annuty Goals: Study contnuous compoundng and force of nterest Dscuss real nterest rate Learn annuty-mmedate, and ts present value Study annuty-due, and

More information

The Dynamics of Wealth and Income Distribution in a Neoclassical Growth Model * Stephen J. Turnovsky. University of Washington, Seattle

The Dynamics of Wealth and Income Distribution in a Neoclassical Growth Model * Stephen J. Turnovsky. University of Washington, Seattle The Dynamcs of Weath and Income Dstrbuton n a Neocassca Growth Mode * Stephen J. Turnovsy Unversty of Washngton, Seatte Ceca García-Peñaosa CNRS and GREQAM March 26 Abstract: We examne the evouton of the

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng Envronment Congcong Xong, Long Feng, Lxan Chen A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process Dsadvantages of cyclc TDDB47 Real Tme Systems Manual scheduler constructon Cannot deal wth any runtme changes What happens f we add a task to the set? Real-Tme Systems Laboratory Department of Computer

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

A Novel Auction Mechanism for Selling Time-Sensitive E-Services

A Novel Auction Mechanism for Selling Time-Sensitive E-Services A ovel Aucton Mechansm for Sellng Tme-Senstve E-Servces Juong-Sk Lee and Boleslaw K. Szymansk Optmaret Inc. and Department of Computer Scence Rensselaer Polytechnc Insttute 110 8 th Street, Troy, Y 12180,

More information

Loop Parallelization

Loop Parallelization - - Loop Parallelzaton C-52 Complaton steps: nested loops operatng on arrays, sequentell executon of teraton space DECLARE B[..,..+] FOR I :=.. FOR J :=.. I B[I,J] := B[I-,J]+B[I-,J-] ED FOR ED FOR analyze

More information

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING FORMAL ANALYSIS FOR REAL-TIME SCHEDULING Bruno Dutertre and Vctora Stavrdou, SRI Internatonal, Menlo Park, CA Introducton In modern avoncs archtectures, applcaton software ncreasngly reles on servces provded

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

SUPPORT VECTOR MACHINE FOR REGRESSION AND APPLICATIONS TO FINANCIAL FORECASTING

SUPPORT VECTOR MACHINE FOR REGRESSION AND APPLICATIONS TO FINANCIAL FORECASTING SUPPORT VECTOR MACHINE FOR REGRESSION AND APPICATIONS TO FINANCIA FORECASTING Theodore B. Trafas and Husen Ince Schoo of Industra Engneerng Unverst of Okahoma W. Bod Sute 4 Norman Okahoma 739 trafas@ecn.ou.edu;

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1 Send Orders for Reprnts to reprnts@benthamscence.ae The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center

More information

Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers æ

Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers æ Effcent Strpng Technques for Varable Bt Rate Contnuous Meda Fle Servers æ Prashant J. Shenoy Harrck M. Vn Department of Computer Scence, Department of Computer Scences, Unversty of Massachusetts at Amherst

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Research on Single and Mixed Fleet Strategy for Open Vehicle Routing Problem

Research on Single and Mixed Fleet Strategy for Open Vehicle Routing Problem 276 JOURNAL OF SOFTWARE, VOL 6, NO, OCTOBER 2 Research on Snge and Mxed Feet Strategy for Open Vehce Routng Probe Chunyu Ren Heongjang Unversty /Schoo of Inforaton scence and technoogy, Harbn, Chna Ea:

More information

Pricing Model of Cloud Computing Service with Partial Multihoming

Pricing Model of Cloud Computing Service with Partial Multihoming Prcng Model of Cloud Computng Servce wth Partal Multhomng Zhang Ru 1 Tang Bng-yong 1 1.Glorous Sun School of Busness and Managment Donghua Unversty Shangha 251 Chna E-mal:ru528369@mal.dhu.edu.cn Abstract

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers

J. Parallel Distrib. Comput. Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers J. Parallel Dstrb. Comput. 71 (2011) 732 749 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. ournal homepage: www.elsever.com/locate/pdc Envronment-conscous schedulng of HPC applcatons

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Section 5.4 Annuities, Present Value, and Amortization

Section 5.4 Annuities, Present Value, and Amortization Secton 5.4 Annutes, Present Value, and Amortzaton Present Value In Secton 5.2, we saw that the present value of A dollars at nterest rate per perod for n perods s the amount that must be deposted today

More information

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems 1 Applcaton of Mult-Agents for Fault Detecton and Reconfguraton of Power Dstrbuton Systems K. Nareshkumar, Member, IEEE, M. A. Choudhry, Senor Member, IEEE, J. La, A. Felach, Senor Member, IEEE Abstract--The

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines

DBA-VM: Dynamic Bandwidth Allocator for Virtual Machines DBA-VM: Dynamc Bandwdth Allocator for Vrtual Machnes Ahmed Amamou, Manel Bourguba, Kamel Haddadou and Guy Pujolle LIP6, Perre & Mare Cure Unversty, 4 Place Jusseu 755 Pars, France Gand SAS, 65 Boulevard

More information

Swing-Free Transporting of Two-Dimensional Overhead Crane Using Sliding Mode Fuzzy Control

Swing-Free Transporting of Two-Dimensional Overhead Crane Using Sliding Mode Fuzzy Control Swng-Free Transportng of Two-Dmensona Overhead Crane Usng Sdng Mode Fuzzy Contro Dantong Lu, Janqang, Dongn Zhao, and We Wang Astract An adaptve sdng mode fuzzy contro approach s proposed for a two-dmensona

More information

Heuristic Static Load-Balancing Algorithm Applied to CESM

Heuristic Static Load-Balancing Algorithm Applied to CESM Heurstc Statc Load-Balancng Algorthm Appled to CESM 1 Yur Alexeev, 1 Sher Mckelson, 1 Sven Leyffer, 1 Robert Jacob, 2 Anthony Crag 1 Argonne Natonal Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439,

More information

Dynamic Resource Allocation for MapReduce with Partitioning Skew

Dynamic Resource Allocation for MapReduce with Partitioning Skew Ths artcle has been accepted for publcaton n a future ssue of ths journal, but has not been fully edted. Content may change pror to fnal publcaton. Ctaton nformaton: DOI 1.119/TC.216.253286, IEEE Transactons

More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall SP 2005-02 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 14853-7801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent

More information

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers Journal of Computatonal Informaton Systems 7: 13 (2011) 4740-4747 Avalable at http://www.jofcs.com A Load-Balancng Algorthm for Cluster-based Mult-core Web Servers Guohua YOU, Yng ZHAO College of Informaton

More information

A heuristic task deployment approach for load balancing

A heuristic task deployment approach for load balancing Xu Gaochao, Dong Yunmeng, Fu Xaodog, Dng Yan, Lu Peng, Zhao Ja Abstract A heurstc task deployment approach for load balancng Gaochao Xu, Yunmeng Dong, Xaodong Fu, Yan Dng, Peng Lu, Ja Zhao * College of

More information

The Load Balancing of Database Allocation in the Cloud

The Load Balancing of Database Allocation in the Cloud , March 3-5, 23, Hong Kong The Load Balancng of Database Allocaton n the Cloud Yu-lung Lo and Mn-Shan La Abstract Each database host n the cloud platform often has to servce more than one database applcaton

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Traffic-light a stress test for life insurance provisions

Traffic-light a stress test for life insurance provisions MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

7.5. Present Value of an Annuity. Investigate

7.5. Present Value of an Annuity. Investigate 7.5 Present Value of an Annuty Owen and Anna are approachng retrement and are puttng ther fnances n order. They have worked hard and nvested ther earnngs so that they now have a large amount of money on

More information

Addendum to: Importing Skill-Biased Technology

Addendum to: Importing Skill-Biased Technology Addendum to: Importng Skll-Based Technology Arel Bursten UCLA and NBER Javer Cravno UCLA August 202 Jonathan Vogel Columba and NBER Abstract Ths Addendum derves the results dscussed n secton 3.3 of our

More information

On the Interaction between Load Balancing and Speed Scaling

On the Interaction between Load Balancing and Speed Scaling On the Interacton between Load Balancng and Speed Scalng Ljun Chen, Na L and Steven H. Low Engneerng & Appled Scence Dvson, Calforna Insttute of Technology, USA Abstract Speed scalng has been wdely adopted

More information

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo. ICSV4 Carns Australa 9- July, 007 RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) yaoq.feng@yahoo.com Abstract

More information

Lecture 3: Annuity. Study annuities whose payments form a geometric progression or a arithmetic progression.

Lecture 3: Annuity. Study annuities whose payments form a geometric progression or a arithmetic progression. Lecture 3: Annuty Goals: Learn contnuous annuty and perpetuty. Study annutes whose payments form a geometrc progresson or a arthmetc progresson. Dscuss yeld rates. Introduce Amortzaton Suggested Textbook

More information

Airport Investment Risk Assessment under Uncertainty

Airport Investment Risk Assessment under Uncertainty Word Academy of Scence, Engneerng and Technoogy Internatona Journa of Mathematca, Computatona, Physca, Eectrca and Computer Engneerng Vo:8, No:9, 2014 Arport Investment Rs Assessment under Uncertanty Eena

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures

An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures An ILP Formulaton for Task Mappng and Schedulng on Mult-core Archtectures Yng Y, We Han, Xn Zhao, Ahmet T. Erdogan and Tughrul Arslan Unversty of Ednburgh, The Kng's Buldngs, Mayfeld Road, Ednburgh, EH9

More information

Using Series to Analyze Financial Situations: Present Value

Using Series to Analyze Financial Situations: Present Value 2.8 Usng Seres to Analyze Fnancal Stuatons: Present Value In the prevous secton, you learned how to calculate the amount, or future value, of an ordnary smple annuty. The amount s the sum of the accumulated

More information

GRADIENT METHODS FOR BINARY INTEGER PROGRAMMING

GRADIENT METHODS FOR BINARY INTEGER PROGRAMMING Proceedns of the 4st Internatona Conference on Computers & Industra Enneern GRADIENT METHODS FOR BINARY INTEGER PROGRAMMING Chen-Yuan Huan¹, Ta-Chun Wan² Insttute of Cv Avaton, Natona Chen Kun Unversty,

More information

Cloud Auto-Scaling with Deadline and Budget Constraints

Cloud Auto-Scaling with Deadline and Budget Constraints Prelmnary verson. Fnal verson appears In Proceedngs of 11th ACM/IEEE Internatonal Conference on Grd Computng (Grd 21). Oct 25-28, 21. Brussels, Belgum. Cloud Auto-Scalng wth Deadlne and Budget Constrants

More information

Dynamic Scheduling of Emergency Department Resources

Dynamic Scheduling of Emergency Department Resources Dynamc Schedulng of Emergency Department Resources Junchao Xao Laboratory for Internet Software Technologes, Insttute of Software, Chnese Academy of Scences P.O.Box 8718, No. 4 South Fourth Street, Zhong

More information

Enabling P2P One-view Multi-party Video Conferencing

Enabling P2P One-view Multi-party Video Conferencing Enablng P2P One-vew Mult-party Vdeo Conferencng Yongxang Zhao, Yong Lu, Changja Chen, and JanYn Zhang Abstract Mult-Party Vdeo Conferencng (MPVC) facltates realtme group nteracton between users. Whle P2P

More information

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems A Cost-Effectve Strategy for Intermedate Data Storage n Scentfc Cloud Workflow Systems Dong Yuan, Yun Yang, Xao Lu, Jnjun Chen Faculty of Informaton and Communcaton Technologes, Swnburne Unversty of Technology

More information