Making State Explicit for Imperative Big Data Processing


Making State Explicit for Imperative Big Data Processing. Raul Castro Fernandez, Imperial College London; Matteo Migliavacca, University of Kent; Evangelia Kalyvianaki, City University London; Peter Pietzuch, Imperial College London. This paper is included in the Proceedings of USENIX ATC '14: 2014 USENIX Annual Technical Conference. June 19–20, 2014, Philadelphia, PA. Open access to the Proceedings of USENIX ATC '14: 2014 USENIX Annual Technical Conference is sponsored by USENIX.

Making State Explicit for Imperative Big Data Processing
Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, Peter Pietzuch
Imperial College London, University of Kent, City University London

Abstract
Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large in-memory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks.

1 Introduction
Data scientists want to use ever more sophisticated implementations of machine learning algorithms, such as collaborative filtering [32], k-means clustering and logistic regression [2], and execute them over large datasets while providing fresh, low latency results. With the dominance of imperative programming, such algorithms are often implemented in languages such as Java, Matlab or R. Such implementations though make it challenging to achieve high performance.
On the other hand, data-parallel processing frameworks, such as MapReduce [8], Spark [38] and Naiad [26], can scale computation to a large number of nodes. Such frameworks, however, require developers to adopt particular functional [37], declarative [3] or dataflow [5] programming models. While early frameworks such as MapReduce [8] followed a restricted functional model, resulting in wide-spread adoption, recent more expressive frameworks such as Spark [38] and Naiad [26] require developers to learn more complex programming models, e.g. based on a richer set of higher-order functions.
Our goal is therefore to translate imperative Java implementations of machine learning algorithms to a representation that can be executed in a data-parallel fashion. The execution should scale to a large number of nodes, achieving high throughput and low processing latency. This is challenging because Java programs support arbitrary mutable state. For example, an implementation of collaborative filtering [32] uses a mutable matrix to represent a model that is refined iteratively: as new data arrives, the matrix is updated at a fine granularity and accessed to provide up-to-date predictions. Having stateful computation raises two issues: first, the state may grow large, e.g. on the order of hundreds of GBs for a collaborative filtering model with tens of thousands of users. Therefore the state and its associated computation must be distributed across nodes; second, large state must be restored efficiently after node failure. The failure recovery mechanism should have a low impact on performance.
Current data-parallel frameworks do not handle large state effectively. In stateless frameworks [8, 37, 38], computation is defined through side-effect-free functional tasks. Any modification to state, such as updating a single element in a matrix, must be implemented as the creation of new immutable data, which is inefficient.
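The cost difference can be sketched as follows; this is our own minimal illustration (not code from the paper, and not any framework's API) contrasting an in-place fine-grained update with the copy-on-update style that immutable dataflow abstractions impose on each state change.

```java
import java.util.Arrays;

// Illustrative sketch: updating one element of a model in place versus
// re-creating the whole structure, as immutable abstractions require.
public class UpdateCost {
    // mutable state: a single cell changes, nothing is copied
    static void updateInPlace(double[][] m, int r, int c, double v) {
        m[r][c] = v;
    }

    // immutable state: every update materialises a full copy of the matrix
    static double[][] updateByCopy(double[][] m, int r, int c, double v) {
        double[][] copy = new double[m.length][];
        for (int i = 0; i < m.length; i++)
            copy[i] = Arrays.copyOf(m[i], m[i].length);
        copy[r][c] = v;
        return copy;
    }

    public static void main(String[] args) {
        double[][] model = new double[1000][1000];        // ~8 MB of state
        updateInPlace(model, 3, 4, 1.0);                  // O(1) work
        double[][] next = updateByCopy(model, 3, 4, 2.0); // O(n^2) work per update
        System.out.println(model[3][4] + " " + next[3][4]);
    }
}
```

For an online algorithm that touches one matrix cell per input item, the copying variant does work proportional to the whole model on every update, which is the inefficiency described above.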
While recent frameworks [26, ] have recognised the need for per-task mutable state, they lack abstractions for distributed state and exhibit high overhead under fault-tolerant operation with large state (see §6.1).
Imperative programming model. We describe how,
USENIX Association  2014 USENIX Annual Technical Conference  49

with the help of a few annotations by developers, Java programs can be executed automatically in a distributed data-parallel fashion. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to translate the program to an executable distributed dataflow representation. Using program analysis, our approach extracts the processing tasks and state fields from the program and infers the variable-level dataflow.
Stateful dataflow graphs. This translation relies on the features of a new fault-tolerant data-parallel processing model called stateful dataflow graphs (SDGs). An SDG explicitly distinguishes between data and state: it is a cyclic graph of pipelined data-parallel tasks, which execute on different nodes and access local in-memory state. SDGs include abstractions for maintaining large state efficiently in a distributed fashion: if tasks can process state entirely in parallel, the state is partitioned across nodes; if this is not possible, tasks are given local instances of partial state for independent computation. Computation can include synchronisation points to access all partial state instances, and instances can be reconciled according to application semantics. Data flows between tasks in an SDG, and cycles specify iterative computation. All tasks are pipelined: this leads to low latency, less intermediate data during failure recovery and simplified scheduling by not having to compute data dependencies. Tasks are replicated at runtime to overcome processing bottlenecks and stragglers.
Failure recovery. When recovering from failures, nodes must restore potentially gigabytes of in-memory state. We describe an asynchronous checkpointing mechanism with log-based recovery that uses data structures for dirty state to minimise the interruption to tasks while taking local checkpoints. Checkpoints are persisted to multiple disks in parallel, from which they can be restored to multiple nodes, thus reducing recovery time.
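The split between partitioned and partial state can be sketched as follows; this is our own illustration of the two ideas with hypothetical names, not an excerpt from the SDG implementation.

```java
import java.util.*;

// Illustrative sketch of the two SDG abstractions for distributed state:
// partitioned state is split by key across instances, while partial state
// keeps an independent copy per node, reconciled by application-defined
// merge logic (element-wise sum here, as in collaborative filtering).
public class DistributedState {
    // partitioned: the key deterministically chooses exactly one instance
    static int partitionFor(int key, int instances) {
        return Math.floorMod(Integer.hashCode(key), instances);
    }

    // partial: every node keeps its own copy; a merge reconciles them
    static double[] merge(List<double[]> partials) {
        double[] rec = new double[partials.get(0).length];
        for (double[] p : partials)
            for (int i = 0; i < rec.length; i++) rec[i] += p[i];
        return rec;
    }

    public static void main(String[] args) {
        // partitioned access: the same user always maps to the same instance
        System.out.println("user 42 -> instance " + partitionFor(42, 4));

        // partial access: independent local updates, reconciled on demand
        List<double[]> userRec = List.of(new double[]{1, 0}, new double[]{2, 3});
        System.out.println(Arrays.toString(merge(userRec))); // [3.0, 3.0]
    }
}
```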
With a prototype system of SDGs, we execute Java implementations of collaborative filtering, logistic regression and a key/value store on a private cluster and Amazon EC2. We show that SDGs execute with high throughput (comparable to batch processing systems) and low latency (comparable to streaming systems). Even with large state, their failure recovery mechanism has a low performance impact, recovering in seconds.
The paper contributions and its structure are as follows: based on a sample Java program (§2.1) and the features of existing dataflow models (§2.2), we motivate the need for stateful dataflow graphs and describe their properties (§3); §4 explains the translation from Java to SDGs; §5 describes failure recovery; and §6 presents evaluation results, followed by related work (§7).

Algorithm 1: Online collaborative filtering
 1  @Partitioned Matrix userItem = new Matrix();
 2  @Partial Matrix coOcc = new Matrix();
 3
 4  void addRating(int user, int item, int rating) {
 5    userItem.setElement(user, item, rating);
 6    Vector userRow = userItem.getRow(user);
 7    for (int i = 0; i < userRow.size(); i++)
 8      if (userRow.get(i) > 0) {
 9        int count = coOcc.getElement(item, i);
10        coOcc.setElement(item, i, count + 1);
11        coOcc.setElement(i, item, count + 1);
12      }
13  }
14  Vector getRec(int user) {
15    Vector userRow = userItem.getRow(user);
16    @Partial Vector userRec = @Global coOcc.multiply(userRow);
17    Vector rec = merge(@Global userRec);
18    return rec;
19  }
20  Vector merge(@Collection Vector[] allUserRec) {
21    Vector rec = new Vector(allUserRec[0].size());
22    for (Vector cur : allUserRec)
23      for (int i = 0; i < allUserRec.length; i++)
24        rec.set(i, cur.get(i) + rec.get(i));
25    return rec;
26  }

2 State in Data-Parallel Processing
We describe an imperative implementation of a machine learning algorithm and investigate how it can execute in a data-parallel fashion on a set of nodes, paying attention to its use of mutable state (§2.1). Based on this analysis, we discuss the features of existing data-parallel processing models for supporting such an execution (§2.2).

2.1 Application example
Alg. 1 shows a Java implementation of an online machine learning algorithm, collaborative filtering (CF) [32]. It outputs up-to-date recommendations of items to users (function getRec) based on previous item ratings (function addRating).
The algorithm maintains state in two data structures: the matrix userItem stores the ratings of items made by users (line 1); the co-occurrence matrix coOcc records correlations between items that were rated together by multiple users (line 2). For many users and items, userItem and coOcc become large and must be distributed: userItem can be partitioned across nodes based on the user identifier as an access key; since the access to coOcc is random, it cannot be partitioned but only replicated on multiple nodes in order to parallelise updates. In this case, results from a single instance of coOcc are partial, and must be merged with other partial results to obtain a complete result, as described below.
The function addRating first adds a new rating to userItem (line 5). It then incrementally updates coOcc by increasing the co-occurrence counts for the newly-rated
(The annotations (starting with @) will be explained in §4 and should be ignored for now.)

Dataflow model       System          Programming model  State representation  Execution  Failure recovery
Stateless dataflow   MapReduce [8]   map/reduce         as data               scheduled  recompute
                     DryadLINQ [37]  functional         as data               scheduled  recompute
                     Spark [38]      functional         as data               hybrid     recompute
                     CIEL [25]       imperative         as data               scheduled  recompute
Incremental dataflow HaLoop [5]      map/reduce         cache                 scheduled  recompute
                     Incoop [4]      map/reduce         cache                 scheduled  recompute
                     Nectar []       functional         cache                 scheduled  recompute
                     CBP [9]         dataflow           loopback              scheduled  recompute
Batched dataflow     Comet [2]       functional         as data               scheduled  recompute
                     D-Streams [39]  functional         as data               hybrid     recompute
                     Naiad [26]      dataflow           explicit              hybrid     sync. global checkpoints
Continuous dataflow  Storm, S4       dataflow           as data               pipelined  recompute
                     SEEP []         dataflow           explicit              pipelined  sync. local checkpoints
Parallel in-memory   Piccolo [3]     imperative         explicit              n/a        async. global checkpoints
Stateful dataflow    SDG             imperative         explicit              pipelined  async. local checkpoints

item and existing items with non-zero ratings (lines 7–12). This requires userItem and coOcc to be mutable, with efficient fine-grained access. Since userItem is partitioned based on the key user, and coOcc is replicated, addRating only accesses a single instance of each.
The function getRec takes the rating vector of a user, userRow (line 15), and multiplies it by the co-occurrence matrix to obtain a recommendation vector userRec (line 16). Since coOcc is replicated, this must be performed on all instances of coOcc, leading to multiple partial recommendation vectors. These partial vectors must be merged to obtain the final recommendation vector rec for the user (line 17). The function merge simply computes the sum of all partial recommendation vectors (lines 21–24).
Note that addRating and getRec have different performance goals when handling state: addRating must achieve high throughput when updating coOcc with new ratings; getRec must serve requests with low latency, e.g.
when recommendations are included in dynamically generated web pages.

2.2 Design space
The above example highlights a number of required features of a dataflow model to enable the translation of imperative online machine learning algorithms to executable dataflows: (i) the model should support large state sizes (on the order of GBs), which should be represented explicitly and handled with acceptable performance; in particular, (ii) the state should permit efficient fine-grained updates. In addition, due to the need for up-to-date results, (iii) the model should process data with low latency, independently of the amount of input data; (iv) algorithms such as logistic regression and k-means clustering also require iteration; and (v) even with large state, the model should support fast failure recovery. In Table 1, we classify existing data-parallel processing models according to the above features.
Table 1: Design space of data-parallel processing frameworks
State handling. Stateless dataflows, first made popular by MapReduce [8], define a functional dataflow graph in which vertices are stateless data-parallel tasks. They do not distinguish between state and data: e.g. in a word-count job in MapReduce, the partial word counts, which are the state, are output by map tasks as part of the dataflow [8]. Dataflows in Spark, represented as RDDs, are immutable, which simplifies failure recovery but requires a new RDD for each state update [38]. This is inefficient for online algorithms such as CF in which only part of a matrix is updated each time. Stateless models also cannot treat data differently from state. They cannot use custom index data structures for state access, or cache only state in memory: e.g. Shark [36] needs explicit hints which dataflows to cache.
Incremental dataflow avoids rerunning entire jobs after updates to the input data. Such models are fundamentally stateful because they maintain results from earlier computation. Incoop [4] and Nectar [] treat state as a cache of past results. Since they cannot infer which data will be reused, they cache all.
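A cache-of-past-results scheme of this kind can be sketched as follows; this is our own minimal illustration of memoising task results keyed by their input, with invented names, not the API of Incoop or Nectar.

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch: incremental computation by memoising past results.
// Because the framework cannot tell which results will be reused, every
// result is retained.
public class ResultCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> task;
    int computations = 0;                         // counts actual task runs

    ResultCache(Function<K, V> task) { this.task = task; }

    V apply(K input) {
        return cache.computeIfAbsent(input,
                k -> { computations++; return task.apply(k); });
    }

    public static void main(String[] args) {
        ResultCache<Integer, Integer> squares = new ResultCache<>(x -> x * x);
        squares.apply(12);
        squares.apply(12);                        // reuses the cached result
        System.out.println(squares.computations); // prints 1
    }
}
```

The contrast with explicit state is that the cache grows monotonically with distinct inputs, whereas an SE holds exactly the mutable structure the algorithm needs.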
CBP transforms batch jobs automatically for incremental computation [9]. Our goals are complementary: SDGs do not infer incremental computation but support stateful computation efficiently, which can realise incremental algorithms.
Existing models that represent state explicitly, such as SEEP [] and Naiad [26], permit tasks to have access to in-memory data structures but face challenges related to state sizes: they assume that state is small compared to the data. When large state requires distributed processing through partitioning or replication, they do not provide abstractions to support this. In contrast, Piccolo [3] supports scalable distributed state with a key/value abstraction. However, it does not offer a dataflow model, which means that it cannot execute an inferred dataflow from a Java program but requires computation to be specified as multiple kernels.
Latency and iteration. Tasks in a dataflow graph can

be scheduled for execution or materialised in a pipeline, each with different performance implications. Some frameworks follow a hybrid approach in which tasks on the same node are pipelined but not between nodes. Since tasks in stateless dataflows are scheduled to process coarse-grained batches of data, such systems can exploit the full parallelism of a cluster but they cannot achieve low processing latency. For lower latency, batched dataflows divide data into small batches for processing and use efficient, yet complex, task schedulers to resolve data dependencies. They have a fundamental trade-off between the lower latency of smaller batches and the higher throughput of larger ones: typically they burden developers with making this trade-off [39].
Continuous dataflow adopts a streaming model with a pipeline of tasks. It does not materialise intermediate data between nodes and thus has lower latency without a scheduling overhead: as we show in §6, batched dataflows cannot achieve the same low latencies. Due to our focus on online processing with low latency, SDGs are fully pipelined (see §3.1).
To improve the performance of iterative computation in dataflows, early frameworks such as HaLoop [5] cache the results of one iteration as input to the next. Recent frameworks [5, 38, 25, 9] generalise this concept by permitting iteration over arbitrary parts of the dataflow graph, executing tasks repeatedly as part of loops. Similarly SDGs support iteration explicitly by permitting cycles in the dataflow graph.
Failure recovery. To recover from failure, frameworks either recompute state based on previous data or checkpoint state to restore it. For recomputation, Spark represents dataflows as RDDs [38], which can be recomputed deterministically based on their lineage. Continuous dataflow frameworks use techniques such as upstream backup [4] to reprocess buffered data after failure. Without checkpointing, recomputation can lead to long recovery times. Checkpointing periodically saves state to disk or the memory of other nodes.
With large state, this becomes resource-intensive. SEEP recovers state from memory, thus doubling the memory requirement of a cluster []. A challenge is how to take consistent checkpoints while processing data. Synchronous global checkpointing stops processing on all nodes to obtain consistent snapshots, thus reducing performance. For example, Naiad's stop-the-world approach exhibits low throughput with large state sizes [26]. Asynchronous global checkpointing, as used by Piccolo [3], permits nodes to take consistent checkpoints at different times. Both techniques include all global state in a checkpoint and thus require all nodes to restore state after failure.
Instead, SDGs use an asynchronous checkpointing mechanism with log-based recovery. As described in §5, it does not require global coordination between nodes during recovery, and it uses dirty state to minimise the disruption to processing during local checkpointing.

Figure 1: Stateful dataflow graph for the CF algorithm (task elements updateUserItem, getUserVec, updateCoOcc, getRecVec and merge; state elements userItem and coOcc; inputs: new rating and rec request dataflows; output: the rec result)

3 Stateful Dataflow Graphs
The goal of stateful dataflow graphs (SDGs) is to make it easy to translate imperative programs with mutable state to a dataflow representation that performs parallel, iterative computation with low latency. Next we describe their model (§3.1), how they support distributed state (§3.2) and how they are executed (§3.3).

3.1 Model
We explain the main features of SDGs using the CF algorithm from §2.1 as an example. As shown in Fig. 1, an SDG has two types of vertices: task elements, t ∈ T, transform input to output dataflows; and state elements, s ∈ S, represent the state in the SDG. Access edges, a = (t, s) ∈ A, connect task elements to the state elements that they read or update. To facilitate the allocation of task and state elements to nodes, each task element can only access a single state element, i.e. A is a partial function: (t_i, s_j) ∈ A, (t_i, s_k) ∈ A ⇒ s_j = s_k.
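The graph structure and the partial-function constraint on access edges can be sketched as follows; this is our own illustration with hypothetical names, not the paper's implementation.

```java
import java.util.*;

// Minimal sketch of an SDG: task elements (TEs), state elements (SEs),
// dataflow edges, and access edges that must form a partial function
// (each TE may access at most one SE).
public class Sdg {
    final Set<String> tasks = new HashSet<>();
    final Set<String> states = new HashSet<>();
    final Map<String, String> access = new HashMap<>();  // TE -> SE
    final List<String[]> dataflows = new ArrayList<>();  // (TE, TE) edges

    void addAccess(String te, String se) {
        tasks.add(te);
        states.add(se);
        String prev = access.putIfAbsent(te, se);
        if (prev != null && !prev.equals(se))
            throw new IllegalStateException(te + " already accesses " + prev);
    }

    void addDataflow(String from, String to) {
        dataflows.add(new String[]{from, to});
    }

    public static void main(String[] args) {
        Sdg g = new Sdg();                      // the CF graph of Fig. 1
        g.addAccess("updateUserItem", "userItem");
        g.addAccess("getUserVec", "userItem");
        g.addAccess("updateCoOcc", "coOcc");
        g.addAccess("getRecVec", "coOcc");
        g.addDataflow("getUserVec", "getRecVec");
        g.addDataflow("getRecVec", "merge");
        boolean rejected = false;
        try { g.addAccess("getRecVec", "userItem"); }  // second SE: rejected
        catch (IllegalStateException e) { rejected = true; }
        System.out.println("constraint enforced: " + rejected);
    }
}
```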
Dataflows are edges between task elements, d = (t_i, t_j) ∈ D, and contain data items.
Task elements (TEs) are not scheduled for execution but the entire SDG is materialised, i.e. each TE is assigned to one or more physical nodes. Since TEs are pipelined, it is unnecessary to generate the complete output dataflow of a TE before it is processed by the next TE. Data items are therefore processed with low latency, even across a sequence of TEs, without scheduling overhead, and fewer data items are handled during failure recovery (see §5).
The SDG in Fig. 1 has five TEs assigned to three nodes: the updateUserItem and updateCoOcc TEs realise the addRating function from Alg. 1; and the getUserVec, getRecVec and merge TEs implement the getRec function. We explain the translation process in §4.2.
State elements (SEs) encapsulate the state of the computation. They are implemented using efficient data structures, such as hash tables or indexed sparse matrices. In the next section, we describe the abstractions for distributed SEs, which span multiple nodes. Fig. 1 shows the two SEs of the CF algorithm: the userItem and the coOcc matrices. The access edges specify that userItem is updated by the updateUserItem TE and read by the getUserVec TE; coOcc is updated by updateCoOcc and read by getRecVec.
Parallelism. For data-parallel processing, a TE t_i can be instantiated multiple times to handle parts of a dataflow, resulting in multiple TE instances t̂_i,j, 1 ≤ j ≤ n_i. As we explain in §3.3, the number of instances n_i for each TE is chosen at runtime and adjusted based on workload demands and the occurrence of stragglers. An appropriate dispatching strategy sends items in dataflows to TE instances: items can be (i) partitioned using hash- or range-partitioning on a key; or (ii) dispatched to an arbitrary instance, e.g. in a round-robin fashion for load-balancing.
Iteration. In iterative algorithms, SEs are accessed multiple times by TEs. There are two cases to be distinguished: (i) if the repeated access is from a single TE, the iteration is entirely local and can be supported efficiently by a single node; and (ii) if the iteration involves multiple pipelined TEs, a cycle in the dataflow of the SDG can propagate updates between TEs. With cycles in the dataflow, SDGs do not provide coordination during iteration by default. This is sufficient for many iterative machine learning and data mining algorithms because they can converge from different intermediate states [3], even without explicit coordination. A strong consistency model for SDGs could be realised with per-loop timestamps, as used by Naiad [26].

3.2 Distributed state
The SDG model provides abstractions for distributed state. An SE s_i may be distributed across nodes, leading to multiple SE instances ŝ_i,j, because (i) it is too large to fit into the memory of a single node; or (ii) it is accessed by a TE that has multiple instances to process the dataflow in parallel. This requires also multiple SE instances so that the TE instances access state locally. Fig. 1
illustrates these two cases: (i) the userItem SE may grow larger than the main memory of a single node; and (ii) the data-parallel execution of the CPU-intensive updateCoOcc TE leads to multiple instances, each requiring local access to the coOcc SE.
An SE can be distributed in different ways, which are depicted in Fig. 2: a partitioned SE splits its internal data structure into disjoint partitions; if this is not possible, a partial SE duplicates its data structure, creating multiple copies that are updated independently. As we describe in §4, developers select the required type of distributed state using source-level annotations according to the semantics of their algorithm.
Partitioned state. For algorithms for which state can be partitioned, SEs can be split and SE instances placed on separate nodes (see Fig. 2b). Access to the SE instances occurs in parallel.

Figure 2: Types of distributed state in SDGs ((a) SE; (b) partitioned SE; (c) partial SE with merge)

Developers can use predefined data structures for SEs (e.g. Vector, HashMap, Matrix and DenseMatrix) or define their own by implementing dynamic partitioning and dirty state support (see §5). Different data structures support different partitioning strategies: e.g. a map can be hash- or range-partitioned; a matrix can be partitioned by row or column. To obtain a unique partitioning, TEs cannot access partitioned SEs using conflicting strategies, such as accessing a matrix by row and by column. In addition, the dataflow partitioning strategy must be compatible with the data access pattern by the TEs, as specified in the program (see §4.2). For example, multiple TE instances with an access edge to a partitioned SE must use the same partitioning key on the dataflow so that they access SE instances locally: in the CF algorithm, the userItem SE and the new rating and rec request dataflows must all be partitioned by row, i.e. the users for which ratings are maintained.
Partial state. In some cases, the data structure of an SE cannot be partitioned because the access pattern of TEs is arbitrary.
For example, in the CF algorithm, the coOcc matrix has an access pattern in which the updateCoOcc TE may update any row or column. In this case, an SE is distributed by creating multiple partial SE instances, each containing the whole data structure (see Fig. 2c). Partial SE instances can be updated independently by different TE instances.
When a TE accesses a partial SE, there are two possible types of accesses based on the semantics of the algorithm: a TE instance may access (i) the local SE instance on the same node; or (ii) the global state by accessing all of the partial SE instances, which introduces a synchronisation point. As we describe in §4.2, the type of access to partial SEs is determined by annotations.
When accessing all partial SE instances, it is possible to execute computation that merges their values, thus reconciling the differences between them. This is done by a merge TE that computes a single global value from partial SE instances. Merge computation is application-specific and must be defined by the developer. In the CF algorithm, the merge function takes all partial userRec vectors and computes a single recommendation vector.

3.3 Execution
To execute an SDG, the runtime system allocates TE and SE instances to nodes, creating instances on-demand.
Allocation to nodes. Since we want to avoid remote state access, the general strategy is to colocate TEs and

SEs that are connected by access edges on the same node. The runtime system uses four steps for mapping TEs and SEs to nodes: if there is a cycle in the SDG, all SEs accessed in the cycle are colocated if possible to reduce communication in iterative algorithms (step 1); the remaining SEs are allocated on separate nodes to increase available memory (step 2); TEs are colocated with the SEs that they access (step 3); and finally, any unallocated TEs are assigned to separate nodes (step 4).
Fig. 1 illustrates the above steps for allocating the SDG to nodes n1 to n3: since there are no cycles (step 1), the userItem SE is assigned to node n1, and the coOcc SE is assigned to n2 (step 2); the updateUserItem and getUserVec TEs are assigned to n1, and the updateCoOcc and getRecVec TEs are assigned to n2 (step 3); finally, the merge TE is allocated to a new node n3 (step 4).
Runtime parallelism and stragglers. Processing bottlenecks in the deployed SDG, e.g. caused by the computational cost of TEs, cannot be predicted statically, and TE instances may become stragglers [4]. Previous work [26] tries to reduce stragglers proactively for low latency processing, which is hard due to the many non-deterministic causes of stragglers. Instead, similar to speculative execution in MapReduce [4], SDGs adopt a reactive approach. Using a dynamic dataflow graph approach [], the runtime system changes the number of TE instances in response to stragglers. Each TE is monitored to determine if it constitutes a processing bottleneck that limits throughput. If so, a new TE instance is created, which may result in new partitioned or partial SE instances.

3.4 Discussion
With an explicit representation of state, a single SDG can express multiple workflows over that state. In the case of the CF algorithm from Alg. 1, the SDG processes new ratings by updating the SEs for the user/item and co-occurrence matrices, and also serves recommendation requests using the same SEs with low latency.
Without SDGs, these two workflows would require separate offline and online systems [23, 32]: a batch processing framework would incorporate new ratings periodically, and online recommendation requests would be served by a dedicated system from memory. Since it is inefficient to rerun the batch job after each new rating, the recommendations would be computed on stale data.
A drawback of the materialised representation of SDGs is the start-up cost. For short jobs, the deployment cost may dominate the running time. Our prototype implementation deploys an SDG with 5 TE and SE instances on 5 nodes within 7 s, and we assume that jobs are sufficiently long-running to amortise this delay.

4 Programming SDGs
We describe how to translate stateful Java programs statically to SDGs for parallel execution. We do not attempt to be completely transparent for developers or to address the general problem of automatic code parallelisation. Instead, we exploit data and pipeline parallelism by relying on source code annotations. We require developers to provide a single Java class with annotations that indicate how state is distributed and accessed.

4.1 Annotations
When defining a field in a Java class, a developer can indicate if its content can be partitioned or is partial by annotating the field declaration.
@Partitioned. This annotation specifies that a field can be split into disjoint partitions (see §3.2). A reference to a @Partitioned field always refers to a single partition. This requires that access to the field uses an access key to infer the partition. In the CF algorithm in Alg. 1, rows of the userItem matrix are updated with information about a single user only, and thus userItem can be declared as a partitioned field.
@Partial. Fields are annotated @Partial if distributed instances of the field should be accessed independently (see §3.2). Partial fields enable developers to define distributed state when it cannot be partitioned.
In CF, matrix coOcc is annotated @Partial, which means that multiple instances of the matrix may be created, and each of them is updated independently for users in a partition (lines 10–11).
@Global. By default, a reference to a @Partial field refers to only one of its instances. While most of the time computation should apply to one instance to make independent progress, it may also be necessary to support operations on all instances. A field reference annotated @Global forces a Java expression to apply to all instances, denoting global access to a partial field, which introduces a synchronisation barrier in the SDG (see §4.2). Java expressions deriving from a @Global access become logically multi-valued because they include results from all instances of a partial field. As a result, any local variable that is assigned the result of a global field access becomes partial and must be annotated as such. In CF, the access to the coOcc field carries annotation @Global to compute all partial recommendations: each instance of coOcc is multiplied with the user rating vector userRow, and the results are stored in the partial local variable userRec (line 16).
@Collection. Global access to a partial field applies to all instances, but it hides the individual instances from the developer. At some point in the program, however, it may be necessary to reconcile all instances.

Figure 3: Translation of an annotated Java program to an SDG (static analysis over the intermediate representation: 1. control flow partitioning, 2. SE extraction, 3. SE access extraction, 4. TE extraction, 5. live variable analysis; bytecode generation: 6. TE code assembly, 7. SE access translation, 8. TE invocation)

The @Collection annotation therefore exposes all instances of a partial field or variable as a Java array. This enables the program to iterate over all values and, for example, merge them into a single value. In CF, the partial recommendations are combined by accessing them using annotation @Global and then invoking the merge method (line 17). The parameter of merge is annotated @Collection, which specifies that the method can access all instances of the partial userRec variable to compute the final recommendation result.
Limitations. Java programs need to obey certain restrictions to be translated to SDGs due to their dataflow nature and fault tolerance properties:
Explicit state classes. All state in the program must be implemented using the set of SE classes (see §3.2). This gives the runtime system the ability to partition objects of these classes into multiple instances (for partitioned state) or distribute them (for partial state), and recover them after failure (see §5).
Location independence. Each object accessed in the program must support transparent serialisation/deserialisation: as SDGs are distributed, objects are propagated between nodes. The program also cannot make assumptions about its execution environment, e.g. by relying on local network sockets or files.
Side-effect-free parallelism. To support the parallel evaluation of multi-valued expressions under @Global state access, such expressions must not affect single-valued expressions. For example, the expression coOcc.multiply(userRow) in line 16 in Alg. 1 cannot update userRow, which is single-valued.
Deterministic execution. The program must be deterministic, i.e. it should not depend on system time or random input.
This enables the runtime system to re-execute computation when recovering after failure (see §5).

4.2 Translating programs to SDGs
Annotated Java programs are translated to SDGs by the java2sdg tool. Fig. 3 shows the steps performed by java2sdg: it first statically analyses the Java class to identify SEs, TEs and their access edges (steps 1–5); it then transforms the Java bytecode of the class to generate TE code, ready for deployment (steps 6–8).
1. SE generation. The class is compiled to Jimple code, a typed intermediate representation for static analysis used by the Soot framework [33] (step 1). The Jimple code is analysed to identify SEs with partitioned or partial fields and partial local variables (step 2). Based on the annotations in the code, access to SEs is classified as local, partitioned or global (step 3).
2. TE and dataflow generation. Next TEs are created so that each TE only accesses a single SE, i.e. a new TE is created from a block of code when access to a different SE or a different instance of the current SE is detected (step 4). The dispatching semantics of the dataflows between created TEs (i.e. partitioned, all-to-one, one-to-all or one-to-any) is chosen based on the type of state access. More specifically, a new TE is created:
1. for each entry point of the class;
2. when a TE uses partitioned access to a new SE (or to a previously-accessed SE with a new access key). The access key is extracted using reaching expression analysis, and the dataflow edge between the two TEs is annotated with the access key;
3. when a TE uses global access to a new partial SE. In this case, the dataflow edge between the two TEs is annotated with one-to-all dispatching semantics;
4. when a TE uses local access to a new partial SE, the dataflow edge is annotated with one-to-any dispatching semantics. In case of local (or partitioned) access after global access, all TE instances must be synchronised using a distributed barrier before control is transferred to the new TE, and the dataflow edge has all-to-one dispatching semantics; and
5. for multi-valued expressions.
A synchronisation barrier collects values from multiple TE instances, and its dataflow edge has all-to-one semantics.

After generating the TEs, java2sdg identifies the variables that must propagate across TE boundaries (step 5). For each dataflow, live variable analysis identifies the set of variables that are associated with that dataflow edge.

3. Bytecode generation. Next, java2sdg synthesises the bytecode for each TE that will be executed by the runtime system. It compiles the code assigned to each TE in step 4 to bytecode and injects it into a TE template (step 6) using Javassist. State accesses to fields and partial variables are translated to invocations of the runtime system, which manages the SE instances (step 7). Finally, data dispatching across TEs is added (step 8): java2sdg injects code, (i) at the exit point of TEs, to serialise live variables and send them to the correct successor TE instance; and, (ii) at the entry point of a TE, to add barriers for all-to-one dispatching and to gather partial results for merge TEs.

5 Failure Recovery

To recover from failures, it is necessary to replace failed nodes and re-instantiate their TEs and SEs. TEs are stateless and thus are restored trivially, but the state of SEs must be recovered. We face the challenge of designing a recovery mechanism that: (i) can scale to save and recover the state of a large number of nodes with low overhead, even with frequent failures; (ii) has low impact on the processing latency; and (iii) achieves fast recovery time when recovering large SEs. We achieve these goals with a mechanism that (a) combines local checkpoints with message replay, thus avoiding both global checkpoint coordination and global rollbacks; (b) divides the state of SEs into consistent state, which is checkpointed, and dirty state, which permits continued processing while checkpointing; and (c) partitions checkpoints and saves them to multiple nodes, which enables parallel recovery.

Approach. Our failure recovery mechanism combines local checkpointing and message logging and is inspired by failure recovery in distributed stream processing systems [14]. Nodes periodically take checkpoints of their local SEs and output communication buffers. Dataflows include increasing TE-generated scalar timestamps, and a vector timestamp of the last data item from each input dataflow that modified the SEs is included in the checkpoint. Once the checkpoint is saved to stable storage, upstream nodes can trim their output buffers of data items that are older than all downstream checkpoints. After a failure, a node recovers its SEs from the last checkpoint, replays its output buffers and reprocesses data items received from the upstream output buffers. Downstream nodes detect duplicate data items based on the timestamps and discard them. This approach allows nodes to recover SEs locally beyond the last checkpoint, without requiring nodes to coordinate a global rollback, and it avoids the output commit problem.

State checkpointing. We use an asynchronous parallel checkpointing mechanism that minimises the processing interruption when checkpointing large SEs with GBs of memory. The idea is to record updates in a separate data structure while taking a checkpoint.
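A minimal sketch of such a dirty-state overlay for a dictionary SE follows. This is illustrative only; the class and method names are assumptions, not the actual SEEP/SDG implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of asynchronous checkpointing with dirty state:
// while a checkpoint of the base dictionary is being written out, updates
// go to a dirty overlay; reads try the overlay first and fall back to the
// frozen base on a miss; consolidation folds the overlay back afterwards.
public class DirtyStateDict {
    private final Map<String, Integer> base = new HashMap<>();
    private final Map<String, Integer> dirty = new HashMap<>();
    private boolean checkpointing = false;

    public void put(String k, Integer v) {
        if (checkpointing) dirty.put(k, v); // never touch the frozen base
        else base.put(k, v);
    }

    public Integer get(String k) {
        if (checkpointing && dirty.containsKey(k)) return dirty.get(k);
        return base.get(k); // a miss in the overlay is served by the base
    }

    // Flag the state dirty; base is now a consistent snapshot that can be
    // serialised and backed up asynchronously.
    public Map<String, Integer> beginCheckpoint() {
        checkpointing = true;
        return base;
    }

    // Once the checkpoint is safely stored: lock briefly and consolidate.
    public void endCheckpoint() {
        base.putAll(dirty);
        dirty.clear();
        checkpointing = false;
    }

    public static void main(String[] args) {
        DirtyStateDict d = new DirtyStateDict();
        d.put("a", 1);
        d.beginCheckpoint();            // base is frozen and consistent
        d.put("a", 2);                  // lands in the dirty overlay
        System.out.println(d.get("a")); // prints 2 (served from the overlay)
        d.endCheckpoint();
        System.out.println(d.get("a")); // prints 2
    }
}
```

During a checkpoint, processing continues against the overlay, so only the final consolidation needs a lock, which is why the overhead shrinks with the state update rate.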
For each type of data structure held by an SE, there must be an implementation that supports the separation of dirty state and its subsequent consolidation. Checkpointing of a node works as follows: (1) to initiate a checkpoint, each SE is flagged as dirty and the output buffers are added to the checkpoint; (2) updates from TEs to an SE are now handled using a dirty state data structure: e.g. updates to keys in a dictionary are written to the dirty state, and reads are first served by the dirty state and, only on a miss, by the dictionary; (3) asynchronously to the processing, the now consistent state is added to the checkpoint; (4) the checkpoint is backed up to multiple nodes (see below); and (5) the SE is locked and its state is consolidated with the dirty state.

Figure 4: Parallel, m-to-n state backup and restore.

State backup and restore. To be memory-efficient, checkpoints must be stored on disk. We overcome the problem of low I/O performance by splitting checkpoints across m nodes. To reduce recovery time, a failed SE instance can be restored to n new partitioned SE instances in parallel. This m-to-n pattern prevents a single node from becoming a disk, network or processing bottleneck. Fig. 4 shows the distributed protocol for backing up checkpoints. In step B1, checkpoint chunks, e.g. obtained by hash-partitioning checkpoint data, are created, and a thread pool serialises them in parallel (step B2). Checkpoint chunks are streamed to m nodes, selected in a round-robin fashion (step B3). Nodes write received checkpoint chunks directly to disk. After a failure, new nodes are instantiated with the lost TEs and SEs. Each node with a checkpoint chunk splits it into n partitions, each of which is streamed to one of the recovering instances (step R1). The new SE instances reconcile the chunks, reverting the partitioning (step R2). Finally, data items from output buffers are reprocessed to bring the recovered SE state up-to-date (step R3).
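The m-to-n pattern of steps B1-B3 and R1-R2 can be sketched with plain hash partitioning over a dictionary checkpoint. The chunking and reconciliation below are illustrative assumptions, not the actual SEEP/SDG protocol.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of m-to-n backup/restore: a checkpoint is split by
// key hash into m chunks (one per backup node); on recovery, each chunk is
// re-split into n partitions (one per new SE instance), and the instances
// reconcile the pieces addressed to them.
public class MtoNSketch {

    // Step B1: split checkpoint state into the given number of chunks.
    public static List<Map<String, Integer>> partition(Map<String, Integer> state, int parts) {
        List<Map<String, Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < parts; i++) chunks.add(new HashMap<>());
        for (Map.Entry<String, Integer> e : state.entrySet())
            chunks.get(Math.floorMod(e.getKey().hashCode(), parts)).put(e.getKey(), e.getValue());
        return chunks;
    }

    // Steps R1-R2: each backed-up chunk is re-partitioned n ways and the
    // i-th recovering instance merges every i-th piece it receives.
    public static List<Map<String, Integer>> restore(List<Map<String, Integer>> backups, int n) {
        List<Map<String, Integer>> instances = new ArrayList<>();
        for (int i = 0; i < n; i++) instances.add(new HashMap<>());
        for (Map<String, Integer> chunk : backups) {
            List<Map<String, Integer>> pieces = partition(chunk, n);
            for (int i = 0; i < n; i++) instances.get(i).putAll(pieces.get(i));
        }
        return instances;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();
        state.put("a", 1); state.put("b", 2); state.put("c", 3);
        List<Map<String, Integer>> backups = partition(state, 2);   // B1: m = 2 backup nodes
        List<Map<String, Integer>> recovered = restore(backups, 2); // R1-R2: n = 2 new instances
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> inst : recovered) merged.putAll(inst);
        System.out.println(merged.equals(state)); // prints true
    }
}
```

Because the same partitioning is applied at backup and restore time, the union of the recovered instances equals the original checkpointed state, while disk reads and state reconstruction are spread over m and n nodes respectively.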
6 Evaluation

The goal of our experimental evaluation is to explore whether SDGs can (i) execute stateful online processing applications with low latency and high throughput while supporting large state sizes with fine-grained updates (§6.1); (ii) scale in terms of nodes comparably to stateless batch processing frameworks (§6.2); (iii) handle stragglers at runtime with low impact on throughput (§6.3); and (iv) recover from failures with low overhead (§6.4).

We extend the SEEP streaming platform to implement SDGs and deploy our prototype on Amazon EC2 and a private cluster with 7 quad-core 3.4 GHz Intel Xeon servers with 8 GB of RAM. To support fast recovery, the checkpointing frequency is the same for all experiments unless stated otherwise. Candlesticks in plots show the 5th, 25th, 50th, 75th and 95th percentiles, respectively.

6.1 Stateful online processing

Throughput and latency. First we investigate the performance of SDGs using the online collaborative filtering (CF) application (see §2.1). We deploy it on 36 EC2 VM instances (c1.xlarge; 8 vCPUs with 7 GB) using the Netflix dataset, which contains 100 million movie ratings for evaluating recommender systems.

Figure 5: Throughput and latency with different read/write ratios (online collaborative filtering).

We add new ratings continuously (addRating), while requesting fresh recommendations (getRec). The state size maintained by the system grows to 2 GB. Fig. 5 shows the throughput of getRec and addRating requests and the latencies of getRec requests when the ratio between the two is varied. The achieved throughput is sufficient to serve the request load, with the 95th percentile of responses being at most 0.5 s stale. As the workload ratio includes more state reads (getRec), the throughput decreases slightly due to the cost of the synchronisation barrier that aggregates the partial state in the SDG. The result shows that SDGs can combine the functionality of a batch and an online processing system, while serving fresh results with low latency and high throughput over large mutable state.

State size. Next we evaluate the performance of SDGs as the state size increases. As a synthetic benchmark, we implement a distributed partitioned key/value store (KV) using SDGs because it exemplifies an algorithm with pure mutable state. We compare to an equivalent implementation in Naiad (version 0.2) with global checkpointing, which is the only fault-tolerance mechanism available in the open-source version. We deploy it in one VM (m1.xlarge) and measure the performance of serving update requests for keys. Fig. 6 shows that, for a small state size, both SDGs and Naiad exhibit a similar throughput of 65,000 requests/s with low latency. As the state size increases to 2.5 GB, the SDG throughput is largely unaffected, but Naiad's throughput decreases due to the overhead of its disk-based checkpoints (Naiad-Disk). Even with checkpoints stored on a RAM disk (Naiad-NoDisk), its throughput with 2.5 GB of state is 63% lower than that of SDGs.
Similarly, the 95th percentile latency in Naiad increases when it stops processing during checkpointing; SDGs do not suffer from this problem.

Figure 6: Throughput and latency with increasing state size on a single node (key/value store).
Figure 7: Throughput and latency with increasing state size on multiple nodes (key/value store).

To investigate how SDGs can support large distributed state across multiple nodes, we scale the KV store by increasing the number of VMs from 1 to 40, keeping the number of dictionary keys per node constant (5 GB of state per node). Fig. 7 shows the throughput and the latency for read requests with a given total state size. The aggregate throughput scales near-linearly from 47,000 requests/s for 5 GB to 1.5 million requests/s for 200 GB. The median latency increases from 8 ms to 29 ms, while the 95th percentile latency varies between 8 ms and 100 ms. This result demonstrates that SDGs can support stateful applications with large state sizes without compromising throughput or processing latency, while executing in a fault-tolerant fashion.

Update granularity. We show the performance of SDGs when performing frequent, fine-grained updates to state. For this, we deploy a streaming wordcount (WC) application on 4 nodes in our private cluster. WC reports the word frequencies over a wall-clock time window while processing the Wikipedia dataset. We compare to WC implementations in Streaming Spark [39] and Naiad. We vary the size of the window, which controls the granularity at which input data updates the state: the smaller the window size, the less batching can be done when updating the state. Since Naiad permits the configuration of the batch size independently of the window size, we use a small batch size for low-latency processing (Naiad-LowLatency) and a large one (2,000 messages) for high-throughput processing (Naiad-HighThroughput).

Fig. 8 shows that only SDG and Naiad-LowLatency can sustain processing for all window sizes, but SDG has a higher throughput due to Naiad's scheduling overhead. The other deployments suffer from the overhead of micro-batching: Streaming Spark has a throughput similar to SDG, but its smallest sustainable window size is 25 ms, after which its throughput collapses; Naiad-HighThroughput achieves the highest throughput of all, but it also cannot support the smallest window sizes. This shows that SDGs can perform fine-grained state updates without trading off throughput for latency.

6.2 Scalability

We explore whether SDGs can scale to higher throughput with more nodes in a batch processing scenario. We deploy an implementation of logistic regression (LR) [21] on EC2 (m1.xlarge; 4 vCPUs with 15 GB). We compare to LR from Spark [38], which is designed for iterative processing, using the dataset provided in its release.

Figure 8: Latency with different window sizes (streaming wordcount).

Fig. 9 shows the throughput of our SDG implementation and Spark for up to 25 nodes. Both systems exhibit linear scalability. The throughput of SDGs is higher than Spark's, which is likely due to the pipelining in SDGs, which avoids the re-instantiation of tasks after each iteration. With higher throughput, iterations are shorter, which leads to a faster convergence time. We conclude that the management of partial state in the LR application does not limit scalability compared to existing stateless dataflow systems.

6.3 Stragglers

We explore how SDGs handle straggling nodes by creating new TE and SE instances at runtime (see §3.3). For this, we deploy the CF application on our cluster and include a less powerful machine (2.4 GHz with 4 GB). Fig. 10 shows how the throughput and the number of nodes change over time as bottleneck TEs are identified by the system. At the start, a single instance of the getRecVec TE is deployed. It is identified as a bottleneck, and a second instance is added at t = 10 s, which also causes a new instance of the partial state in the coocc matrix to be created. This increases the throughput. The throughput spikes occur when the input queues of new TE instances fill up. Since the new node is allocated on the less powerful machine, it becomes a straggler, limiting overall throughput. At t = 30 s, adding a new TE instance without relieving the straggler does not increase the throughput. At t = 50 s, the straggling node is detected by the system, and a new instance is created to share its work. This increases the throughput to 62,000 requests/s. This shows how straggling nodes are mitigated by allocating new TE instances on demand, distributing new partial or partitioned SE instances as required. In more extreme cases, a straggling node could even be removed and the job resumed from a checkpoint with new nodes.
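The detection side of this behaviour can be pictured as a simple control loop over queue occupancy. The threshold-based policy, class and method names below are assumptions made for illustration, not the system's actual scheduling logic.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (assumed policy, not the SDG implementation): a TE
// whose input queue stays near capacity is treated as a bottleneck or
// straggler candidate, and a new TE instance is allocated to share its
// work, distributing new partial or partitioned SE instances as needed.
public class StragglerPolicy {
    public static class TEInstance {
        public final String name;
        public final double queueFill; // fraction of input queue in use, 0..1
        public TEInstance(String name, double queueFill) {
            this.name = name;
            this.queueFill = queueFill;
        }
    }

    // Return the names of TEs whose queues exceed the fill threshold and
    // therefore need an extra instance.
    public static List<String> bottlenecks(List<TEInstance> tes, double threshold) {
        List<String> out = new ArrayList<>();
        for (TEInstance te : tes)
            if (te.queueFill >= threshold) out.add(te.name);
        return out;
    }

    public static void main(String[] args) {
        List<TEInstance> tes = new ArrayList<>();
        tes.add(new TEInstance("getRecVec", 0.97));
        tes.add(new TEInstance("addRating", 0.20));
        System.out.println(bottlenecks(tes, 0.9)); // prints [getRecVec]
    }
}
```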
6.4 Failure recovery

We evaluate the performance and overhead of our failure recovery mechanism for SDGs. We (i) explore the recovery time under different recovery strategies; (ii) assess the advantages of our asynchronous checkpointing mechanism; and (iii) investigate the overhead with different checkpointing frequencies and state sizes.

Figure 9: Scalability in terms of throughput (batch logistic regression).
Figure 10: Runtime parallelism for handling stragglers (collaborative filtering).

We deploy the KV store on one node of our cluster, together with spare nodes to store backups and replace failed nodes.

Recovery time. We fail the node under different recovery strategies: an m-to-n recovery strategy uses m backup nodes to restore to n recovered nodes (see §5). For each, we measure the time to restore the lost SE, reprocess unprocessed data and resume processing. Fig. 11 shows the recovery times for different SE sizes under different strategies: (i) the simplest strategy, 1-to-1, has the longest recovery time, especially with large state sizes, because the state is restored from a single node; (ii) the 2-to-1 strategy streams checkpoint chunks from two nodes in parallel, which improves disk I/O throughput but also increases the load on the recovering node when it reconstitutes the state; (iii) in the 1-to-2 strategy, checkpoint chunks are streamed to two recovering nodes, thus halving the load of state reconstruction; and (iv) the 2-to-2 strategy recovers fastest because it combines the above two strategies: it parallelises both the disk reads and the state reconstruction. As the state becomes large, state reconstruction dominates over disk I/O overhead: with 4 GB, streaming from two disks does not improve recovery time. Adopting a strategy that recovers a failed node with multiple nodes, however, has a significant benefit compared to cases with smaller state sizes.

Synchronous vs. asynchronous checkpointing.
We investigate the benefit of our asynchronous checkpointing mechanism in comparison with synchronous checkpointing that stops processing, as used by Naiad [26] and SEEP [10]. Fig. 12 compares the throughput and 99th percentile latency with increasing state sizes. As the checkpoint size grows to 4 GB, the average throughput under synchronous checkpointing reduces by 33%, and the latency increases from 2 s to 8 s because the system stops processing while checkpointing. With asynchronous checkpointing, there is only a small (~5%) impact on throughput. Latency is an order of magnitude lower and only moderately affected. This result shows that a synchronous checkpointing approach cannot achieve low-latency processing with large state sizes.

Overhead of asynchronous checkpointing. Next we evaluate the overhead of our checkpointing mechanism

Figure 11: Recovery times with different m-to-n recovery strategies.
Figure 12: Comparison of synchronous and asynchronous checkpointing.

as a function of the checkpointing frequency and state size. Fig. 13 (top) shows the processing latency when varying the checkpointing frequency. The rightmost data point (No FT) represents the case where the checkpointing mechanism is disabled. The bottom figure reports the impact of the size of the checkpoint on latency. Checkpointing has a limited impact on latency: without fault tolerance, the 95th percentile latency is 68 ms, and it increases to 150 ms when checkpointing 1 GB every 10 s. This is due to the overhead of merging dirty state and saving checkpoints to disk. Increasing the checkpointing frequency or size gradually also increases latency: the 95th percentile latency with 4 GB is 185 ms, while checkpointing 2 GB every 4 s results in 1 s. Beyond that, the checkpointing overhead starts to impact higher percentiles more significantly. Checkpointing frequency and size behave almost proportionally: as the state size increases, the frequency can be reduced to maintain a low processing latency. Overall, this experiment demonstrates the strength of our checkpointing mechanism, which only locks state while merging dirty state. The locking overhead thus reduces proportionally to the state update rate.

7 Related Work

Programming model. Data-parallel frameworks typically support a functional/declarative model: MapReduce [8] only has two higher-order functions; more recent frameworks [15, 38, 13] permit user-defined functional operators; and Naiad [26] supports different functional and declarative programming models on top of its timely dataflow model. CBP [19], Storm and SEEP [10] expose a low-level dataflow programming model: algorithms are defined as a dataflow pipeline, which is harder to program and debug.
While functional and dataflow models ease distribution and fault tolerance, SDGs target an imperative programming model, which remains widely used by data scientists [17]. Efforts exist to bring imperative programming to data-parallel processing. CIEL [25] uses imperative constructs such as task spawning and futures, but this exposes the low-level execution of the dynamic dataflow graph to developers.

Figure 13: Impact of checkpoint frequency and size on latency.

Piccolo [30] and Oolong [24] offer imperative compute kernels with distributed state, which requires algorithms to be structured accordingly. In contrast, SDGs simplify the translation of imperative programs to dataflows using basic program analysis techniques, which infer state accesses and the dataflow. By separating different types of state access, it becomes possible to choose automatically an effective implementation for distributed state. GraphLab [20] and Pregel [22] are frameworks for graph computations based on a shared-memory abstraction. They expose a vertex-centric programming model, whereas SDGs target generic stateful computation.

Program parallelisation. Matlab has language constructs for parallel processing of large datasets on clusters. However, it only supports the parallelisation of sequential blocks or iterations and not of general dataflows. Declarative models such as Pig [28], DryadLINQ [37], SCOPE [6] and Stratosphere [9] are naturally amenable to automatic parallelisation: functions are stateless, which allows data-parallel versions to execute on multiple nodes. Instead, we focus on an imperative model. Other approaches offer new programming abstractions for parallel computation over distributed state. FlumeJava [7] provides distributed immutable collections. While immutability simplifies parallel execution, it limits the expression of imperative algorithms. In Piccolo [30], global mutable state is accessed remotely by parallel distributed functions.
In contrast, tasks in SDGs only access local state with low latency, and state is always colocated with computation. Presto [35] has distributed partitioned arrays for the R language. Partitions can be collected but not updated by multiple tasks, whereas SDGs permit arbitrary dataflows. Extracting parallel dataflows from imperative programs is a hard problem [16]. We follow an approach similar to that of Beck et al. [3], in which a dataflow graph is generated compositionally from the execution graph. While early work focused on hardware-based dataflow models [27], more recent efforts target thread-based execution [18]. Our problem is simpler because we do not extract task parallelism but only focus on data and pipeline parallelism in relation to distributed state access. Similar to pragma-based techniques [34], we use annotations to transform access to distributed state into access to local instances. Blazes [2] uses annotations to generate coordination code for distributed programs automatically. Our goal is different: SDGs execute imperative code in a distributed fashion, and coordination is determined by the extracted dataflow.

Failure recovery. In-memory systems are prone to failures, and fast recovery is important for low-latency and high-throughput processing. With large state sizes, checkpoints cannot be stored in memory, but storing them on disk can increase recovery time. RAMCloud [29] replicates data across cluster memory and eventually backs it up to persistent storage. Similar to our approach, data is recovered from multiple disks in parallel. However, rather than replicating each write request, we checkpoint large state atomically, while permitting new requests to operate on dirty state. Streaming Spark [39] and Spark [38] use RDDs for recovery. After a failure, RDDs are recomputed in parallel on multiple nodes. Such a recovery mechanism is effective if recomputation is inexpensive; for state that depends on the entire history of the data, it would be prohibitive. In contrast, the parallel recovery in SDGs retrieves partitioned checkpoints from multiple nodes, and only reprocesses data from output buffers to bring restored SE instances up-to-date.

8 Conclusions

Data-parallel processing frameworks must offer a familiar programming model with good performance. Supporting imperative online machine learning algorithms poses challenges to frameworks due to their use of large distributed state with fine-grained access. We describe stateful dataflow graphs (SDGs), a data-parallel model that is designed to offer a dataflow abstraction over large mutable state. With the help of annotations, imperative algorithms can be translated to SDGs, which manage partitioned or partial distributed state. As we demonstrated in our evaluation, SDGs can support diverse stateful applications, thus generalising a number of existing data-parallel computation models.

Acknowledgements. This work was supported by a PhD CASE Award funded by EPSRC/BAE Systems.
We thank our PC contact, Jinyang Li, and the anonymous ATC reviewers for their feedback and guidance.

References

[1] AKIDAU, T., BALIKOV, A., ET AL. MillWheel: Fault-Tolerant Stream Processing at Internet Scale. In VLDB (2013).
[2] ALVARO, P., CONWAY, N., ET AL. Blazes: Coordination Analysis for Distributed Programs. In ICDE (2014).
[3] BECK, M., AND PINGALI, K. From Control Flow to Dataflow. In ICPP (1990).
[4] BHATOTIA, P., WIEDER, A., ET AL. Incoop: MapReduce for Incremental Computations. In SOCC (2011).
[5] BU, Y., HOWE, B., ET AL. HaLoop: Efficient Iterative Data Processing on Large Clusters. In VLDB (2010).
[6] CHAIKEN, R., JENKINS, B., ET AL. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. In VLDB (2008).
[7] CHAMBERS, C., RANIWALA, A., ET AL. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In PLDI (2010).
[8] DEAN, J., AND GHEMAWAT, S. MapReduce: Simplified Data Processing on Large Clusters. In CACM (2008).
[9] EWEN, S., TZOUMAS, K., ET AL. Spinning Fast Iterative Data Flows. In VLDB (2012).
[10] FERNANDEZ, R. C., MIGLIAVACCA, M., ET AL. Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management. In SIGMOD (2013).
[11] GUNDA, P. K., RAVINDRANATH, L., ET AL. Nectar: Automatic Management of Data and Comp. in Datacenters. In OSDI (2010).
[12] HE, B., YANG, M., ET AL. Comet: Batched Stream Processing for Data Intensive Distributed Computing. In SOCC (2010).
[13] HUESKE, F., PETERS, M., ET AL. Opening the Black Boxes in Data Flow Optimization. In VLDB (2012).
[14] HWANG, J.-H., BALAZINSKA, M., ET AL. High-Availability Algorithms for Distributed Stream Processing. In ICDE (2005).
[15] ISARD, M., BUDIU, M., ET AL. Dryad: Dist. Data-Parallel Programs from Sequential Building Blocks. In EuroSys (2007).
[16] JOHNSTON, W. M., HANNA, J., ET AL. Advances in Dataflow Programming Languages. In CSUR (2004).
[17] KDNUGGETS ANNUAL SOFTWARE POLL. RapidMiner and R vie for the First Place.
[18] LI, F., POP, A., ET AL. Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs. In Micro (2012).
[19] LOGOTHETIS, D., OLSON, C., ET AL. Stateful Bulk Processing for Incremental Analytics. In SOCC (2010).
[20] LOW, Y., BICKSON, D., ET AL. Dist. GraphLab: A Framework for ML and Data Mining in the Cloud. In VLDB (2012).
[21] MA, J., SAUL, L. K., ET AL. Identifying Suspicious URLs: an Application of Large-Scale Online Learning. In ICML (2009).
[22] MALEWICZ, G., AUSTERN, M. H., ET AL. Pregel: A System for Large-scale Graph Processing. In SIGMOD (2010).
[23] MISHNE, G., DALTON, J., ET AL. Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture. In SIGMOD (2013).
[24] MITCHELL, C., POWER, R., ET AL. Oolong: Asynchronous Distributed Applications Made Easy. In APSYS (2012).
[25] MURRAY, D., SCHWARZKOPF, M., ET AL. CIEL: A Universal Exec. Engine for Distributed Data-Flow Comp. In NSDI (2011).
[26] MURRAY, D. G., MCSHERRY, F., ET AL. Naiad: A Timely Dataflow System. In SOSP (2013).
[27] NIKHIL, R. S., ET AL. Executing a Program on the MIT Tagged-Token Dataflow Architecture. In TC (1990).
[28] OLSTON, C., REED, B., ET AL. Pig Latin: A Not-So-Foreign Language for Data Processing. In SIGMOD (2008).
[29] ONGARO, D., RUMBLE, S. M., ET AL. Fast Crash Recovery in RAMCloud. In SOSP (2011).
[30] POWER, R., AND LI, J. Piccolo: Building Fast, Distributed Programs with Partitioned Tables. In OSDI (2010).
[31] SCHELTER, S., EWEN, S., ET AL. All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing. In CIKM (2013).
[32] SUMBALY, R., KREPS, J., ET AL. The Big Data Ecosystem at LinkedIn. In SIGMOD (2013).
[33] VALLÉE-RAI, R., HENDREN, L., ET AL. Soot: A Java Optimization Framework. In CASCON (1999).
[34] VANDIERENDONCK, H., RUL, S., ET AL. The Paralax Infrastructure: Automatic Parallelization with a Helping Hand. In PACT (2010).
[35] VENKATARAMAN, S., BODZSAR, E., ET AL. Presto: Dist. ML and Graph Processing with Sparse Matrices. In EuroSys (2013).
[36] XIN, R. S., ROSEN, J., ET AL. Shark: SQL and Rich Analytics at Scale. In SIGMOD (2013).
[37] YU, Y., ISARD, M., ET AL. DryadLINQ: a System for General-Purpose Distributed Data-Parallel Computing using a High-Level Language. In OSDI (2008).
[38] ZAHARIA, M., CHOWDHURY, M., ET AL. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI (2012).
[39] ZAHARIA, M., DAS, T., ET AL. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In SOSP (2013).
[40] ZAHARIA, M., KONWINSKI, A., ET AL. Improving MapReduce Performance in Heterogeneous Environments. In OSDI (2008).

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

Stateful Distributed Dataflow Graphs: Imperative Big Data Programming for the Masses

Stateful Distributed Dataflow Graphs: Imperative Big Data Programming for the Masses Stateful Distributed Dataflow Graphs: Imperative Big Data Programming for the Masses Peter Pietzuch [email protected] Large-Scale Distributed Systems Group Department of Computing, Imperial College London

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich [email protected] [email protected] Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Reliability Analysis in HPC clusters

Reliability Analysis in HPC clusters Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab

More information

Configuring Additional Active Directory Server Roles

Configuring Additional Active Directory Server Roles Maual Upgradig your MCSE o Server 2003 to Server 2008 (70-649) 1-800-418-6789 Cofigurig Additioal Active Directory Server Roles Active Directory Lightweight Directory Services Backgroud ad Cofiguratio

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

CHAPTER 3 THE TIME VALUE OF MONEY

CHAPTER 3 THE TIME VALUE OF MONEY CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

ODBC. Getting Started With Sage Timberline Office ODBC

ODBC. Getting Started With Sage Timberline Office ODBC ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation Effective Techiques for Message Reductio ad Load Balacig i Distributed Graph Computatio ABSTRACT Da Ya, James Cheg, Yi Lu Dept. of Computer Sciece ad Egieerig The Chiese Uiversity of Hog Kog {yada, jcheg,

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

The Forgotten Middle. research readiness results. Executive Summary

The Forgotten Middle. research readiness results. Executive Summary The Forgotte Middle Esurig that All Studets Are o Target for College ad Career Readiess before High School Executive Summary Today, college readiess also meas career readiess. While ot every high school

More information

A Secure Implementation of Java Inner Classes

A Secure Implementation of Java Inner Classes A Secure Implemetatio of Java Ier Classes By Aasua Bhowmik ad William Pugh Departmet of Computer Sciece Uiversity of Marylad More ifo at: http://www.cs.umd.edu/~pugh/java Motivatio ad Overview Preset implemetatio

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology Adoptio Date: 4 March 2004 Effective Date: 1 Jue 2004 Retroactive Applicatio: No Public Commet Period: Aug Nov 2002 INVESTMENT PERFORMANCE COUNCIL (IPC) Preface Guidace Statemet o Calculatio Methodology

More information

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY Optimize your Network I the Courier, Express ad Parcel market ADDING CREDIBILITY Meetig today s challeges ad tomorrow s demads Aswers to your key etwork challeges ORTEC kows the highly competitive Courier,

More information

Engineering Data Management

Engineering Data Management BaaERP 5.0c Maufacturig Egieerig Data Maagemet Module Procedure UP128A US Documetiformatio Documet Documet code : UP128A US Documet group : User Documetatio Documet title : Egieerig Data Maagemet Applicatio/Package

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation Effective Techiques for Message Reductio ad Load Balacig i Distributed Graph Computatio ABSTRACT Da Ya, James Cheg, Yi Lu Dept. of Computer Sciece ad Egieerig The Chiese Uiversity of Hog Kog {yada, jcheg,

More information

CS100: Introduction to Computer Science

CS100: Introduction to Computer Science Review: History of Computers CS100: Itroductio to Computer Sciece Maiframes Miicomputers Lecture 2: Data Storage -- Bits, their storage ad mai memory Persoal Computers & Workstatios Review: The Role of

More information

Evaluating Model for B2C E- commerce Enterprise Development Based on DEA

Evaluating Model for B2C E- commerce Enterprise Development Based on DEA , pp.180-184 http://dx.doi.org/10.14257/astl.2014.53.39 Evaluatig Model for B2C E- commerce Eterprise Developmet Based o DEA Weli Geg, Jig Ta Computer ad iformatio egieerig Istitute, Harbi Uiversity of

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

QUADRO tech. FSA Migrator 2.6. File Server Migrations - Made Easy

QUADRO tech. FSA Migrator 2.6. File Server Migrations - Made Easy QUADRO tech FSA Migrator 2.6 File Server Migratios - Made Easy FSA Migrator Cosolidate your archived ad o-archived File Server data - with ease! May orgaisatios struggle with the cotiuous growth of their

More information

Report Documentation Page

Report Documentation Page Applyig performace models to uderstad data-itesive computig efficiecy Elie Krevat, Tomer Shira, Eric Aderso, Joseph Tucek, Jay J. Wylie, Gregory R. Gager Caregie Mello Uiversity HP Labs CMU-PDL-10-108

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

Neolane Leads. Neolane v6.1

Neolane Leads. Neolane v6.1 Neolae Leads Neolae v6.1 This documet, ad the software it describes, are provided subject to a Licese Agreemet ad may ot be used or copied outside of the provisios of the Licese Agreemet. No part of this

More information

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series utomatic Tuig for FOREX Tradig System Usig Fuzzy Time Series Kraimo Maeesilp ad Pitihate Soorasa bstract Efficiecy of the automatic currecy tradig system is time depedet due to usig fixed parameters which

More information

Cantilever Beam Experiment

Cantilever Beam Experiment Mechaical Egieerig Departmet Uiversity of Massachusetts Lowell Catilever Beam Experimet Backgroud A disk drive maufacturer is redesigig several disk drive armature mechaisms. This is the result of evaluatio

More information

A Flexible Elastic Control Plane for Private Clouds

A Flexible Elastic Control Plane for Private Clouds A Flexible Elastic otrol Plae for Private louds Upedra Sharma IBM Watso [email protected] Prashat Sheoy Dept. of omputer Sciece Amherst MA 01003 [email protected] Sambit Sahu IBM Watso [email protected]

More information

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii

More information

MapReduce Based Implementation of Aggregate Functions on Cassandra

MapReduce Based Implementation of Aggregate Functions on Cassandra Iteratioal Joural of Electroics Commuicatio ad Couter Techology (IJECCT) MapReduce Based Ilemetatio of Aggregate Fuctios o Cassadra Aseh Daesh Arasteh Departmet of Couter ad IT Islamic Azad Uiversity of

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 [email protected] Abstract:

More information

Baan Service Master Data Management

Baan Service Master Data Management Baa Service Master Data Maagemet Module Procedure UP069A US Documetiformatio Documet Documet code : UP069A US Documet group : User Documetatio Documet title : Master Data Maagemet Applicatio/Package :

More information

Domain 1 - Describe Cisco VoIP Implementations

Domain 1 - Describe Cisco VoIP Implementations Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.

More information

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii

More information

Domain 1: Configuring Domain Name System (DNS) for Active Directory

Domain 1: Configuring Domain Name System (DNS) for Active Directory Maual Widows Domai 1: Cofigurig Domai Name System (DNS) for Active Directory Cofigure zoes I Domai Name System (DNS), a DNS amespace ca be divided ito zoes. The zoes store ame iformatio about oe or more

More information

IntelliSOURCE Comverge s enterprise software platform provides the foundation for deploying integrated demand management programs.

IntelliSOURCE Comverge s enterprise software platform provides the foundation for deploying integrated demand management programs. ItelliSOURCE Comverge s eterprise software platform provides the foudatio for deployig itegrated demad maagemet programs. ItelliSOURCE Demad maagemet programs such as demad respose, eergy efficiecy, ad

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Domain 1: Identifying Cause of and Resolving Desktop Application Issues Identifying and Resolving New Software Installation Issues

Domain 1: Identifying Cause of and Resolving Desktop Application Issues Identifying and Resolving New Software Installation Issues Maual Widows 7 Eterprise Desktop Support Techicia (70-685) 1-800-418-6789 Domai 1: Idetifyig Cause of ad Resolvig Desktop Applicatio Issues Idetifyig ad Resolvig New Software Istallatio Issues This sectio

More information

C.Yaashuwanth Department of Electrical and Electronics Engineering, Anna University Chennai, Chennai 600 025, India..

C.Yaashuwanth Department of Electrical and Electronics Engineering, Anna University Chennai, Chennai 600 025, India.. (IJCSIS) Iteratioal Joural of Computer Sciece ad Iformatio Security, A New Schedulig Algorithms for Real Time Tasks C.Yaashuwath Departmet of Electrical ad Electroics Egieerig, Aa Uiversity Cheai, Cheai

More information

AdaLab. Adaptive Automated Scientific Laboratory (AdaLab) Adaptive Machines in Complex Environments. n Start Date: 1.4.15

AdaLab. Adaptive Automated Scientific Laboratory (AdaLab) Adaptive Machines in Complex Environments. n Start Date: 1.4.15 AdaLab AdaLab Adaptive Automated Scietific Laboratory (AdaLab) Adaptive Machies i Complex Eviromets Start Date: 1.4.15 Scietific Backgroud The Cocept of a Robot Scietist Computer systems capable of origiatig

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Business Rules-Driven SOA. A Framework for Multi-Tenant Cloud Computing

Business Rules-Driven SOA. A Framework for Multi-Tenant Cloud Computing Lect. Phd. Liviu Gabriel CRETU / SPRERS evet Traiig o software services, Timisoara, Romaia, 6-10 dec 2010 www.feaa.uaic.ro Busiess Rules-Drive SOA. A Framework for Multi-Teat Cloud Computig Lect. Ph.D.

More information

Forecasting. Forecasting Application. Practical Forecasting. Chapter 7 OVERVIEW KEY CONCEPTS. Chapter 7. Chapter 7

Forecasting. Forecasting Application. Practical Forecasting. Chapter 7 OVERVIEW KEY CONCEPTS. Chapter 7. Chapter 7 Forecastig Chapter 7 Chapter 7 OVERVIEW Forecastig Applicatios Qualitative Aalysis Tred Aalysis ad Projectio Busiess Cycle Expoetial Smoothig Ecoometric Forecastig Judgig Forecast Reliability Choosig the

More information

EUROCONTROL PRISMIL. EUROCONTROL civil-military performance monitoring system

EUROCONTROL PRISMIL. EUROCONTROL civil-military performance monitoring system EUROCONTROL PRISMIL EUROCONTROL civil-military performace moitorig system Itroductio What is PRISMIL? PRISMIL is a olie civil-military performace moitorig system which facilitates the combied performace

More information

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized?

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized? 5.4 Amortizatio Questio 1: How do you fid the preset value of a auity? Questio 2: How is a loa amortized? Questio 3: How do you make a amortizatio table? Oe of the most commo fiacial istrumets a perso

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

BaanERP 5.0c. EDI User Guide

BaanERP 5.0c. EDI User Guide BaaERP 5.0c A publicatio of: Baa Developmet B.V. P.O.Box 143 3770 AC Bareveld The Netherlads Prited i the Netherlads Baa Developmet B.V. 1999. All rights reserved. The iformatio i this documet is subject

More information

A Balanced Scorecard

A Balanced Scorecard A Balaced Scorecard with VISION A Visio Iteratioal White Paper Visio Iteratioal A/S Aarhusgade 88, DK-2100 Copehage, Demark Phoe +45 35430086 Fax +45 35434646 www.balaced-scorecard.com 1 1. Itroductio

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

Evaluation of Different Fitness Functions for the Evolutionary Testing of an Autonomous Parking System

Evaluation of Different Fitness Functions for the Evolutionary Testing of an Autonomous Parking System Evaluatio of Differet Fitess Fuctios for the Evolutioary Testig of a Autoomous Parkig System Joachim Wegeer 1, Oliver Bühler 2 1 DaimlerChrysler AG, Research ad Techology, Alt-Moabit 96 a, D-1559 Berli,

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Authentication - Access Control Default Security Active Directory Trusted Authentication Guest User or Anonymous (un-authenticated) Logging Out

Authentication - Access Control Default Security Active Directory Trusted Authentication Guest User or Anonymous (un-authenticated) Logging Out FME Server Security Table of Cotets FME Server Autheticatio - Access Cotrol Default Security Active Directory Trusted Autheticatio Guest User or Aoymous (u-autheticated) Loggig Out Authorizatio - Roles

More information

Optimization of Large Data in Cloud computing using Replication Methods

Optimization of Large Data in Cloud computing using Replication Methods Optimizatio of Large Data i Cloud computig usig Replicatio Methods Vijaya -Kumar-C, Dr. G.A. Ramachadhra Computer Sciece ad Techology, Sri Krishadevaraya Uiversity Aatapuramu, AdhraPradesh, Idia Abstract-Cloud

More information

Location, Location, Location! Modeling Data Proximity in the Cloud

Location, Location, Location! Modeling Data Proximity in the Cloud Locatio, Locatio, Locatio! Modelig Data Proximity i the Cloud Birjodh Tiwaa [email protected] Uiversity of Michiga rbor, MI Hitesh Ballai [email protected] Microsoft Research Cambridge, UK Mahesh

More information

Now here is the important step

Now here is the important step LINEST i Excel The Excel spreadsheet fuctio "liest" is a complete liear least squares curve fittig routie that produces ucertaity estimates for the fit values. There are two ways to access the "liest"

More information

Message Exchange in the Utility Market Using SAP for Utilities. Point of View by Marc Metz and Maarten Vriesema

Message Exchange in the Utility Market Using SAP for Utilities. Point of View by Marc Metz and Maarten Vriesema Eergy, Utilities ad Chemicals the way we see it Message Exchage i the Utility Market Usig SAP for Utilities Poit of View by Marc Metz ad Maarte Vriesema Itroductio Liberalisatio of utility markets has

More information

Conceptualization with Incremental Bron- Kerbosch Algorithm in Big Data Architecture

Conceptualization with Incremental Bron- Kerbosch Algorithm in Big Data Architecture Acta Polytechica Hugarica Vol. 13, No. 2, 2016 Coceptualizatio with Icremetal Bro- Kerbosch Algorithm i Big Data Architecture László Kovács 1, Gábor Szabó 2 1 Uiversity of Miskolc, Istitute of Iformatio

More information

How to use what you OWN to reduce what you OWE

How to use what you OWN to reduce what you OWE How to use what you OWN to reduce what you OWE Maulife Oe A Overview Most Caadias maage their fiaces by doig two thigs: 1. Depositig their icome ad other short-term assets ito chequig ad savigs accouts.

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Multiplexers and Demultiplexers

Multiplexers and Demultiplexers I this lesso, you will lear about: Multiplexers ad Demultiplexers 1. Multiplexers 2. Combiatioal circuit implemetatio with multiplexers 3. Demultiplexers 4. Some examples Multiplexer A Multiplexer (see

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand [email protected]

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

Cooley-Tukey. Tukey FFT Algorithms. FFT Algorithms. Cooley

Cooley-Tukey. Tukey FFT Algorithms. FFT Algorithms. Cooley Cooley Cooley-Tuey Tuey FFT Algorithms FFT Algorithms Cosider a legth- sequece x[ with a -poit DFT X[ where Represet the idices ad as +, +, Cooley Cooley-Tuey Tuey FFT Algorithms FFT Algorithms Usig these

More information

CCH Accountants Starter Pack

CCH Accountants Starter Pack CCH Accoutats Starter Pack We may be a bit smaller, but fudametally we re o differet to ay other accoutig practice. Util ow, smaller firms have faced a stark choice: Buy cheaply, kowig that the practice

More information

CCH CRM Books Online Software Fee Protection Consultancy Advice Lines CPD Books Online Software Fee Protection Consultancy Advice Lines CPD

CCH CRM Books Online Software Fee Protection Consultancy Advice Lines CPD Books Online Software Fee Protection Consultancy Advice Lines CPD Books Olie Software Fee Fee Protectio Cosultacy Advice Advice Lies Lies CPD CPD facig today s challeges As a accoutacy practice, maagig relatioships with our cliets has to be at the heart of everythig

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

June 3, 1999. Voice over IP

June 3, 1999. Voice over IP Jue 3, 1999 Voice over IP This applicatio ote discusses the Hypercom solutio for providig ed-to-ed Iteret protocol (IP) coectivity i a ew or existig Hypercom Hybrid Trasport Mechaism (HTM) etwork, reducig

More information

A Distributed Dynamic Load Balancer for Iterative Applications

A Distributed Dynamic Load Balancer for Iterative Applications A Distributed Dyamic Balacer for Iterative Applicatios Harshitha Meo, Laxmikat Kalé Departmet of Computer Sciece, Uiversity of Illiois at Urbaa-Champaig {gplkrsh2,kale}@illiois.edu ABSTRACT For may applicatios,

More information

Review: Classification Outline

Review: Classification Outline Data Miig CS 341, Sprig 2007 Decisio Trees Neural etworks Review: Lecture 6: Classificatio issues, regressio, bayesia classificatio Pretice Hall 2 Data Miig Core Techiques Classificatio Clusterig Associatio

More information

Quantitative Computer Architecture

Quantitative Computer Architecture Performace Measuremet ad Aalysis i Computer Quatitative Computer Measuremet Model Iovatio Proposed How to measure, aalyze, ad specify computer system performace or My computer is faster tha your computer!

More information

Designing Incentives for Online Question and Answer Forums

Designing Incentives for Online Question and Answer Forums Desigig Icetives for Olie Questio ad Aswer Forums Shaili Jai School of Egieerig ad Applied Scieces Harvard Uiversity Cambridge, MA 0238 USA [email protected] Yilig Che School of Egieerig ad Applied

More information

Escola Federal de Engenharia de Itajubá

Escola Federal de Engenharia de Itajubá Escola Federal de Egeharia de Itajubá Departameto de Egeharia Mecâica Pós-Graduação em Egeharia Mecâica MPF04 ANÁLISE DE SINAIS E AQUISÇÃO DE DADOS SINAIS E SISTEMAS Trabalho 02 (MATLAB) Prof. Dr. José

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy

More information

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks Maual Security+ Domai 1 Network Security Every etwork is uique, ad architecturally defied physically by its equipmet ad coectios, ad logically through the applicatios, services, ad idustries it serves.

More information

facing today s challenges As an accountancy practice, managing relationships with our clients has to be at the heart of everything we do.

facing today s challenges As an accountancy practice, managing relationships with our clients has to be at the heart of everything we do. CCH CRM cliet relatios facig today s challeges As a accoutacy practice, maagig relatioships with our cliets has to be at the heart of everythig we do. That s why our CRM system ca t be a bolt-o extra it

More information

MTO-MTS Production Systems in Supply Chains

MTO-MTS Production Systems in Supply Chains NSF GRANT #0092854 NSF PROGRAM NAME: MES/OR MTO-MTS Productio Systems i Supply Chais Philip M. Kamisky Uiversity of Califoria, Berkeley Our Kaya Uiversity of Califoria, Berkeley Abstract: Icreasig cost

More information

LEASE-PURCHASE DECISION

LEASE-PURCHASE DECISION Public Procuremet Practice STANDARD The decisio to lease or purchase should be cosidered o a case-by case evaluatio of comparative costs ad other factors. 1 Procuremet should coduct a cost/ beefit aalysis

More information