Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation

Da Yan, James Cheng, Yi Lu
Dept. of Computer Science and Engineering
The Chinese University of Hong Kong
{yanda, jcheng, ylu}@cse.cuhk.edu.hk

Wilfred Ng
Dept. of Computer Science and Engineering
The Hong Kong University of Science and Technology
wilfred@cse.ust.hk

ABSTRACT

Massive graphs, such as online social networks and communication networks, have become common today. To efficiently analyze such large graphs, many distributed graph computing systems have been developed. These systems employ the "think like a vertex" programming paradigm, where a program proceeds in iterations and at each iteration, vertices exchange messages with each other. However, using Pregel's simple message passing mechanism, some vertices may send/receive significantly more messages than others due to either the high degree of these vertices or the logic of the algorithm used. This forms the communication bottleneck and leads to imbalanced workload among machines in the cluster. In this paper, we propose two effective message reduction techniques: 1) vertex mirroring with message combining, and 2) an additional request-respond API. These techniques not only reduce the total number of messages exchanged through the network, but also bound the number of messages sent/received by any single vertex. We theoretically analyze the effectiveness of our techniques, and implement them on top of our open-source Pregel implementation called Pregel+. Our experiments on various large real graphs demonstrate that our message reduction techniques significantly improve the performance of distributed graph computation.

Categories and Subject Descriptors
D.4.7 [Organization and Design]: Distributed systems

General Terms
Performance

Keywords
Pregel; distributed graph computing; graph analytics

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2015, May 18-22, 2015, Florence, Italy. ACM /15/05.

1. INTRODUCTION

With the growing interest in analyzing large real-world graphs such as online social networks, web graphs and semantic web graphs, many distributed graph computing systems [1, 5, 10, 11, 13, 18, 21, 23] have emerged. These systems are deployed in a shared-nothing distributed computing infrastructure usually built on top of a cluster of low-cost commodity PCs. Pioneered by Google's Pregel [13], these systems adopt a vertex-centric computing paradigm, where programmers think naturally like a vertex when designing distributed graph algorithms. A Pregel-like system also takes care of fault recovery and scales to arbitrary cluster size without the need of changing the program code, both of which are indispensable properties for programs running in a cloud environment.

MapReduce [3], and its open-source implementation Hadoop, are also popularly used for large scale graph processing. However, many graph algorithms are intrinsically iterative, such as the computation of PageRank, connected components, and shortest paths. For iterative graph computation, a Pregel program is much more efficient than its MapReduce counterpart [13].

Weaknesses of Pregel. Although Pregel's vertex-centric computing model has been widely adopted in most of the recent distributed graph computing systems [1, 11, 10, 18] (and also inspired the edge-centric model [5]), Pregel's vertex-to-vertex message passing mechanism often causes bottlenecks in communication when processing real-world graphs.

To clarify this point, we first briefly review how Pregel performs message passing. In Pregel, a vertex v can send messages to another vertex u if v knows u's vertex ID. In most cases, v only sends messages to its neighbors, whose IDs are available from v's adjacency list. But there also exist Pregel algorithms in which a vertex v may send messages to another vertex that is not a neighbor of v [24, 19].
These algorithms usually adopt pointer jumping (or doubling), a technique that is widely used in designing PRAM algorithms [22], to bound the number of iterations by O(log |V|), where |V| refers to the number of vertices in the graph.

The problem with Pregel's message passing mechanism is that a small number of vertices, which we call bottleneck vertices, may send/receive many more messages than other vertices. A bottleneck vertex not only generates heavy communication, but also significantly increases the workload of the machine in which the vertex resides, causing highly imbalanced workload among different machines. Bottleneck vertices are common when using Pregel to process real-world graphs, mainly due to either 1) high vertex degree or 2) algorithm logic, which we elaborate on below.

We first consider the problem caused by high vertex degree. When a high-degree vertex sends messages to all its neighbors, it becomes a bottleneck vertex. Unfortunately, real-world graphs usually have a highly skewed degree distribution, with some vertices having very high degrees. For example, in the Twitter who-follows-who graph, the maximum degree is over 2.99M while the average degree is only 35. Similarly, in the BTC dataset used in our experiments, the maximum degree is over 1.6M while the average degree is only […]. We ran Hash-Min [17, 24], a distributed algorithm for computing connected components (CCs), on the degree-skewed BTC dataset in a cluster with 1 master (Worker 0) and 120 slaves (Workers 1-120), and observed highly imbalanced workload among different workers, which we describe next.

Pregel assigns each vertex to a worker by hashing the vertex ID regardless of the degree of the vertex. As a result, each worker holds approximately the same number of vertices, but the total number of neighbors in the adjacency lists (i.e., number of edges) varies greatly among different workers. In the computation of Hash-Min on BTC, we observed an uneven distribution of edge numbers among workers, as some workers contain more high-degree vertices than other workers. Since messages are sent along the edges, the uneven distribution of edge numbers also leads to an uneven distribution of the amount of communication among different workers. In Figure 1, the taller blue bars indicate the total number of messages sent by each worker during the entire computation of Hash-Min, where we observe highly uneven communication workload among different workers.

Bottleneck vertices may also be generated by program logic. An example is the S-V algorithm proposed in [24, 22] for computing CCs, which we will describe in detail in Section 3.4. In S-V, each vertex v maintains a field D[v] which records the vertex that v is to communicate with. The field D[v] may be updated at each iteration as the algorithm proceeds; and when the algorithm terminates, vertices v_i and v_j are in the same CC iff D[v_i] = D[v_j]. Thus, during the computation, some vertex u may communicate with many vertices {v_1, v_2, ..., v_k} in its CC if u = D[v_i], for 1 ≤ i ≤ k. In this case, u becomes a bottleneck vertex. We ran S-V on the USA road network in a cluster with 1 master (Worker 0) and 60 slaves (Workers 1-60), and observed highly imbalanced communication workload among different workers.
In Figure 2, the taller blue bars indicate the total number of messages sent by each worker during the entire computation of S-V, where we can see that the communication workload is very biased (especially at Worker 0). We remark that the imbalanced communication workload is not caused by skewed vertex degree distribution, since the largest vertex degree of the USA road network is merely 9. Rather, it is because of the algorithm logic of S-V. Specifically, since the USA road network is connected, in the last round of S-V, all vertices v have D[v] equal to Vertex 0, indicating that they all belong to the same CC. Since Vertex 0 is hashed to Worker 0, Worker 0 sends many more messages than the other workers, as can be observed from Figure 2.

In addition to the two problems mentioned above, Pregel's message passing mechanism is also not efficient for processing graphs with (relatively) high average degree due to the high overall communication cost. However, many real-world graphs such as social networks and mobile phone networks have relatively high average degree, as a person is often connected to at least dozens of people.

Figure 1: Hash-Min on BTC (with/without mirroring); number of messages per worker ID
Figure 2: S-V on USA (with/without request-respond); number of messages per worker ID

Our Solution. In this paper, we solve the problems caused by Pregel's message passing mechanism with two effective message reduction techniques. The goals are to 1) mitigate the problem of imbalanced workload by eliminating bottleneck vertices, and to 2) reduce the overall number of messages exchanged through the network.

The first technique is called mirroring, which is designed to eliminate bottleneck vertices caused by high vertex degree. The main idea is to construct mirrors of each high-degree vertex in different machines, so that messages from a high-degree vertex are forwarded to its neighbors by its mirrors in local machines. Let d(v) be the degree of a vertex v and M be the number of machines in the cluster; mirroring bounds the number of messages sent by v each time to min{M, d(v)}.
If v is a high-degree vertex, d(v) can be up to millions, but M is normally only from tens to a few hundred. We remark that ideas similar to mirroring have been adopted by existing systems [11, 18], but we find that mirroring a vertex does not always reduce the number of messages due to Pregel's use of message combiner [13]. Hence, we provide a theoretical analysis on which vertices should be selected for mirroring in Section 5.

In Figure 1, the short red bars indicate the total number of messages sent by each worker when mirroring is applied to all vertices with degree at least 100. We can clearly see the big difference between the uneven blue bars (without mirroring) and the even-height short red bars (with mirroring). Furthermore, the number of messages is also significantly reduced by mirroring. We remark that the algorithm is still the same and mirroring is completely transparent to users. Mirroring reduces the running time of Hash-Min on BTC from […] seconds to 9.55 seconds.

The second technique is a new request-respond paradigm. We extend the basic Pregel framework by an additional request-respond functionality. A vertex u may request another vertex v for its attribute a(v), and the requested value will be available in the next iteration. The request-respond programming paradigm simplifies the coding of many Pregel algorithms, as otherwise at least three iterations are required to explicitly code each request and response process.

More importantly, the request-respond paradigm effectively eliminates the bottleneck vertices resulting from algorithm logic, by bounding the number of response messages sent by any vertex to M. Consider the S-V algorithm mentioned earlier, where a set of k vertices {v_1, v_2, ..., v_k} with D[v_i] = u require the value of D[u] from u (thus there are k requests and responses). Under the request-respond paradigm, all the requests from a machine to the same target vertex are merged into one request. Therefore, at most min{M, k} requests are needed for the k vertices and at most min{M, k} responses are sent from u.
For large real-world graphs, k is often orders of magnitude greater than M. In Figure 2, the short red bars indicate the total number of messages sent by each worker when the request-respond paradigm is applied. Again, the skewed message passing represented by the blue bars is now replaced by the even-height short red bars. In particular, Vertex 0 now only responds to the requesting workers instead of all the requesting vertices in the last round, and hence the highly imbalanced workload caused by Vertex 0 in Worker 0 is now evened out. The request-respond paradigm reduces the running time of S-V on the USA road network from […] seconds to […] seconds.
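The effect of merging requests can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the Pregel+ implementation: the function names are hypothetical, vertices are assigned to workers by `vertex_id % num_workers` (the paper only says that vertex IDs are hashed), and messages between vertices on the same worker are counted like any other message.

```python
from collections import defaultdict


def worker_of(vertex_id, num_workers):
    # Assumed hash partitioning: integer vertex ID modulo the worker count.
    return vertex_id % num_workers


def count_basic(requests, num_workers):
    """Plain vertex-to-vertex exchange: one request and one response per requester."""
    sent = defaultdict(int)  # worker ID -> number of messages sent
    for requester, target in requests:
        sent[worker_of(requester, num_workers)] += 1  # the request
        sent[worker_of(target, num_workers)] += 1     # the response
    return dict(sent)


def count_request_respond(requests, num_workers):
    """Requests from one worker to the same target vertex are merged into one."""
    merged = {(worker_of(r, num_workers), t) for r, t in requests}
    sent = defaultdict(int)
    for src_worker, target in merged:
        sent[src_worker] += 1                         # one merged request
        sent[worker_of(target, num_workers)] += 1     # one response per requesting worker
    return dict(sent)


# Last round of S-V on a connected graph: k = 1000 vertices all ask Vertex 0 for D[0].
requests = [(v, 0) for v in range(1, 1001)]
basic = count_basic(requests, num_workers=4)
merged = count_request_respond(requests, num_workers=4)
print(basic)   # Worker 0 sends 1250 messages, every other worker 250
print(merged)  # Worker 0 sends 5 messages, every other worker 1
```

In this toy setting Vertex 0 sends 4 = min{M, k} responses instead of 1000, which is the bound stated above.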

Figure 3: Illustration of combiner

Finally, we remark that our experiments were run in a cluster without any resource contention, and our optimization techniques are expected to improve the overall performance of Pregel algorithms more significantly if they were run in a public data center, where the network bandwidth is lower and reducing communication overhead becomes more important.

The rest of the paper is organized as follows. We review existing parallel graph computing systems, and highlight the differences of our work from theirs, in Section 2. In Section 3, we describe some Pregel algorithms for problems that are common in social network analysis and web analysis. In Section 4, we introduce the basic communication framework. We present the mirroring technique and the request-respond functionality in Sections 5 and 6. Finally, we report the experimental results in Section 7 and conclude the paper in Section 8.

2. BACKGROUND AND RELATED WORK

We first review Pregel's framework, and then discuss other related distributed graph computing systems.

2.1 Pregel

Pregel [13] is designed based on the bulk synchronous parallel (BSP) model. It distributes vertices to different machines in a cluster, where each vertex v is associated with its adjacency list (i.e., the set of v's neighbors). A program in Pregel implements a user-defined compute() function and proceeds in iterations (called supersteps). In each superstep, the program calls compute() for each active vertex. The compute() function performs the user-specified task for a vertex v, such as processing v's incoming messages (sent in the previous superstep), sending messages to other vertices (to be received in the next superstep), and making v vote to halt. A halted vertex is reactivated if it receives a message in a subsequent superstep. The program terminates when all vertices vote to halt and there is no pending message for the next superstep.

Pregel numbers the supersteps so that a user may use the current superstep number when implementing the algorithm logic in the compute() function.
As a result, a Pregel algorithm can perform different operations in different supersteps by branching on the current superstep number.

Message Combiner. Pregel allows users to implement a combine() function, which specifies how to combine messages that are sent from a machine M_i to the same vertex v in a machine M_j. These messages are combined into a single message, which is then sent from M_i to v in M_j. However, a combiner is applicable only when commutative and associative operations are to be applied to the messages. For example, in the PageRank computation, the messages sent to a vertex v are to be summed up to compute v's PageRank value; in this case, we can combine all messages sent from a machine M_i to the same target vertex in a machine M_j into a single message that equals their sum. Figure 3 illustrates the idea of combiner, where the messages sent by vertices in machine M_1 to the same target vertex v_j in machine M_2 are combined into their sum before sending.

Aggregator. Pregel also supports aggregator, which is useful for global communication. Each vertex can provide a value to an aggregator in compute() in a superstep. The system aggregates those values and makes the aggregated result available to all vertices in the next superstep.

2.2 Pregel-Like Systems in Java

Since Google's Pregel is proprietary, many open-source Pregel counterparts have been developed. Most of these systems are implemented in Java, e.g., Giraph [1] and GPS [18]. They read the graph data from Hadoop's DFS (HDFS) and write the results to HDFS. However, since object deletion is handled by Java's Garbage Collector (GC), if a machine maintains a huge amount of vertex/edge objects in main memory, GC needs to track a lot of objects and the overhead can severely degrade the system performance. To decrease the number of objects being maintained, Java-based systems maintain vertices in main memory in their binary representation. For example, Giraph organizes vertices as main memory pages, where each page is simply a byte array object that holds the binary representation of many vertices.
As a result, a vertex needs to be deserialized from the page holding it before calling compute(); and after compute() completes, the updated vertex needs to be serialized back to its page. The serialization cost can be high, especially if the adjacency list is long. To avoid unnecessary serialization cost, a Pregel-like system should be implemented in a language such as C/C++, where programmers (who are system developers, not end users) manage main memory objects themselves. We implemented our Pregel+ system in C/C++.

GPS [18] supports an optimization called large adjacency list partitioning (LALP) to handle high-degree vertices, whose idea is similar to vertex mirroring. However, GPS does not explore the performance tradeoff between vertex mirroring and message combining. Instead, it is claimed in [18] that only a very small performance difference can be observed whether a combiner is used or not, and thus, GPS simply does not perform sender-side message combining. Our experiments in Section 7 show that sender-side message combining significantly reduces the overall running time of Pregel algorithms, and therefore, both vertex mirroring and message combining should be used to achieve better performance. As we shall see in Section 5, vertex mirroring and message combining are two conflicting message reduction techniques, and a theoretical analysis on their performance tradeoff is needed in order to devise a cost model for automatically choosing vertices for mirroring.

2.3 GraphLab and PowerGraph

GraphLab [11] is another parallel graph computing system that follows a design different from Pregel. GraphLab supports asynchronous execution, and adopts a data pulling programming paradigm. Specifically, each vertex actively pulls data from its neighbors, rather than passively receiving messages sent/pushed by its neighbors. This feature is somewhat similar to our request-respond paradigm, but in GraphLab, the requests can only be sent to the neighbors. As a result, GraphLab cannot support parallel graph algorithms where a vertex needs to communicate with a non-neighbor.
Such algorithms are, however, quite popular in Pregel as they make use of the pointer jumping (or doubling) technique of PRAM algorithms to bound the number of iterations by O(log |V|). Examples include the S-V algorithm for computing CCs [24] and the Pregel algorithm for computing minimum spanning forest [19]. These algorithms can benefit significantly from our request-respond technique. Recently, several studies [8, 12] reported that GraphLab's asynchronous execution is generally slower than its synchronous mode (that simulates Pregel's model) due to the high locking/unlocking overhead. Thus, we mainly focus on Pregel's computing model in this paper.

GraphLab also builds mirrors for vertices, which are called ghosts. However, GraphLab creates mirrors for every vertex regardless of its degree, which leads to excessive space consumption. A more recent version of GraphLab, called PowerGraph [5], partitions the graph by edges rather than by vertices. Edge partitioning mitigates the problem of imbalanced workload as the edges of a high-degree vertex are handled by multiple workers. Accordingly, a new edge-centric Gather-Apply-Scatter (GAS) computing model is used instead of the traditional vertex-centric computing model.

3. PREGEL ALGORITHMS

In this section, we describe some Pregel algorithms for problems that are common in social network analysis and web analysis, which will be used for illustrating important concepts and for performance evaluation. We consider fundamental problems such as 1) computing connected components (or bi-connected components), which is a common preprocessing step for social network analysis [14, 15]; 2) computing minimum spanning tree (or forest), which is useful in mining social relationships [15]; and 3) computing PageRank, which is widely used in ranking web pages [16, 9] and spam detection [7].

For ease of presentation, we first define the graph notations used in the paper. Given an undirected graph G = (V, E), we denote the neighbors of a vertex v ∈ V by Γ(v), and the degree of v by d(v) = |Γ(v)|; if G is directed, we denote the in-neighbors (out-neighbors) of a vertex v by Γ_in(v) (Γ_out(v)), and the in-degree (out-degree) of v by d_in(v) = |Γ_in(v)| (d_out(v) = |Γ_out(v)|). Each vertex v ∈ V has a unique integer ID, denoted by id(v). The diameter of G is denoted by δ.

3.1 Attribute Broadcast

We first introduce a Pregel algorithm for attribute broadcast.
Given a directed graph G, where each vertex v is associated with an attribute a(v) and an adjacency list that contains the set of v's out-neighbors Γ_out(v), attribute broadcast constructs a new adjacency list for each vertex v in G, which pairs each out-neighbor with its attribute: {⟨u, a(u)⟩ | u ∈ Γ_out(v)}. Put simply, attribute broadcast associates each neighbor u in the adjacency list of a vertex v with u's attribute a(u).

Attribute broadcast is very useful in distributed graph computation, and it is a frequently performed key operation in many Pregel algorithms. For example, the Pregel algorithm for computing bi-connected components [24] needs to relabel the ID of each vertex u by its preorder number in the spanning tree, denoted by pre(u). Attribute broadcast is used in this case, where a(u) refers to pre(u).

The Pregel algorithm for attribute broadcast consists of 3 supersteps: in superstep 1, each vertex v sends a message ⟨v⟩ to each neighbor u ∈ Γ_out(v) to request a(u); then in superstep 2, each vertex u obtains the requesters v from the incoming messages, and sends the response message ⟨u, a(u)⟩ to each requester v; finally in superstep 3, each vertex v collects the incoming messages to construct its new adjacency list.

3.2 PageRank

Next we present a Pregel algorithm for PageRank computation. Given a directed web graph G = (V, E), where each vertex (page) v links to a list of pages Γ_out(v), the problem is to compute the PageRank, pr(v), of each vertex v ∈ V.

Figure 4: Forest structure of the S-V algorithm
Figure 5: Key operations of the S-V algorithm

Pregel's PageRank algorithm [13] works as follows. In superstep 1, each vertex v initializes pr(v) = 1/|V| and distributes the value pr(v)/d_out(v) to each out-neighbor of v. In superstep i (i > 1), each vertex v sums up the received values from its in-neighbors, denoted by sum, and computes pr(v) = 0.15/|V| + 0.85 × sum. It then distributes pr(v)/d_out(v) to each of its out-neighbors.

3.3 Hash-Min

We next present a Pregel algorithm for computing connected components (CCs) in an undirected graph. We adopt the Hash-Min algorithm [17, 24].
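Before the formal description, the core idea of Hash-Min (every vertex keeps broadcasting the smallest vertex ID it has seen until nothing changes) can be previewed with a single-process simulation of the supersteps. This is a minimal sketch, not the distributed implementation: the function name `hash_min` and the dictionary-based inbox are assumptions made for illustration, and the graph is assumed to be given as a symmetric adjacency dictionary with integer IDs.

```python
def hash_min(adj):
    """Simulate Hash-Min on an undirected graph.

    adj maps each vertex ID to the list of its neighbors (symmetric).
    Returns the smallest vertex ID reachable from each vertex.
    """
    min_id = {v: v for v in adj}

    # Superstep 1: every vertex broadcasts its own ID to all neighbors.
    inbox = {v: [] for v in adj}
    for v, nbrs in adj.items():
        for u in nbrs:
            inbox[u].append(min_id[v])

    # Later supersteps: a vertex is only woken up by incoming messages,
    # and only re-broadcasts when its minimum actually improves.
    while any(inbox.values()):
        outbox = {v: [] for v in adj}
        for v, msgs in inbox.items():
            if msgs and min(msgs) < min_id[v]:
                min_id[v] = min(msgs)
                for u in adj[v]:
                    outbox[u].append(min_id[v])
        inbox = outbox  # messages are delivered in the next superstep
    return min_id


# Two components: {0, 1, 2} and {5, 6}.
graph = {0: [1], 1: [0, 2], 2: [1], 5: [6], 6: [5]}
print(hash_min(graph))  # {0: 0, 1: 0, 2: 0, 5: 5, 6: 5}
```

The loop ends when no messages are pending, which mirrors the Pregel termination condition that all vertices have voted to halt and no message is in flight.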
Given a CC C, let us denote the set of vertices of C by V(C), and define the ID of C to be id(C) = min{id(v) : v ∈ V(C)}. We further define the color of a vertex v as cc(v) = id(C), where v ∈ V(C). Hash-Min computes cc(v) for each vertex v ∈ V, and the idea is to broadcast the smallest vertex ID seen so far by each vertex v, denoted by min(v). When the algorithm terminates, min(v) = cc(v) for each vertex v ∈ V.

We now describe the Hash-Min algorithm in the Pregel framework. In superstep 1, each vertex v sets min(v) to be id(v), broadcasts min(v) to all its neighbors, and votes to halt. In superstep i (i > 1), each vertex v receives messages from its neighbors; let min be the smallest ID received; if min < min(v), v sets min(v) = min and broadcasts min to its neighbors. All vertices vote to halt at the end of a superstep. When the process converges, all vertices have voted to halt and for each vertex v, we have min(v) = cc(v).

3.4 The S-V Algorithm

The Hash-Min algorithm described in Section 3.3 requires O(δ) supersteps [24], which can be slow for computing CCs in large-diameter graphs. Another Pregel algorithm proposed in [24] computes CCs in O(log |V|) supersteps, by adapting Shiloach-Vishkin's (S-V) algorithm for the PRAM model [22]. We use this algorithm to demonstrate how algorithm logic generates a bottleneck vertex v even if d(v) is small.

In the S-V algorithm, each vertex u maintains a pointer D[u], which is initialized as u, forming a self loop as shown in Figure 4(a). During the computation, vertices are organized into a forest such that all vertices in a tree belong to the same CC. The tree definition is relaxed a bit here to allow the tree root w to have a self-loop, i.e., D[w] = w (see Figures 4(b) and 4(c)); while D[v] of any other vertex v in the tree points to v's parent.

Figure 6: Conjoined Tree

The S-V algorithm proceeds in rounds, and in each round, the pointers are updated in three steps (illustrated in Figure 5): 1) tree hooking: for each edge (u, v), if u's parent w = D[u] is a tree root, hook w as a child of v's parent D[v], i.e., set D[D[u]] = D[v]; 2) star hooking: for each edge (u, v), if u is in a star (see Figure 4(c) for an example of a star), hook the star to v's tree as in Step 1), i.e., set D[D[u]] = D[v]; 3) shortcutting: for each vertex v, move vertex v and its descendants closer to the tree root, by hooking v to the parent of v's parent, i.e., setting D[v] = D[D[v]]. The above three steps execute in rounds, and the algorithm ends when every vertex is in a star.

Due to the shortcutting operation, the S-V algorithm creates flattened trees (e.g., stars) with large fan-out towards the end of the execution. As a result, a vertex w may have many children u (i.e., D[u] = w), and each of these children u requests w for the value of D[w]. This renders w a bottleneck vertex. In particular, in the last round of the S-V algorithm, all vertices v in a CC C have D[v] = id(C), and they all send requests to the vertex w = id(C) for D[w]. In the basic Pregel framework, w receives |V(C)| requests and sends |V(C)| responses, which leads to skewed workload when |V(C)| is large.

3.5 Minimum Spanning Forest

The Pregel algorithm proposed by [19] for minimum spanning forest (MSF) computation is another example that shows how algorithm logic can generate bottleneck vertices. This algorithm proceeds in iterations, where each iteration consists of three steps, which we describe below.

In Step (1), each vertex v picks an edge with the minimum weight. The vertices and their picked edges form disjoint subgraphs, each of which is a conjoined-tree: two trees with their roots joined by a cycle. Figure 6 illustrates the concept of a conjoined-tree, where the edges are those picked in Step (1). The vertex with the smaller ID in the cycle of a conjoined-tree is called the supervertex of the tree (e.g., vertex 5 is the supervertex in Figure 6), and the other vertices are called the subvertices.
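Step (1) can be sketched in a few lines. The example below is illustrative only: the function names and the toy edge list are hypothetical, and edge weights are assumed to be distinct, in which case the cycle of each conjoined-tree consists of exactly two vertices that picked each other.

```python
def build_adjacency(edges):
    """Turn (u, v, weight) triples into a symmetric weighted adjacency dict."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def pick_min_edges(adj):
    """Each vertex picks the neighbor reached by its minimum-weight edge."""
    return {v: min(nbrs, key=nbrs.get) for v, nbrs in adj.items()}


def find_supervertices(picked):
    """Two vertices that picked each other form the cycle of a conjoined-tree;
    the one with the smaller ID is the supervertex."""
    return sorted({min(v, u) for v, u in picked.items() if picked[u] == v})


# Hypothetical path graph 1-2-3-4-5-6-7 with distinct weights.
edges = [(5, 6, 1), (6, 7, 2), (4, 5, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6)]
picked = pick_min_edges(build_adjacency(edges))
print(picked)                      # {5: 6, 6: 5, 7: 6, 4: 5, 1: 2, 2: 1, 3: 2}
print(find_supervertices(picked))  # [1, 5]
```

Here vertices 4, 5, 6 and 7 form one conjoined-tree with supervertex 5, and vertices 1, 2 and 3 form another with supervertex 1.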
In Step (2), each vertex finds the supervertex of the conjoined-tree it belongs to, which is accomplished by pointer jumping. Specifically, each vertex v maintains a pointer D[v]; suppose that v picks edge (v, u) in Step (1), then the value of D[v] is initialized as u. Each vertex v then sends a request to w = D[v] for D[w]. Initially, the actual supervertex s (e.g., vertex 5 in Figure 6) and its neighbor s' in the cycle (e.g., vertex 6 in Figure 6) see that they have sent each other messages and detect that they are in the cycle. Vertex s then sets itself as the supervertex (i.e., sets D[s] = s) due to s < s', before responding D[s] = s to the requesters (while D[s'] = s remains for s' since s' > s). For any other vertex v, it receives the response D[w] from w = D[v] and updates D[v] to be D[w]. This process is repeated until convergence, upon which D[v] records the supervertex s for all vertices v.

In Step (3), each vertex v sends a request to each neighbor u ∈ Γ(v) for its supervertex D[u], and removes edge (v, u) if D[v] = D[u] (i.e., v and u are in the same conjoined-tree); v then sends the remaining edges (to vertices in other conjoined-trees) to the supervertex D[v]. After this step, all subvertices are condensed into their supervertex, which constructs an adjacency list of edges to the other supervertices from those edges sent by its subvertices.

We consider an improved version of the above algorithm that applies the Storing-Edges-At-Subvertices (SEAS) optimization of [19]. Specifically, instead of having the supervertex merge and store all cross-tree edges, the SEAS optimization stores the edges of a supervertex in a distributed fashion among all of its subvertices. As a result, if a supervertex s is merged into another supervertex, it has to notify its subvertices of the new supervertex they belong to. This is accomplished by having each vertex v send a request to its supervertex D[v] = s for D[s].
Since smaller conjoined-trees are merged into larger ones, a supervertex s may have many subvertices v towards the end of the execution, and they all request D[s] from s, rendering s a bottleneck vertex.

4. BASIC COMMUNICATION FRAMEWORK

When considering on which system we should implement our message reduction techniques, we decided to implement a new open-source Pregel system in C/C++, called Pregel+, to avoid the pitfalls of a Java-based system described in Section 2.2. Other reasons for a new Pregel implementation include: 1) Giraph has been shown to have inferior performance in recent performance evaluations of graph-parallel systems [2, 4, 6, 8, 20]; 2) GPS does not perform sender-side message combining, while our work studies effective message reduction techniques in a system that adheres to Pregel's framework, where message combining is supported; 3) other systems such as GraphLab and PowerGraph are also not suitable as discussed in Section 2.3.

We first introduce the basic communication framework of Pregel+. Our two new message reduction techniques to be introduced in Sections 5 and 6 further extend the basic communication framework.

We use the term worker to represent a computing unit, which can be a machine or a thread/process in a machine. For ease of discussion, we assume that each machine runs only one worker, but the concepts can be straightforwardly generalized.

In Pregel+, each worker is simply an MPI (Message Passing Interface) process and communications among different processes are implemented using MPI's communication primitives. Each worker maintains a message channel, Ch_msg, for exchanging the vertex-to-vertex messages. In the compute() function, if a vertex sends a message msg to a target vertex v_tgt, the message is simply added to Ch_msg. Like in Google's Pregel, messages in Ch_msg are sent to the target workers in batches before the next superstep begins.
Note that if a message msg is sent from worker M_i to vertex v_tgt in worker M_j, the ID of the target v_tgt should be sent along with msg, so that when M_j receives msg, it knows which vertex msg should be directed to.

The operation of the message channel Ch_msg is directly related to the communication cost and hence affects the overall performance of the system. We tested different ways of implementing Ch_msg, and the most efficient one is presented in Figure 7. We assume that a worker maintains N vertices, {v_1, v_2, ..., v_N}. The message channel Ch_msg associates each vertex v_i with an incoming message buffer I_i. When an incoming message msg_1 directed to vertex v_i arrives, Ch_msg looks up a hash table T_in for the incoming message buffer I_i using v_i's ID. It then appends msg_1 to the end of I_i. The lookup table T_in is static unless graph mutation occurs, in which case updates to T_in may be required. Once all incoming messages are processed, compute() is called for each active vertex v_i with the messages in I_i as the input.

Figure 7: Illustration of Message Channel, Ch_msg
Figure 8: Illustration of Mirroring
Figure 9: Mirroring vs. Message Combining

A worker also maintains M outgoing message buffers (where M is the number of workers), one for each worker M_j in the cluster, denoted by O_j. In compute(), a vertex v_i may send a message msg_2 to another vertex with ID tgt. Let hash(.) be the hash function that computes the worker ID of a vertex from its vertex ID; then the target vertex is in worker M_hash(tgt). Thus, msg_2 (along with tgt) is appended to the end of the buffer O_hash(tgt). Messages in each buffer O_j are sent to worker M_j in batch. If a combiner is used, the messages in a buffer O_j are first grouped (sorted) by target vertex IDs, and messages in each group are combined into one message using the combiner logic before sending.

5. THE MIRRORING TECHNIQUE

The mirroring technique is designed to eliminate bottleneck vertices caused by high vertex degree. Given a high-degree vertex v, we construct a mirror for v in any worker in which some of v's neighbors reside. When v needs to send a message, e.g., the value of its attribute, a(v), to its neighbors, v sends a(v) to its mirrors. Then, each mirror forwards a(v) to the neighbors of v that reside in the same local worker as the mirror, without any message passing.

Figure 8 illustrates the idea of mirroring. Assume that u_i is a high-degree vertex residing in worker machine M_1, and u_i has neighbors {v_1, v_2, ..., v_j} residing in machine M_2 and neighbors {w_1, w_2, ..., w_k} residing in machine M_3. Suppose that u_i needs to send a message a(u_i) to the j neighbors in M_2 and k neighbors in M_3. Figure 8(a) shows how u_i sends a(u_i) to its neighbors in M_2 and M_3 using Pregel's vertex-to-vertex message passing. In total, (j + k) messages are sent, one for each neighbor. To apply mirroring, we construct a mirror for u_i in M_2 and M_3, as shown by the two squares with label u_i in Figure 8(b). In this way, as illustrated in Figure 8(b), u_i only needs to send a(u_i) to the two mirrors in M_2 and M_3. Then, each mirror forwards a(u_i) to u_i's neighbors locally in M_2 and M_3 without any network communication.
In total, only two messages are sent through the network, which not only greatly reduces the communication cost, but also eliminates the imbalanced communication load caused by $u_i$.

We formalize the effectiveness of mirroring for message reduction by the following theorem.

THEOREM 1. Let $d(v)$ be the degree of a vertex $v$ and $M$ be the number of machines. Suppose that $v$ is to deliver a message $a(v)$ to all its neighbors in one superstep. If mirroring is applied on $v$, then the total number of messages sent by $v$ in order to deliver $a(v)$ to all its neighbors is bounded by $\min\{M, d(v)\}$.

PROOF. The proof follows directly from the fact that $v$ only needs to send one message $a(v)$ to each of its mirrors in other machines, and there are at most $\min\{M, d(v)\}$ mirrors of $v$.

Mirroring Threshold. The mirroring technique is transparent to programmers. But we can allow users to specify a mirroring threshold $\tau$ such that mirroring is applied to a vertex $v$ only if $d(v) \geq \tau$ (we will see shortly that $\tau$ can be automatically set by a cost model following the result of Theorem 2). If a vertex has degree less than $\tau$, it sends messages through the normal message channel $Ch_{msg}$ as usual. Otherwise, the vertex only sends messages to its mirrors, and we call this message channel the mirroring message channel, or $Ch_{mir}$ in short. In a nutshell, a message is sent either through $Ch_{msg}$ or $Ch_{mir}$, depending on the degree of the sending vertex.

Figure 9 illustrates the concepts of $Ch_{msg}$ and $Ch_{mir}$, where we only consider the message passing between two machines $M_1$ and $M_2$. The adjacency lists of vertices $u_1$, $u_2$, $u_3$ and $u_4$ in $M_1$ are shown in Figure 9(a), and we consider how they send messages to their common neighbor $v_2$ residing in machine $M_2$. Assume that $\tau = 3$; then, as Figure 9(b) shows, $u_1$, $u_2$ and $u_3$ send their messages $a(u_1)$, $a(u_2)$ and $a(u_3)$ through $Ch_{msg}$, while $u_4$ sends its message $a(u_4)$ through $Ch_{mir}$.

Mirroring vs. Message Combining.
Now let us assume that the messages are to be applied with commutative and associative operations at the receivers' side, e.g., the message values are to be summed up as in PageRank computation. In this case, a combiner can be applied on the message channel $Ch_{msg}$. However, the receiver-centric message combining is not applicable to the sender-centric channel $Ch_{mir}$. For example, in Figure 9(b), when $u_4$ in $M_1$ sends $a(u_4)$ to its mirror in $M_2$, $u_4$ does not need to know the receivers (i.e., $v_1$, $v_2$, $v_3$ and $v_4$); thus, its message to $v_2$ cannot be combined with those messages from $u_1$, $u_2$ and $u_3$ that are also to be sent to $v_2$. In fact, $u_4$ only holds a list of the machines that contain $u_4$'s neighbors, i.e., $\{M_2\}$ in this example, and $u_4$'s neighbors $v_1$, $v_2$, $v_3$ and $v_4$ that are local to $M_2$ are connected by $u_4$'s mirror in $M_2$.

It may appear that $u_4$'s message to its mirror is wasted, because if we combine $u_4$'s message with those messages from $u_1$, $u_2$ and $u_3$, then we do not need to send it through $Ch_{mir}$. However, we note that a high-degree vertex like $u_4$ often has many neighbors in another worker machine, e.g., $v_1$, $v_3$ and $v_4$ in addition to $v_2$ in this example, and the message is not wasted since it is also forwarded to $v_3$ and $v_4$, which are not the neighbors of any other vertex in $M_1$.

Choice of Mirroring Threshold. The above discussion shows that there are cases where mirroring is useful, but it does not give any formal guideline as to when exactly mirroring should be applied.

To this end, we conduct a theoretical analysis below on the interplay between mirroring and message combining. Our result shows that mirroring is effective even when a message combiner is used.

THEOREM 2. Given a graph $G = (V, E)$ with $n = |V|$ vertices and $m = |E|$ edges, we assume that the vertex set is evenly partitioned among $M$ machines (e.g., by hashing as in Pregel) and each machine holds $n/M$ vertices. We further assume that the neighbors of a vertex in $G$ are randomly chosen among $V$, and the average degree $deg_{avg} = m/n$ is a constant. Then, mirroring should be applied to a vertex $v$ if $v$'s degree is at least $M \cdot \exp\{deg_{avg}/M\}$.

PROOF. Consider a machine $M_i$ that contains a set of $n/M$ vertices, $V_i = \{v_1, v_2, \ldots, v_{n/M}\}$, where each vertex $v_j$ has $l_j$ neighbors for $1 \leq j \leq n/M$. Consider a specific vertex $v_j$ in $M_i$; we infer how large $l_j$ should be so that applying mirroring on $v_j$ can reduce the overall communication even when a combiner is used.

Consider an application where all vertices send messages to all their neighbors in each superstep, such as in PageRank computation. Further consider a vertex $u \in \Gamma_{out}(v_j)$. If another vertex $v_k \in V_i \setminus \{v_j\}$ sends messages through $Ch_{msg}$ and $v_k$ also has $u$ as its neighbor, then $v_j$'s message to $u$ is wasted since it can be combined with $v_k$'s message to $u$. We assume the worst case where all vertices in $V_i \setminus \{v_j\}$ send messages through $Ch_{msg}$. Since the neighbors of a vertex in $G$ are randomly chosen among $V$, we have

$$\Pr\{u \in \Gamma_{out}(v_k)\} = \frac{l_k}{n},$$

and therefore,

$$\Pr\{v_j\text{'s message to } u \text{ is not wasted}\} = \prod_{v_k \in V_i \setminus \{v_j\}} \Pr\{u \notin \Gamma_{out}(v_k)\} = \prod_{v_k \in V_i \setminus \{v_j\}} \left(1 - \frac{l_k}{n}\right).$$

We regard each $l_k$ as a random variable whose value is chosen independently from a degree distribution (e.g., a power-law degree distribution) with expectation $E[l_k] = m/n = deg_{avg}$. Then, the expectation of the above equation is given by

$$E\left[\prod_{v_k \in V_i \setminus \{v_j\}} \left(1 - \frac{l_k}{n}\right)\right] = \prod_{v_k \in V_i \setminus \{v_j\}} E\left[1 - \frac{l_k}{n}\right] = \prod_{v_k \in V_i \setminus \{v_j\}} \left(1 - \frac{E[l_k]}{n}\right) = \left(1 - \frac{deg_{avg}}{n}\right)^{n/M - 1}.$$

For large graphs, we have

$$\Pr\{v_j\text{'s message to } u \text{ is not wasted}\} \approx \lim_{n \to \infty} \left(1 - \frac{deg_{avg}}{n}\right)^{n/M - 1} = \exp\left\{-\frac{deg_{avg}}{M}\right\},$$

where the last step is derived from $\lim_{n \to \infty} (1 - 1/n)^n = e^{-1}$.
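As a quick numerical sanity check of the limit used in the last step, the sketch below evaluates the finite product over the other n/M - 1 vertices of the machine (each contributing a factor 1 - deg_avg/n) for growing n and compares it with exp{-deg_avg/M}. The function name and the chosen values M = 100, deg_avg = 50 are illustrative assumptions.

```python
import math


def prob_not_wasted(n, num_machines, deg_avg):
    """Finite-n value (1 - deg_avg/n)^(n/M - 1): no other vertex on the machine has u as a neighbor."""
    others = n // num_machines - 1  # vertices in V_i other than v_j
    return (1.0 - deg_avg / n) ** others


limit = math.exp(-50 / 100)  # exp{-deg_avg / M} for M = 100, deg_avg = 50
for n in (10**4, 10**6, 10**8):
    print(n, prob_not_wasted(n, 100, 50), limit)
```

The gap between the finite-n value and the limit shrinks as n grows, which is why the approximation is reasonable for large graphs.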
According to the above discussion, the expected number of $v_j$'s neighbors that are not the neighbors of any other vertex in $M_i$ is equal to $l_j \cdot \exp\{-deg_{avg}/M\}$. In other words, if mirroring is not used, $v_j$ needs to send at least $l_j \cdot \exp\{-deg_{avg}/M\}$ messages that are not wasted. On the other hand, if mirroring is used, $v_j$ sends at most $M$ messages, one to each mirror. Therefore, mirroring reduces the number of messages if $l_j \cdot \exp\{-deg_{avg}/M\} \geq M$, or equivalently, $l_j \geq M \cdot \exp\{deg_{avg}/M\}$. To conclude, choosing $\tau = M \cdot \exp\{deg_{avg}/M\}$ as the degree threshold reduces the communication cost.

Theorem 2 states that the choice of $\tau$ depends on the number of workers, $M$, and the average vertex degree, $deg_{avg}$. A cluster usually involves tens to hundreds of workers, while the average degree $deg_{avg}$ of a large real-world graph is mostly below 50. Consider the scenario where $M = 100$ and $deg_{avg} \leq 50$; then $\tau \leq 100e^{0.5} \approx 165$. This shows that mirroring is effective even for vertices whose degree is not very high.

We remark that Theorem 2 makes some simplifying assumptions (e.g., $G$ being a random graph) for ease of analysis, which may not be accurate for a real graph. However, our experiments in Section 7.1 show that the threshold given by Theorem 2 is effective on real graphs.

Mirror Construction. Pregel+ constructs mirrors for all vertices $v$ with $|\Gamma_{out}(v)| \geq \tau$ after the input graph is loaded and before the iterative computation, although mirror construction can also be pre-computed offline like GraphLab's ghost construction. Specifically, the neighbors in $v$'s adjacency list $\Gamma_{out}(v)$ are grouped by the workers in which they reside. Each group is defined as $N_i = \{u \in \Gamma_{out}(v) \mid hash(u) = i\}$. Then, for each group $N_i$, $v$ sends $\langle v; N_i \rangle$ to worker $M_i$, and $M_i$ constructs a mirror of $v$ with the adjacency list $N_i$ locally in $M_i$. For each vertex $v_j \in N_i$, the mirror also stores the address of $v_j$'s incoming message buffer $I_j$, so that messages can be directly forwarded to $v_j$ by $v$'s mirror in $M_i$. During graph computation, a vertex $v$ sends the message $\langle v, a(v) \rangle$ to its mirror in worker $M_i$.
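The cost-model threshold and the grouping step of mirror construction can be sketched as follows. This is a simplified, single-process illustration rather than the Pregel+ implementation: `mirroring_threshold`, `build_mirrors`, and the modulo-based `worker_of` are hypothetical names, and the sketch also creates a group for the vertex's own worker for simplicity.

```python
import math
from collections import defaultdict


def mirroring_threshold(num_workers, deg_avg):
    """Cost-model threshold tau = M * exp(deg_avg / M) from Theorem 2."""
    return num_workers * math.exp(deg_avg / num_workers)


def worker_of(vertex_id, num_workers):
    """Hypothetical stand-in for hash(.): maps a vertex ID to a worker ID."""
    return vertex_id % num_workers


def build_mirrors(adjacency, num_workers, tau):
    """Return {worker: {v: N_worker}} for every vertex v whose out-degree is at least tau."""
    mirrors = defaultdict(dict)
    for v, out_neighbors in adjacency.items():
        if len(out_neighbors) < tau:
            continue  # low-degree vertices keep using the normal channel Ch_msg
        groups = defaultdict(list)
        for u in out_neighbors:
            groups[worker_of(u, num_workers)].append(u)
        for w, group in groups.items():
            mirrors[w][v] = group  # worker w holds a mirror of v with adjacency list N_w
    return mirrors


tau = mirroring_threshold(100, 50)  # the M = 100, deg_avg = 50 scenario from the text
print(round(tau))

# Toy graph: vertex 0 has six out-neighbors, vertex 7 has one; use a toy threshold of 4.
adjacency = {0: [1, 2, 3, 4, 5, 6], 7: [8]}
mirrors = build_mirrors(adjacency, num_workers=3, tau=4)
print(dict(mirrors))
```

In the toy graph only vertex 0 reaches the threshold, so it is the only vertex that receives mirrors; vertex 7 would continue to send through the normal channel.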
On receiving the message, $M_i$ looks up $v$'s mirror from a hash table using $v$'s ID (similar to $T_{in}$ described in Section 4). The message value $a(v)$ is then forwarded to the incoming message buffers of $v$'s neighbors locally in $M_i$.

Handling Edge Fields. There are some minor changes to Pregel's programming interface for applying mirroring. In Pregel's interface, a vertex calls send_msg(tgt, msg) to send an arbitrary message $msg$ to a target vertex $tgt$. With mirroring, a vertex $v$ sends a message containing the value of its attribute $a(v)$ to all its neighbors by calling broadcast(a(v)) instead of calling send_msg(u, a(v)) for each neighbor $u \in \Gamma_{out}(v)$. Consider the algorithms described in Section 3. For PageRank, a vertex $v$ simply calls broadcast(pr(v)/$|\Gamma_{out}(v)|$), while for Hash-Min, $v$ calls broadcast(min(v)).

However, there are applications where the message value is decided not only by the sender vertex $v$'s state, but also by the edge that the message is sent along. For example, in Pregel's algorithm for single-source shortest path (SSSP) computation [13], a vertex sends $(d(v) + l(v, u))$ to each neighbor $u \in \Gamma_{out}(v)$, where $d(v)$ is an attribute of $v$ estimating the distance from the source, and $l(v, u)$ is an attribute of its out-edge $(v, u)$ indicating the edge length.

To support applications like SSSP, Pregel+ requires that each edge object supports a function relay(msg), which specifies how to update the value of $msg$ before $msg$ is added to the incoming message buffer $I_i$ of the target vertex $v_i$. If $msg$ is sent through $Ch_{msg}$, relay(msg) is called on the sender side before sending. If $msg$ is sent through $Ch_{mir}$, relay(msg) is called on the receiver side when the mirror forwards $msg$ to each local neighbor (as the edge field is maintained by the mirror). For example, in Figure 9, relay(msg) is called when $msg$ is passed along a dashed arrow. By default, relay(msg) does not change the value of $msg$.
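The role of relay(msg) on the mirror side can be illustrated with a small sketch. It is not the Pregel+ interface: the functions `relay_default`, `relay_sssp`, and `forward_from_mirror`, as well as passing the edge length as an explicit argument, are assumptions made for illustration only.

```python
def relay_default(msg, edge_length):
    """Default behaviour: the message value is left unchanged (e.g. PageRank, Hash-Min)."""
    return msg


def relay_sssp(msg, edge_length):
    """SSSP behaviour: add the edge length l(v, u) to the distance d(v) carried by msg."""
    return msg + edge_length


def forward_from_mirror(value, local_edges, relay, inboxes):
    """Mirror-side forwarding: apply relay once per out-edge, then append to the neighbor's buffer."""
    for u, edge_length in local_edges:
        inboxes.setdefault(u, []).append(relay(value, edge_length))


# A mirror of v on one worker, with local neighbors 'a' and 'b' and edge lengths 2.0 and 5.0.
local_edges = [("a", 2.0), ("b", 5.0)]

sssp_inboxes = {}
forward_from_mirror(10.0, local_edges, relay_sssp, sssp_inboxes)  # v broadcasts d(v) = 10.0

plain_inboxes = {}
forward_from_mirror(0.25, local_edges, relay_default, plain_inboxes)  # value forwarded as-is

print(sssp_inboxes, plain_inboxes)
```

Because the edge lengths live with the mirror's adjacency list, the sender ships a single value per worker and the per-edge adjustment happens locally at the receiver.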
To support SSSP, a vertex $v$ calls broadcast(d(v)) in compute(), and meanwhile, the function relay(msg) is overloaded to add the edge length $l(v, u)$ to $msg$, which updates the value of $msg$ to the required value $(d(v) + l(v, u))$.

Summary of Contributions. GPS does not use message combining, and therefore its LALP technique is not as effective as our mirroring technique, which is reinforced with a message combiner. GraphLab's ghost vertex technique creates mirrors for all vertices regardless of the vertex degree, and thus it is also not as effective as our mirroring technique. As far as we know, this is the first work that considers the integration of vertex mirroring and message combining in Pregel's computing model. In addition, we identified the tradeoff between vertex mirroring and message combining in message reduction, and provided a cost model to automatically select vertices for mirroring so as to minimize the number of messages. As we shall see in our experiments in Section 7.1, the mirroring threshold computed by our cost model in Theorem 2 achieves near-optimal performance. We also cope with the case where the message value depends on the edge field, which is not supported by GPS's LALP technique.

6. THE REQUEST-RESPOND PARADIGM

In Sections 1, 3.4 and 3.5, we have shown that bottleneck vertices can be generated by algorithm logic even if the input graph has no high-degree vertices. For handling such bottleneck vertices, the mirroring technique of Section 5 is not effective. To this end, we design our second message reduction technique, which extends the basic Pregel framework with a new request-respond functionality.

We illustrate the concept using the algorithms described in Section 3. Using the request-respond API, attribute broadcast in Section 3.1 is straightforward to implement: in superstep 1, each vertex $v$ sends a request to each neighbor $u \in \Gamma_{out}(v)$ for $a(u)$; in superstep 2, the vertex $v$ simply obtains $a(u)$ responded by each neighbor $u$, and constructs $\Gamma_{out}(v)$. Similarly, for the S-V algorithm in Section 3.4, when a vertex $v$ needs to obtain $D[w]$ from vertex $w = D[v]$, it simply sends a request to $w$ so that $D[w]$ can be used in the next superstep; for the MSF algorithm in Section 3.5, a vertex $v$ simply sends a request to its supervertex $D[v] = s$ so that $D[s]$ can be used to update $D[v]$ in the next superstep.

Request-Respond Message Channel. We now explain in detail how Pregel+ supports the request-respond API. The request-respond paradigm supports all the functionality of Pregel.
In addition, it supplements the vertex-to-vertex message channel $Ch_{msg}$ with a request-respond message channel, denoted by $Ch_{req}$.

Figure 10: Illustration of request-respond paradigm

Figure 10 illustrates how requests and responses are exchanged between two machines $M_i$ and $M_j$ through $Ch_{req}$. Specifically, each machine maintains $M$ request sets, where $M$ is the number of machines, and each request set $S^{to}_k$ stores the requests to vertices in machine $M_k$. In a superstep, a vertex $v$ in machine $M_j$ may call request(u) in its compute() function to send a request to vertex $u$ for its attribute value $a(u)$ (which will be used in the next superstep). Let $hash(u) = i$; then the requested vertex $u$ is in machine $M_i$, and hence $u$ is added to the request set $S^{to}_i$ of $M_j$. Although many vertices in $M_j$ may send a request to $u$, only one request to $u$ will be sent from $M_j$ to $M_i$, since $S^{to}_i$ is a (hash) set that eliminates redundant elements.

After compute() is called for all active vertices, the vertex-to-vertex messages are first exchanged through $Ch_{msg}$. Then, each machine sends each request set $S^{to}_k$ to machine $M_k$. After the requests are exchanged, each machine receives $M$ request sets, where set $S^{from}_k$ stores the requests sent from machine $M_k$. In the example shown in Figure 10, $u$ is contained in the set $S^{from}_j$ in machine $M_i$, since vertex $v$ in machine $M_j$ sent a request to $u$.

Then, a response set $R^{to}_k$ is constructed for each request set $S^{from}_k$ received, which is to be sent back to machine $M_k$. In our example, the requested vertex $u \in S^{from}_j$ calls a user-specified function respond() to return its specified attribute $a(u)$, and adds the entry $\langle u, a(u) \rangle$ to the response set $R^{to}_j$.

Once the response sets are exchanged, each machine constructs a hash table from the received entries. In the example shown in Figure 10, the entry $\langle u, a(u) \rangle$ is received by machine $M_j$ since it is in the response set $R^{to}_j$ in machine $M_i$. The hash table is available in the next superstep, where vertices can access their requested values in their compute() function.
In our example, vertex $v$ in machine $M_j$ may call get_resp(u) in the next superstep, which looks up $u$'s attribute $a(u)$ from the hash table.

The following theorem shows the effectiveness of the request-respond paradigm for message reduction.

THEOREM 3. Let $\{v_1, v_2, \ldots, v_l\}$ be the set of requesters that request the attribute $a(u)$ from a vertex $u$. Then, the request-respond paradigm reduces the total number of messages from $2l$ in Pregel's vertex-to-vertex message passing framework to $2 \min(M, l)$, where $M$ is the number of machines.

PROOF. The proof follows directly from three facts: each machine sends at most one request to $u$ even though there may be more than one requester in that machine; at most one response from $u$ is sent to each machine that makes a request to $u$; and there are at most $\min(M, l)$ machines that contain a requester.

In the worst case, the request-respond paradigm uses the same number of messages as Pregel's vertex-to-vertex message passing. But in practice, many Pregel algorithms (e.g., those described in Sections 3.4 and 3.5) have bottleneck vertices with a large number of requesters, leading to imbalanced workload and long running time. In such cases, our request-respond paradigm effectively bounds the number of messages by the number of machines containing the requesters and eliminates the imbalanced workload.

Explicit Responding. In the above discussion, a vertex $v$ simply calls request(u) in one superstep, and it can then call get_resp(u) in the next superstep to get $a(u)$. All the operations, including request exchange, response set construction, response exchange, and response table construction, are performed by Pregel+ automatically and are thus transparent to users. We name the above process implicit responding, where a responder does not know the requester until a request is received.

When a responder $w$ knows its requesters $v$, $w$ can explicitly call respond(v) in compute(), which adds $\langle w, w.respond() \rangle$ to the response set $R^{to}_j$ where $j = hash(v)$. This process is also illustrated in Figure 10.
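The implicit request-respond exchange and the bound of Theorem 3 can be sketched in a single process as follows. The helper names (`exchange_requests`, `build_response_tables`), the modulo-based `worker_of`, and the toy data are hypothetical; the real system exchanges the sets between machines, which this sketch only simulates with dictionaries.

```python
from collections import defaultdict


def worker_of(vertex_id, num_workers):
    """Hypothetical stand-in for hash(.): maps a vertex ID to a worker ID."""
    return vertex_id % num_workers


def exchange_requests(requests, num_workers):
    """Build one request set per (requesting worker, target worker) pair from (requester, target) pairs."""
    request_sets = defaultdict(set)
    for requester, target in requests:
        src = worker_of(requester, num_workers)
        dst = worker_of(target, num_workers)
        request_sets[(src, dst)].add(target)  # the set drops duplicate requests for the same target
    return request_sets


def build_response_tables(request_sets, attribute):
    """Each requested vertex answers once per requesting worker: {worker: {target: a(target)}}."""
    tables = defaultdict(dict)
    for (src, _dst), targets in request_sets.items():
        for u in targets:
            tables[src][u] = attribute[u]
    return tables


NUM_WORKERS = 3
attribute = {0: "a(0)"}
# Six requesters (three on worker 1, three on worker 2) all request the attribute of vertex 0.
requests = [(v, 0) for v in (1, 2, 4, 5, 7, 8)]

request_sets = exchange_requests(requests, NUM_WORKERS)
tables = build_response_tables(request_sets, attribute)

num_requests = sum(len(s) for s in request_sets.values())
num_responses = sum(len(t) for t in tables.values())
print(num_requests + num_responses, 2 * len(requests))  # request-respond vs. vertex-to-vertex

# In the next superstep, a requester on worker 1 reads the value from its worker's table.
print(tables[worker_of(4, NUM_WORKERS)][0])
```

Here six requesters on two machines cost two requests and two responses, compared with twelve messages under vertex-to-vertex passing.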
Explicit responding is more cost-efficient since there is no need for request exchange and response set construction. Explicit responding is useful in many applications. For example, to compute PageRank on an undirected graph, a vertex $v$ can simply call respond(u) for each $u \in \Gamma(v)$ to push $a(v) = pr(v)/|\Gamma(v)|$ to $v$'s neighbors; this is because in the next superstep, vertex $u$ knows its neighbors $\Gamma(u)$, and can thus collect their responses. Similarly, in attribute broadcast, if the input graph is undirected, each vertex $v$ can simply push its attribute $a(v)$ to its neighbors. Note that data pushing by explicit responding requires fewer messages than Pregel's vertex-to-vertex message passing, since responses are sent to machines (more precisely, to their response tables) rather than to individual vertices.

Figure 11: Datasets. Columns: Data, Type, $|V|$, $|E|$, Avg Deg, Max Deg. Rows: WebUK (directed, $|V|$ = 133,633,040), LiveJournal (directed), Twitter (directed, $|V|$ = 52,579,682), BTC (undirected), USA Road (undirected, $|V|$ = 23,947,347).

Programming Interface. Pregel+ extends the vertex class in Pregel's interface [13] by requiring users to specify an additional template argument <R>, which indicates the type of the attribute value that a vertex responds with. In compute(), a vertex can either pull data from another vertex $v$ by calling request(v), or push data to $v$ by calling respond(v). The attribute value that a vertex returns is defined by a user-specified abstract function respond(), which returns a value of type <R>. Like compute(), one may program respond() to return different attributes of a vertex in different supersteps according to the algorithm logic of the specific application. Finally, a vertex may call get_resp(v) in compute() to get the attribute of $v$, if it was pushed into the response table in the previous superstep.

7. EXPERIMENTAL RESULTS

We now evaluate the effectiveness of our message reduction techniques. We ran our experiments on a cluster of 16 machines, each with 24 processors (two Intel Xeon E CPU) and 48GB RAM. One machine is used as the master, while the other 15 machines act as slaves. The connectivity between any pair of nodes in the cluster is 1Gbps.
We used five real-world datasets, as shown in Figure 11: 1) WebUK: a web graph generated by combining twelve monthly snapshots of the .uk domain collected for the DELIS project; 2) LiveJournal (LJ): a bipartite network of LiveJournal users and their group memberships; 3) Twitter: the Twitter who-follows-who network based on a snapshot taken in 2009; 4) BTC: a semantic graph converted from the Billion Triple Challenge 2009 RDF dataset; 5) USA: the USA road network. LJ, Twitter and BTC have skewed degree distributions; WebUK, LJ and Twitter have relatively high average degree; USA and WebUK have a large diameter.

Pregel+ Implementation. Pregel+ is implemented in C/C++ as a group of header files, and users only need to include the necessary base classes and implement the application logic in their subclasses. Pregel+ communicates with HDFS through libhdfs, a JNI-based C API for HDFS. Each worker is simply an MPI process, and communications are implemented using MPI communication primitives. While one may deploy Pregel+ with any Hadoop and MPI version, we use Hadoop and MPICH in our experiments. All programs are compiled using GCC with the -O2 option enabled. All the system source code, as well as the source code of the algorithms discussed in this paper, can be found in cse.cuhk.edu.hk/pregelplus.

7.1 Effectiveness of Mirroring

Figure 12 reports the performance gain by mirroring. We measure the gain by comparing with 1) Pregel+ without both mirroring and combiner, denoted by Pregel-noMC; 2) Pregel+ with combiner but without mirroring, denoted by Pregel-noM; and 3) GPS [18] with and without LALP. The request-respond technique is not applied in Pregel+ for this set of experiments. As a reference, we also report the performance of Giraph [1] (with combiner) and GraphLab 2.2 (which includes PowerGraph [5]).

We test the mirroring thresholds 1, 10, 100, 1000, and the one automatically set by the cost model given by Theorem 2 (which is 199, 165, 62, 126, for WebUK, Twitter, LJ, BTC, respectively).
But for the USA road network, the maximum vertex degree is only 9 and thus we do not apply mirroring with large thresholds. For GPS, we follow [8] and fix the threshold of LALP as 100. This is a reasonable choice, since [8] reports that this threshold achieves good performance in general, and we find that the best performance after tuning the threshold is very close to the performance when the threshold is 100.

We also report the preprocessing time of constructing mirrors for Pregel+ and that of LALP for GPS in rows marked by Preproc Time, as well as the number of messages sent by Pregel+ and GPS (note that Giraph does not report the number of messages, but the number should be the same as that of Pregel-noMC and Pregel-noM, while GraphLab does not employ message passing).

We ran PageRank on the three directed graphs, and Hash-Min on the two undirected graphs in Figure 11. For PageRank computation, we use an aggregator to check whether every vertex changes its PageRank value by less than 0.01 after each superstep, and terminate if so. The computation takes 89, 89 and 96 supersteps on WebUK, Twitter and LJ, respectively, before convergence. We do not run GraphLab in asynchronous mode for PageRank, since its convergence condition is different from the synchronous version and hence leads to different PageRank results.

Mirroring in Pregel+. As Figure 12 shows, mirroring significantly improves the performance of Pregel-noM, in terms of the reduction in both running time and message number. The improvement is particularly obvious for the graphs Twitter, LJ, and BTC, which have highly skewed degree distributions. Thus, the result also demonstrates the effectiveness of mirroring in workload balancing.

Mirroring is not so effective for PageRank on WebUK, for which Pregel-noM has the best performance. The number of messages is only slightly decreased when the mirroring threshold is $\tau = 1000$, and yet it is still slower than Pregel-noM. This is because messages sent through $Ch_{mir}$ are intercepted by mirrors, which incurs additional cost.
Since the degree of the majority of the vertices in WebUK is not very high, mirroring does not significantly reduce the number of messages, and thus the additional cost of $Ch_{mir}$ is not paid off.

The results also show that the mirroring threshold given by our cost model achieves either the best performance, or performance close to that of the best threshold tested. The one-off preprocessing time required to construct the mirrors is also short compared with the computation time.

Comparison with Other Systems. Figure 12 shows that Pregel+ without mirroring (i.e., Pregel-noM) is already faster than both Giraph and GraphLab, which verifies that our $Ch_{msg}$ implementation is efficient, and thus the performance gain by mirroring is not an over-claimed improvement gained over a slow implementation.

Figure 12: Effects of mirroring (*: best result; Comput/Preproc Time: computation/preprocessing time in seconds; # of Msgs: number of messages in millions). Rows: PageRank on WebUK, PageRank on Twitter, PageRank on LJ, Hash-Min on BTC, Hash-Min on USA, each with Comput Time, Preproc Time and # of Msgs. Columns: Pregel+ with mirroring (thresholds 1, 10, 100, 1000 and the cost model), Pregel+ (noM, noMC), Giraph, GraphLab (Sync, Async), GPS (Basic, LALP).

Compared with GPS, the reduction in both message number and running time achieved by the integration of mirroring and combiner in Pregel+ is significantly more than that achieved by LALP alone in GPS, which can be observed from 1) Pregel+ with mirroring vs. Pregel-noMC, and 2) GPS with LALP vs. GPS without LALP. In contrast to the claim in [18] that message combining is not effective, our result clearly demonstrates the benefits of integrating mirroring and combiner, and hence highlights the importance of our theoretical analysis on the tradeoff between mirroring and message combining (i.e., Theorem 2).

However, we notice that GPS is sometimes faster than Pregel+ even though many more messages are exchanged. We found this hard to explain, so we studied the code of GPS to explore the reason, which we explain below.

GPS requires that vertex IDs be contiguous integers starting from 0, while other systems allow vertex IDs to be of any user-specified type (as long as a hash function is provided for calculating the ID of the worker that a vertex resides in). As a result of the dense ID representation, each worker in GPS simply maintains the incoming message buffers of the vertices in an array, and when a worker receives a message targeted at vertex $tgt$, it is put into $tgt$'s incoming message buffer (i.e., $I_{tgt}$), whose position in the array can be directly computed from $tgt$.
On the contrary, systems like Pregel+ and Giraph need to look up $I_{tgt}$ from a hash table using the key $tgt$, which incurs extra cost for each message exchanged.

We remark that there are good reasons to allow vertex IDs to take an arbitrary type, rather than to hard-code them as contiguous integers. For example, the Pregel algorithm in [24] for computing bi-connected components constructs an auxiliary graph from the input graph, and each vertex of the auxiliary graph corresponds to an edge $(u, v)$ of the input graph. While we can simply use an integer pair as the vertex ID in Pregel+, using GPS requires extra effort from programmers to relabel the vertices of the auxiliary graph with contiguous integer IDs, which can be costly for a large graph. We note that, if desired, one can easily implement GPS's dense vertex ID representation in Pregel+ to further improve the performance for certain algorithms, but this is not the focus of our work, which studies message reduction techniques.

7.2 Effectiveness of Request-Respond Technique

Figure 13 reports the performance gained by the request-respond technique. We test the three algorithms in Section 3 to which the request-respond technique is applicable: attribute broadcast, S-V and minimum spanning forest. We also include Giraph and GPS as a reference. We do not include GraphLab since the algorithms cannot be easily implemented in GraphLab (e.g., it is not clear how a vertex $v$ can communicate with a non-neighbor $D[v]$ as in S-V and minimum spanning forest).

Figure 13: Effects of the request-respond technique (*: best result). Panels: Attribute Broadcast on WebUK, BTC, LJ and Twitter; S-V on USA and BTC; Minimum Spanning Forest on USA and BTC. Columns: Pregel+, ReqResp, Giraph, GPS. Rows: Time (in seconds), Msg # (in millions).
The results show that Pregel+ with request-respond, denoted by ReqResp, uses significantly fewer messages. For example, for attribute broadcast on WebUK, ReqResp reduces the message number from 11,015 million to only 2,699 million. ReqResp also records the shortest running time except in a few cases where GPS is faster, for the same reason given in Section 7.1. Another exception is computing minimum spanning forest on USA, where Pregel+ is faster without request-respond. This is because vertices in USA have very low degree, rendering the request-respond technique ineffective, and the additional computational overhead is not paid off by the reduction in message number.

8. CONCLUSIONS

We presented two techniques to reduce the amount of communication and to eliminate skewed communication workload. The first technique, mirroring, eliminates communication bottlenecks caused by high vertex degree, and is transparent to programming. The second technique is a new request-respond paradigm, which eliminates bottlenecks caused by program logic, and simplifies the programming of many Pregel algorithms. Our experiments on large real-world graphs verified that our techniques are effective in reducing the communication cost and overall computation time.

Acknowledgments. We thank the anonymous reviewers for their constructive comments. This work is supported by SHIAE Grant (No. ).

9. REFERENCES

[1] Apache Giraph.
[2] Z. Cai, Z. J. Gao, S. Luo, L. L. Perez, Z. Vagena, and C. M. Jermaine. A comparison of platforms for implementing and running very large scale machine learning algorithms. In SIGMOD.
[3] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI.
[4] B. Elser and A. Montresor. An evaluation study of bigdata frameworks for graph processing. In BigData Conference, pages 60-67.
[5] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17-30.
[6] Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How well do graph-processing platforms perform? An empirical performance evaluation and analysis. IPDPS.
[7] Z. Gyöngyi, H. Garcia-Molina, and J. O. Pedersen. Combating web spam with TrustRank. In VLDB.
[8] M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of Pregel-like graph processing systems. PVLDB, 7(12).
[9] G. Jeh and J. Widom. Scaling personalized web search. In WWW.
[10] Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In EuroSys.
[11] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB, 5(8).
[12] Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-scale distributed graph computing systems: An experimental evaluation. PVLDB, 8(3).
[13] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference.
[14] A. Mislove, M. Marcon, P. K. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In SIGCOMM Conference on Internet Measurement, pages 29-42.
[15] J. Niu, J. Peng, C. Tong, and W. Liao. Evolution of disconnected components in social networks: Patterns and a generative model. In Performance Computing and Communications Conference (IPCCC), 2012 IEEE 31st International. IEEE.
[16] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web.
[17] V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. D. Sarma. Finding connected components in map-reduce in logarithmic rounds. In ICDE, pages 50-61.
[18] S. Salihoglu and J. Widom. GPS: a graph processing system. In SSDBM, page 22.
[19] S. Salihoglu and J. Widom. Optimizing graph algorithms on Pregel-like systems. PVLDB, 7(7).
[20] N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In SIGMOD Conference.
[21] Z. Shang and J. X. Yu. Catch the wind: Graph workload balancing on cloud. In ICDE.
[22] Y. Shiloach and U. Vishkin. An O(log n) parallel connectivity algorithm. J. Algorithms, 3(1):57-67.
[23] D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14).
[24] D. Yan, J. Cheng, K. Xing, Y. Lu, W. Ng, and Y. Bu. Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB, 7(14).


More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Domain 1 - Describe Cisco VoIP Implementations

Domain 1 - Describe Cisco VoIP Implementations Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Designing Incentives for Online Question and Answer Forums

Designing Incentives for Online Question and Answer Forums Desigig Icetives for Olie Questio ad Aswer Forums Shaili Jai School of Egieerig ad Applied Scieces Harvard Uiversity Cambridge, MA 0238 USA shailij@eecs.harvard.edu Yilig Che School of Egieerig ad Applied

More information

Domain 1: Configuring Domain Name System (DNS) for Active Directory

Domain 1: Configuring Domain Name System (DNS) for Active Directory Maual Widows Domai 1: Cofigurig Domai Name System (DNS) for Active Directory Cofigure zoes I Domai Name System (DNS), a DNS amespace ca be divided ito zoes. The zoes store ame iformatio about oe or more

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

Reliability Analysis in HPC clusters

Reliability Analysis in HPC clusters Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab

More information

Tradigms of Astundithi and Toyota

Tradigms of Astundithi and Toyota Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

How to read A Mutual Fund shareholder report

How to read A Mutual Fund shareholder report Ivestor BulletI How to read A Mutual Fud shareholder report The SEC s Office of Ivestor Educatio ad Advocacy is issuig this Ivestor Bulleti to educate idividual ivestors about mutual fud shareholder reports.

More information

Recovery time guaranteed heuristic routing for improving computation complexity in survivable WDM networks

Recovery time guaranteed heuristic routing for improving computation complexity in survivable WDM networks Computer Commuicatios 30 (2007) 1331 1336 wwwelseviercom/locate/comcom Recovery time guarateed heuristic routig for improvig computatio complexity i survivable WDM etworks Lei Guo * College of Iformatio

More information

ODBC. Getting Started With Sage Timberline Office ODBC

ODBC. Getting Started With Sage Timberline Office ODBC ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.

More information

On the Capacity of Hybrid Wireless Networks

On the Capacity of Hybrid Wireless Networks O the Capacity of Hybrid ireless Networks Beyua Liu,ZheLiu +,DoTowsley Departmet of Computer Sciece Uiversity of Massachusetts Amherst, MA 0002 + IBM T.J. atso Research Ceter P.O. Box 704 Yorktow Heights,

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

MTO-MTS Production Systems in Supply Chains

MTO-MTS Production Systems in Supply Chains NSF GRANT #0092854 NSF PROGRAM NAME: MES/OR MTO-MTS Productio Systems i Supply Chais Philip M. Kamisky Uiversity of Califoria, Berkeley Our Kaya Uiversity of Califoria, Berkeley Abstract: Icreasig cost

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection The aalysis of the Courot oligopoly model cosiderig the subjective motive i the strategy selectio Shigehito Furuyama Teruhisa Nakai Departmet of Systems Maagemet Egieerig Faculty of Egieerig Kasai Uiversity

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

A Distributed Dynamic Load Balancer for Iterative Applications

A Distributed Dynamic Load Balancer for Iterative Applications A Distributed Dyamic Balacer for Iterative Applicatios Harshitha Meo, Laxmikat Kalé Departmet of Computer Sciece, Uiversity of Illiois at Urbaa-Champaig {gplkrsh2,kale}@illiois.edu ABSTRACT For may applicatios,

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Configuring Additional Active Directory Server Roles

Configuring Additional Active Directory Server Roles Maual Upgradig your MCSE o Server 2003 to Server 2008 (70-649) 1-800-418-6789 Cofigurig Additioal Active Directory Server Roles Active Directory Lightweight Directory Services Backgroud ad Cofiguratio

More information

Digital Enterprise Unit. White Paper. Web Analytics Measurement for Responsive Websites

Digital Enterprise Unit. White Paper. Web Analytics Measurement for Responsive Websites Digital Eterprise Uit White Paper Web Aalytics Measuremet for Resposive Websites About the Authors Vishal Machewad Vishal Machewad has over 13 years of experiece i sales ad marketig, havig worked as a

More information

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem Lecture 4: Cauchy sequeces, Bolzao-Weierstrass, ad the Squeeze theorem The purpose of this lecture is more modest tha the previous oes. It is to state certai coditios uder which we are guarateed that limits

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

How To Understand The Theory Of Coectedess

How To Understand The Theory Of Coectedess 35 Chapter 1: Fudametal Cocepts Sectio 1.3: Vertex Degrees ad Coutig 36 its eighbor o P. Note that P has at least three vertices. If G x v is coected, let y = v. Otherwise, a compoet cut off from P x v

More information

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms The Power of Free Brachig i a Geeral Model of Backtrackig ad Dyamic Programmig Algorithms SASHKA DAVIS IDA/Ceter for Computig Scieces Bowie, MD sashka.davis@gmail.com RUSSELL IMPAGLIAZZO Dept. of Computer

More information

Capacity of Wireless Networks with Heterogeneous Traffic

Capacity of Wireless Networks with Heterogeneous Traffic Capacity of Wireless Networks with Heterogeeous Traffic Migyue Ji, Zheg Wag, Hamid R. Sadjadpour, J.J. Garcia-Lua-Aceves Departmet of Electrical Egieerig ad Computer Egieerig Uiversity of Califoria, Sata

More information

Optimization of Large Data in Cloud computing using Replication Methods

Optimization of Large Data in Cloud computing using Replication Methods Optimizatio of Large Data i Cloud computig usig Replicatio Methods Vijaya -Kumar-C, Dr. G.A. Ramachadhra Computer Sciece ad Techology, Sri Krishadevaraya Uiversity Aatapuramu, AdhraPradesh, Idia Abstract-Cloud

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

Engineering Data Management

Engineering Data Management BaaERP 5.0c Maufacturig Egieerig Data Maagemet Module Procedure UP128A US Documetiformatio Documet Documet code : UP128A US Documet group : User Documetatio Documet title : Egieerig Data Maagemet Applicatio/Package

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

CCH Accountants Starter Pack

CCH Accountants Starter Pack CCH Accoutats Starter Pack We may be a bit smaller, but fudametally we re o differet to ay other accoutig practice. Util ow, smaller firms have faced a stark choice: Buy cheaply, kowig that the practice

More information

Unicenter TCPaccess FTP Server

Unicenter TCPaccess FTP Server Uiceter TCPaccess FTP Server Release Summary r6.1 SP2 K02213-2E This documetatio ad related computer software program (hereiafter referred to as the Documetatio ) is for the ed user s iformatioal purposes

More information

International Journal on Emerging Technologies 1(2): 48-56(2010) ISSN : 0975-8364

International Journal on Emerging Technologies 1(2): 48-56(2010) ISSN : 0975-8364 e t Iteratioal Joural o Emergig Techologies (): 48-56(00) ISSN : 0975-864 Dyamic load balacig i distributed ad high performace parallel eterprise computig by embeddig MPI ad ope MP Sadip S. Chauha, Sadip

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

The Fundamental Capacity-Delay Tradeoff in Large Mobile Ad Hoc Networks

The Fundamental Capacity-Delay Tradeoff in Large Mobile Ad Hoc Networks The Fudametal Capacity-Delay Tradeoff i Large Mobile Ad Hoc Networks Xiaoju Li ad Ness B. Shroff School of Electrical ad Computer Egieerig, Purdue Uiversity West Lafayette, IN 47907, U.S.A. {lix, shroff}@ec.purdue.edu

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY Optimize your Network I the Courier, Express ad Parcel market ADDING CREDIBILITY Meetig today s challeges ad tomorrow s demads Aswers to your key etwork challeges ORTEC kows the highly competitive Courier,

More information

Professional Networking

Professional Networking Professioal Networkig 1. Lear from people who ve bee where you are. Oe of your best resources for etworkig is alumi from your school. They ve take the classes you have take, they have bee o the job market

More information

Convention Paper 6764

Convention Paper 6764 Audio Egieerig Society Covetio Paper 6764 Preseted at the 10th Covetio 006 May 0 3 Paris, Frace This covetio paper has bee reproduced from the author's advace mauscript, without editig, correctios, or

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

Simple Annuities Present Value.

Simple Annuities Present Value. Simple Auities Preset Value. OBJECTIVES (i) To uderstad the uderlyig priciple of a preset value auity. (ii) To use a CASIO CFX-9850GB PLUS to efficietly compute values associated with preset value auities.

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

A Constant-Factor Approximation Algorithm for the Link Building Problem

A Constant-Factor Approximation Algorithm for the Link Building Problem A Costat-Factor Approximatio Algorithm for the Lik Buildig Problem Marti Olse 1, Aastasios Viglas 2, ad Ilia Zvedeiouk 2 1 Ceter for Iovatio ad Busiess Developmet, Istitute of Busiess ad Techology, Aarhus

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Baan Service Master Data Management

Baan Service Master Data Management Baa Service Master Data Maagemet Module Procedure UP069A US Documetiformatio Documet Documet code : UP069A US Documet group : User Documetatio Documet title : Master Data Maagemet Applicatio/Package :

More information

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff,

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff, NEW HIGH PERFORMNCE COMPUTTIONL METHODS FOR MORTGGES ND NNUITIES Yuri Shestopaloff, Geerally, mortgage ad auity equatios do ot have aalytical solutios for ukow iterest rate, which has to be foud usig umerical

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Finding the circle that best fits a set of points

Finding the circle that best fits a set of points Fidig the circle that best fits a set of poits L. MAISONOBE October 5 th 007 Cotets 1 Itroductio Solvig the problem.1 Priciples............................... Iitializatio.............................

More information

CS100: Introduction to Computer Science

CS100: Introduction to Computer Science Review: History of Computers CS100: Itroductio to Computer Sciece Maiframes Miicomputers Lecture 2: Data Storage -- Bits, their storage ad mai memory Persoal Computers & Workstatios Review: The Role of

More information

Evaluating Model for B2C E- commerce Enterprise Development Based on DEA

Evaluating Model for B2C E- commerce Enterprise Development Based on DEA , pp.180-184 http://dx.doi.org/10.14257/astl.2014.53.39 Evaluatig Model for B2C E- commerce Eterprise Developmet Based o DEA Weli Geg, Jig Ta Computer ad iformatio egieerig Istitute, Harbi Uiversity of

More information

Dynamic House Allocation

Dynamic House Allocation Dyamic House Allocatio Sujit Gujar 1 ad James Zou 2 ad David C. Parkes 3 Abstract. We study a dyamic variat o the house allocatio problem. Each aget ows a distict object (a house) ad is able to trade its

More information

Pre-Suit Collection Strategies

Pre-Suit Collection Strategies Pre-Suit Collectio Strategies Writte by Charles PT Phoeix How to Decide Whether to Pursue Collectio Calculatig the Value of Collectio As with ay busiess litigatio, all factors associated with the process

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 8 GENE H GOLUB 1 Positive Defiite Matrices A matrix A is positive defiite if x Ax > 0 for all ozero x A positive defiite matrix has real ad positive

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii

More information

Conceptualization with Incremental Bron- Kerbosch Algorithm in Big Data Architecture

Conceptualization with Incremental Bron- Kerbosch Algorithm in Big Data Architecture Acta Polytechica Hugarica Vol. 13, No. 2, 2016 Coceptualizatio with Icremetal Bro- Kerbosch Algorithm i Big Data Architecture László Kovács 1, Gábor Szabó 2 1 Uiversity of Miskolc, Istitute of Iformatio

More information

Filtering: A Method for Solving Graph Problems in MapReduce

Filtering: A Method for Solving Graph Problems in MapReduce Filterig: A Method for Solvig Graph Problems i MapReduce Silvio Lattazi Google, Ic. New York, NY, USA silviolat@gmail.com Bejami Moseley Uiversity of Illiois Urbaa, IL, USA bmosele@illiois.edu Sergei Vassilvitskii

More information

Amendments to employer debt Regulations

Amendments to employer debt Regulations March 2008 Pesios Legal Alert Amedmets to employer debt Regulatios The Govermet has at last issued Regulatios which will amed the law as to employer debts uder s75 Pesios Act 1995. The amedig Regulatios

More information

Concept: Types of algorithms

Concept: Types of algorithms Discrete Math for Bioiformatics WS 10/11:, by A. Bockmayr/K. Reiert, 18. Oktober 2010, 21:22 1001 Cocept: Types of algorithms The expositio is based o the followig sources, which are all required readig:

More information

Quantitative Computer Architecture

Quantitative Computer Architecture Performace Measuremet ad Aalysis i Computer Quatitative Computer Measuremet Model Iovatio Proposed How to measure, aalyze, ad specify computer system performace or My computer is faster tha your computer!

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

BaanERP 5.0c. EDI User Guide

BaanERP 5.0c. EDI User Guide BaaERP 5.0c A publicatio of: Baa Developmet B.V. P.O.Box 143 3770 AC Bareveld The Netherlads Prited i the Netherlads Baa Developmet B.V. 1999. All rights reserved. The iformatio i this documet is subject

More information

4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph.

4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph. 4. Trees Oe of the importat classes of graphs is the trees. The importace of trees is evidet from their applicatios i various areas, especially theoretical computer sciece ad molecular evolutio. 4.1 Basics

More information

LOAD BALANCING IN PUBLIC CLOUD COMBINING THE CONCEPTS OF DATA MINING AND NETWORKING

LOAD BALANCING IN PUBLIC CLOUD COMBINING THE CONCEPTS OF DATA MINING AND NETWORKING LOAD BALACIG I PUBLIC CLOUD COMBIIG THE COCEPTS OF DATA MIIG AD ETWORKIG Priyaka R M. Tech Studet, Dept. of Computer Sciece ad Egieerig, AIET, Karataka, Idia Abstract Load balacig i the cloud computig

More information

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing SIAM REVIEW Vol. 44, No. 1, pp. 95 108 c 2002 Society for Idustrial ad Applied Mathematics Perfect Packig Theorems ad the Average-Case Behavior of Optimal ad Olie Bi Packig E. G. Coffma, Jr. C. Courcoubetis

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

Escola Federal de Engenharia de Itajubá

Escola Federal de Engenharia de Itajubá Escola Federal de Egeharia de Itajubá Departameto de Egeharia Mecâica Pós-Graduação em Egeharia Mecâica MPF04 ANÁLISE DE SINAIS E AQUISÇÃO DE DADOS SINAIS E SISTEMAS Trabalho 02 (MATLAB) Prof. Dr. José

More information