Filtering: A Method for Solving Graph Problems in MapReduce

Size: px
Start display at page:

Download "Filtering: A Method for Solving Graph Problems in MapReduce"

Transcription

1 Filterig: A Method for Solvig Graph Problems i MapReduce Silvio Lattazi Google, Ic. New York, NY, USA silviolat@gmail.com Bejami Moseley Uiversity of Illiois Urbaa, IL, USA bmosele@illiois.edu Sergei Vassilvitskii Yahoo! Research New York, NY, USA sergei@yahoo-ic.com Siddharth Suri Yahoo! Research New York, NY, USA ssuri@yahoo-ic.com ABSTRACT The MapReduce framework is curretly the de facto stadard used throughout both idustry ad academia for petabyte scale data aalysis. As the iput to a typical MapReduce computatio is large, oe of the key requiremets of the framework is that the iput caot be stored o a sigle machie ad must be processed i parallel. I this paper we describe a geeral algorithmic desig techique i the MapReduce framework called filterig. The mai idea behid filterig is to reduce the size of the iput i a distributed fashio so that the resultig, much smaller, problem istace ca be solved o a sigle machie. Usig this approach we give ew algorithms i the MapReduce framework for a variety of fudametal graph problems for sufficietly dese graphs. Specifically, we preset algorithms for miimum spaig trees, maximal matchigs, approximate weighted matchigs, approximate vertex ad edge covers ad miimum cuts. I all of these cases, we parameterize our algorithms by the amout of memory available o the machies allowig us to show tradeoffs betwee the memory available ad the umber of MapReduce rouds. For each settig we will show that eve if the machies are oly give substatially subliear memory, our algorithms ru i a costat umber of MapReduce rouds. To demostrate the practical viability of our algorithms we implemet the maximal matchig algorithm that lies at the core of our aalysis ad show that it achieves a sigificat speedup over the sequetial versio. Categories ad Subject Descriptors F.. [Aalysis of Algorithms ad Problem Complexity]: Noumerical Algorithms ad Problems Work doe while visitig Yahoo! Labs. Work doe while visitig Yahoo! Labs. Partially supported by NSF grats CCF ad CCF Permissio to make digital or hard copies of all or part of this work for persoal or classroom use is grated without fee provided that copies are ot made or distributed for profit or commercial advatage ad that copies bear this otice ad the full citatio o the first page. To copy otherwise, to republish, to post o servers or to redistribute to lists, requires prior specific permissio ad/or a fee. SPAA 11, Jue 4 6, 011, Sa Jose, Califoria, USA. Copyright 011 ACM /11/06...$ Geeral Terms Algorithms, Theory Keywords MapReduce, Graph Algorithms, Matchigs 1. INTRODUCTION The amout of data available ad requirig aalysis has grow at a astoishig rate i recet years. For example, Yahoo! processes over 100 billio evets, amoutig to over 10 terabytes, daily [7]. Similarly, Facebook processes over 80 terabytes of data per day [17]. Although the amout of memory i commercially available servers has also grow at a remarkable pace i the past decade, ad ow exceeds a oce uthikable amout of 100 GB, it remais woefully iadequate to process such huge amouts of data. To cope with this deluge of iformatio people have (agai) tured to parallel algorithms for data processig. I recet years MapReduce [3], ad its ope source implemetatio, Hadoop [0], have emerged as the stadard platform for large scale distributed computatio. About 5 years ago, Google reported that it processes over 3 petabytes of data usig MapReduce i oe moth [3]. Yahoo! ad Facebook use Hadoop as their primary method for aalyzig massive data sets [7, 17]. Moreover, over 100 compaies ad 10 uiversities are usig Hadoop [6, 1] for large scale data aalysis. May differet types of data have cotributed to this growth. Oe particularly rich datatype that has captured the iterest of both idustry ad academia is massive graphs. Graphs such as the World Wide Web ca easily cosist of billios of odes ad trillios of edges [16]. Citatio graphs, affiliatio graphs, istat messeger graphs, ad phoe call graphs have recetly bee studied as part of social etwork aalysis. Although it was previously thought that graphs of this ature are sparse, the work of Leskovec, Kleiberg ad Faloutsos [14] dispelled this otio. The authors aalyzed the growth over time of 9 differet massive graphs from 4 differet domais ad showed that graphs grow dese over time. Specifically, if (t) ad e(t) deote the umber of odes ad edges at time t, respectively, they show that e(t) (t) 1+c, where 1 c > 0. They lowest value of c they fid is 0.08, but they observe three graphs with c > 0.5. The algorithms we preset are efficiet for such dese graphs, as well as their sparser couterparts. Previous approaches to graph algorithms o MapReduce attempt to shoehor message passig style algorithms ito the framework

2 [9, 15]. These algorithms ofte require O(d) rouds, where d is the diameter of the iput graph, eve for such simple tasks as computig coected compoets, miimum spaig trees, etc. A roud i a MapReduce computatio ca be very expesive time-wise, because it ofte requires a massive amout of data (o the order of terabytes) to be trasmitted from oe set of machies to aother. This is usually the domiat cost i a MapReduce computatio. Therefore miimizig the umber of rouds is essetial for efficiet MapReduce computatios. I this work we show how may fudametal graph algorithms ca be computed i a costat umber of rouds. We use the previously defied model of computatio for MapReduce [13] to perform our aalysis. 1.1 Cotributios All of our algorithms take the same geeral approach, which we call filterig. They proceed i two stages. They begi by usig the parallelizatio of MapReduce to selectively drop, or filter, parts of the iput with the goal of reducig the problem size so that the result is small eough to fit ito a sigle machie s memory. I the secod stage the algorithms compute the fial aswer o this reduced iput. The techical challege is to choose eough edges to drop but still be able to compute either a optimal or provably ear optimal solutio. The filterig step differs i complexity depedig o the problem ad takes a few slightly differet forms. We exhibit the flexibility of this approach by showig how it ca be used to solve a variety of graph problems. I Sectio.4 we apply the filterig techique to computig the coected compoets ad miimum spaig trees of dese graphs. The algorithm, which is much simpler, ad more efficiet algorithm tha the the oe that appeared i [13], partitios the origial iput ad solves a subproblem o each partitio. The algorithm recurses util the data set is small eough to fit ito the memory of a sigle machie. I Sectio 3, we tur to the problem of matchigs, ad show how to compute a maximal matchig i three MapReduce rouds i the model of [13]. The algorithm works by solvig a subproblem o a small sample of the origial iput. We the use this iterim solutio to prue out the vast majority of edges of the origial iput, thus dramatically reducig the size of the remaiig problem, ad recursig if it is ot small eough to fit oto a sigle machie. The algorithm allows for a tradeoff betwee the umber of rouds ad the available memory. Specifically, for graphs with at most 1+c edges ad machies with memory at least 1+ɛ our algorithm will require O(c/ɛ) rouds. If the machies have memory O() the our algorithm requires O(log ) rouds. We the use this algorithm as a buildig block, ad show algorithms for computig a 8-approximatio for maximum weighted matchig, a -approximatio to Vertex Cover ad a 3 /- approximatio to Edge Cover. For all of these algorithms, the umber of machies used will be at most O(N/η) where N is the size of the iput ad η is the memory available o each machie. That is, these algorithms require just eough machies to fit the etire iput o all of the machies. Fially, i Sectio 4 we adapt the semial work of Karger [10] to the MapReduce settig. Here the filterig succeeds with a limited probability; however, we argue that we ca replicate the algorithm eough times i parallel so that oe of the rus succeeds without destroyig the miimum cut. 1. Related Work The authors of [13] give a formal model of computatio of MapReduce called MRC which we will briefly summarize i the ext sectio. There are two models of computatio that are similar to MRC. We describe these models ad their relatioship to MRC i tur. We also discuss how kow algorithms i those models relate to the algorithms preseted i this work. The algorithms preseted i this paper ru i a costat umber of rouds whe the memory per machie is superliear i the umber of vertices ( 1+ɛ for some ɛ > 0). Although this requiremet is remiiscet of the semi-streamig model [4], the similarities ed there, as the two models are very differet. Oe problem that is hard to solve i semi-streamig but ca be solved i MRC is graph coectivity. As show i [4], i the semi-streamig model, without a superliear amout of memory it is impossible to aswer coectivity queries. I MRC, however, previous work [13] shows how to aswer coectivity queries whe the memory per machie is limited to 1 ɛ, albeit at the cost of a logarithmic umber of rouds. Coversely, a problem that is trivial i the semi-streamig model but more complicated i MRC is fidig a maximal matchig. I the semi-streamig model oe simply streams through the edges, ad adds the edge to the curret matchig if it is feasible. As we show i Sectio 3, fidig a maximal matchig i MRC is a computable, but o-trivial edeavor. The techical challege for this algorithm stems from the fact that o sigle machie ca see all of the edges of the iput graph, rather the model requires the algorithm desiger to parallelize the processig 1. Although parallel algorithms are gaiig a resurgece, this is a area that was widely studied previously uder differet models of parallel computatio. The most popular model is the PRAM model, which allows for a polyomial umber of processors with shared memory. There are hudreds of papers for solvig problems i this model ad previous work [5, 13] shows how to simulate certai types of PRAM algorithms i MRC. Most of these results yield MRC algorithms that require Ω(log ) rouds, whereas i this work we focus o algorithms that use O(1) rouds. Noetheless, to compare with previous work, we ext describe PRAM algorithms that either ca be simulated i MRC, or could be directly implemeted i MRC. Israel ad Itai [8] give a O(log ) roud algorithm for computig maximal matchigs o a PRAM. It could be implemeted i MRC, but would require O(log ) rouds. Similarly, [19] gives a distributed algorithm which yields costat factor approximatio to the weighted matchig problem. This algorithm, which could also be implemeted i MRC, takes O(log ) rouds. Fially, Karger s algorithm is i RN C but also requires O(log ) rouds. We show how to implemet it i MapReduce i a costat umber of rouds i Sectio 4.. PRELIMINARIES.1 MapReduce Overview We remid the reader about the saliet features of the MapReduce computig paradigm (see [13] for more details). The iput, ad all itermediate data, is stored i key; value pairs ad the computatio proceeds i rouds. Each roud is split ito three cosecutive phases: map, shuffle ad reduce. I the map phase the iput is processed oe tuple at a time. All key; value pairs emitted by the map phase which have the same key are the aggregated by the MapReduce system durig the shuffle phase ad set to the same machie. Fially each key, alog with all the values associated with it, are processed together durig the reduce phase. Sice all the values with the same key ed up o the same machie, oe ca view the map phase as a kid of routig step that determies which values ed up together. The key acts as a (logi- 1 I practice this requiremet stems from the fact that eve streamig through a terabyte of data requires a o-trivial amout of time as the machie remais IO boud.

3 cal) address of the machie, ad the system makes sure all of the key; value pairs with the same key are collected o the same machie. To simplify our reasoig about the model, we ca combie the reduce ad the subsequet map phase. Lookig at the computatio through this les, every roud each machie performs some computatio o the set of key; value pairs assiged to it (reduce phase), ad the desigates which machie each output value should be set to i the ext roud (map phase). The shuffle esures that the data is moved to the right machie, after which the ext roud of computatio ca begi. I this simpler model, we shall oly use the term machies as opposed to mappers ad reducers. More formally, let ρ j deote the reduce fuctio for roud j, ad let µ j+1 deote the map fuctio for the followig roud of a MRC algorithm [13] where j 1. Now let φ j(x) = µ j+1 ρ j(x). Here ρ j takes as iput some set of key; value pairs deoted by x ad outputs aother set of key; value pairs. We defie the operator to feed the output of ρ j(x) to µ j+1 oe key; value pair at a time. Thus φ j deotes the operatio of first executig the reducer fuctio, ρ j, o the set of values i x ad the executig the map fuctio, µ j+1, o each key; value pair output by ρ j(x) idividually. This sytactic chage allows the algorithm desiger to avoid defiig mappers ad reducers ad istead defie what each machie does durig each roud of computatio ad specify which machie each output key; value pair should go to. We ca ow traslate the restrictios o ρ j ad µ j from the MRC model of [13] to restrictios o φ j. Sice we are joiig the reduce ad the subsequet map phase, we combie the restrictios imposed o both of these computatios. There are three sets of restrictios: those o the umber of machies, the memory available o each machie ad the total umber of rouds take by the computatio. For a iput of size N, ad a sufficietly small ɛ > 0, there are N 1 ɛ machies, each with N 1 ɛ memory available for computatio. As a result, the total amout of memory available to the etire system is O(N ɛ ). See [13] for a discussio ad justificatio. A algorithm i MRC belogs to MRC i if it rus i worst case O(log i N) rouds. Thus, whe desigig a MRC 0 algorithm there are three properties that eed to be checked: Machie Memory: I each roud the total memory used by a sigle machie is at most O(N 1 ɛ ) bits. Total Memory: The total amout of data shuffled i ay roud is O(N ɛ ) bits. Rouds: The umber of rouds is a costat.. Total Work ad Work Efficiecy Next we defie the amout of work doe by a MRC algorithm by takig the stadard defiitio of work efficiecy from the PRAM settig ad adaptig it to the MapReduce settig. Let w(n) deote the amout of work doe by a r-roud, MRC algorithm o a iput of size N. This is simply the sum of the amout of work doe durig each roud of computatio. The amout of work doe durig roud i of a computatio is the product of the umber of machies used i that roud, deoted p i(n), ad the worst case ruig time of each machie, deoted t i(n). More specifically, w(n) = rx w i(n) = i=1 rx p i(n)t i(n). (1) If the amout of work doe by a MRC algorithm matches the I other words, the total amout of data shuffled i ay roud must be less tha the total amout of memory i the system. i=1 Algorithm: MST(V,E) 1: if E < η the : Compute T = MST (E) 3: retur T 4: ed if 5: l Θ( E /η) 6: Partitio E ito E 1, E,..., E l where E i < η usig a uiversal hash fuctio h : E {1,,..., l}. 7: I parallel: Compute T i, the miimum spaig tree o G(V, E i). 8: retur MST(V, it i) Figure 1: Miimum spaig tree algorithm ruig time of the best kow sequetial algorithm, we say the MRC algorithm is work efficiet..3 Notatio Let G = (V, E) be a udirected graph, ad deote by = V ad m = E. We will call G, c-dese, if m = 1+c where 0 < c 1. I what follows we assume that the machies have some limited memory η. We will assume that the umber of available machies is O(m/η). Notice that the umber of machies is just the umber required to fit the iput o all of the machies simultaeously. All of our algorithms will cosider the case where η = 1+ɛ for some ɛ > 0. For a costat ɛ, the algorithms we defie will take a costat umber of rouds ad lie i MRC 0 [13], beatig the Ω(log ) ruig time provided by the PRAM simulatio costructios (see Theorem 7.1 i [13]). However, eve whe η = O() our algorithms will ru i O(log ) rouds. This exposes the memory vs. rouds tradeoff sice most of the algorithms preseted take fewer rouds as the memory per machie icreases. We ow proceed to describe the idividual algorithms, i order of progressively more complex filterig techiques..4 Warm Up: Coected Compoets ad Miimum Spaig Trees We preset the formal algorithm for computig miimum spaig trees (the coected compoets algorithm is idetical). The algorithm works by partitioig the edges of the iput graph ito subsets of size η ad sedig each subgraph to its ow machie. The, each machie throws out ay edge that is guarateed ot to be a part of ay MST because it is the heaviest edge o some cycle i that machie s subgraph. If the resultig graph fits ito memory of a sigle machie, the algorithm termiates. Otherwise, the algorithm recurses o the smaller istace. We give the pseudocode i Figure 1. We assume the algorithm is give a c-dese graph; each machie has memory η = O( 1+ɛ ), ad that the umber of machies l = Θ( c ɛ ). Thus the algorithm oly uses eough memory, across the etire system, to store the iput. We show that every iteratio reduces the iput size by c/ɛ, ad thus after c /ɛ iteratios the algorithm termiates. LEMMA.1. Algorithm MST(V,E) termiates after c /ɛ iteratios ad returs the Miimum Spaig Tree. PROOF. To show correctess, ote that ay edge that is ot part of the MST o a subgraph of G is also ot part of the MST of G by the cycle property of miimum spaig trees. It remais to show that (1) the memory costraits of each machie are ever violated ad () the total umber of rouds is limited. Sice the partitio is doe radomly, a easy Cheroff argumet shows that o machie gets assiged more tha η edges

4 with high probability. Fially, ote that S i Ti l( 1) = O( 1+c ɛ ). Therefore after c /ɛ 1 iteratios the iput is small eough to fit oto a sigle machie, ad the overall algorithm termiates after c /ɛ rouds. LEMMA.. The MST(V,E) algorithm does O( cm ɛ total work. α(m, )) PROOF. Durig a specific iteratio, radomly partitioig E ito E 1, E,..., E l requires a liear sca over the edges which is O(m) work. Computig the miimum spaig tree M i of each part of the partitio usig the algorithm of [] takes O(l m α(m, )) work. Computig the MST of Gsparse o oe l machie usig the same algorithm requires l( 1)α(m, ) = O(mα(m, )) work. For costat ɛ the MRC algorithm uses O(mα(m, )) work. Sice the best kow sequetial algorithm [11] rus i time O(m) i expectatio, the MRC algorithm is work efficiet up to a factor of α(m, ). 3. MATCHINGS AND COVERS The maximum matchig problem ad its variats play a cetral role i theoretical computer sciece, so it is atural to determie if is possible to efficietly compute a maximum matchig, or, more simply, a maximal matchig, i the MapReduce framework. The questio is ot trivial. Ideed, due to the costraits of the model, it is ot possible to store (or eve stream through) all of the edges of a graph o a sigle machie. Furthermore, it is easy to come up with examples where the partitioig techique similar to that used for MSTs (Sectio.4) yields a arbitrarily bad matchig. Simply samplig the edges uiformly, or eve usig oe of the sparsificatio approaches [18] appears ufruitful because good sparsifiers do ot ecessarily preserve maximal matchigs. Despite these difficulties, we are able to show that by combiig a simple samplig techique ad a post-processig strategy it is possible to compute a uweighted maximal matchig ad thus a - approximatio to the uweighted maximum matchig problem usig oly machies with memory of size O() ad O(log ) rouds. More geerally, we show that we ca fid a maximal matchig o c-dese graphs i O( c /ɛ) rouds usig machies with Ω( 1+ɛ ) memory; oly three rouds are ecessary if ɛ = c /3. We exted this techique to obtai a 8-approximatio algorithm for maximum weighted matchig ad use similar approaches to approximate the vertex ad edge cover problems. This sectio is orgaized as follows: first we preset the algorithm to solve the uweighted maximal matchig, ad the we explai how to use this algorithm to solve the weighted maximum matchig problem. Fially, we show how the techiques ca be adapted to solve the miimum vertex ad the miimum edge cover problems. 3.1 Uweighted Maximal Matchigs The algorithm works by first samplig O(η) edges ad fidig a maximal matchig M 1 o the resultig subgraph. Give this matchig, we ca ow safely remove edges that are i coflict (i.e. those icidet o odes i M 1) from the origial graph G. If the resultig filtered graph, H is small eough to fit oto a sigle machie, the algorithm augmets M 1 with a matchig foud o H. Otherwise, we augmet M 1 with the matchig foud by recursig o H. Note that sice the size of the graph reduces from roud to roud, the effective samplig probability icreases, resultig i a larger sample of the remaiig graph. Formally, let G(V, E) be a simple graph where = V ad E 1+c for some c > 0. We begi by assumig that each of the machies has at least η memory. We fix the exact value of η later, but require that η 40. We give the pseudocode for the algorithm below: 1. Set M = ad S = E.. Sample every edge (u, v) S uiformly at radom with probability p = η. Let 10 S E be the set of sampled edges. 3. If E > η the algorithm fails. Otherwise give the graph G(V, E ) as iput to a sigle machie ad compute a maximal matchig M o it. Set M = M M. 4. Let I be the set of umatched vertices i G. Compute the subgraph of G iduced by I, G[I], ad let E[I] be the set of edges i G[I]. If E i > η, set S = E[I] ad retur to step. Otherwise cotiue to step Compute a maximal matchig M o G[I] ad output M = M M To proceed we eed the followig techical lemma, which shows that with high probability every iduced subgraph with sufficietly may edges, has at least oe edge i the sample. LEMMA 3.1. Let E E be a set of edges chose idepedetly with probability p. The with probability at least 1 e, for all I V either E[I] < /p or E[I] E. PROOF. Fix oe such subgraph, G[I] = (I, E[I]) with E[I] /p. The probability that oe of the edges i E[I] were chose to be i E is (1 p) E[I] (1 p) /p e. Sice there are at most total possible iduced subgraphs G[I], the probability that there exists oe that does ot have a edge i E is at most e e. Next we boud the umber of iteratios the algorithm takes. Note that, the term iteratio refers to the umber of times the algorithm is repeated. This does ot refer to a MapReduce roud. LEMMA 3.. If η 40 the the algorithm rus for at most O(log ) iteratios with high probability. Furthermore, if η = 1+ɛ, where 0 < ɛ < c is a fixed costat, the the algorithm rus i at most c/ɛ iteratios with high probability. PROOF. Fix a iteratio i of the algorithm ad let p be the samplig probability for this iteratio. Let E i be the set of edges at the begiig of this iteratio, ad deote by I be the set of umatched vertices after this iteratio. From Lemma 3.1, if E[I] /p the a edge of E[I] will be sampled with high probability. Note that o edge i E[I] is icidet o ay edge i M. Thus, if a edge from E[I] is sampled the our algorithm would have chose this edge to be i the matchig. This cotradicts the fact that o vertex i I is matched. Hece, E[I] /p 0 E i with η high probability. Now cosider the first iteratio of the algorithm, let G 1(V 1, E 1) be the iduced graph o the umatched odes after the first step of the algorithm. The above argumet implies that E 1 0 E 0 0 E η E. Similarly E 0 E 1 (0) E 0 E. So η η after i iteratios E i E i. The first part of the claim follows. To coclude the proof ote that if η = 1+ɛ, we have that E i E iɛ, ad thus the algorithm termiates after c/ɛ iteratios. We cotiue by showig the correctess of the algorithm. THEOREM 3.1. The algorithm fids a maximal matchig of G = (V, E) with high probability. PROOF. First cosider the case that the algorithm does ot fail. Assume, for the sake of cotradictio, that there exists a edge η

5 (u, v) E such that either u or v are matched i the fial matchig M that is output. Cosider the last iteratio of the algorithm. Sice (u, v) E ad u ad v are ot matched, (u, v) E[I]. Sice this is the last ru of the algorithm, a maximal matchig M of G[I] is computed o oe machie. Sice M is maximal, either u or v or both must be matched i it. All of the edges of M get added to M i the last step, which gives our cotradictio. Next, cosider the case that the algorithm failed. This occurs due to the set of edges E havig size larger tha η i some iteratio of the algorithm. Note that E[ E ] = S p = η /10 i a give iteratio. By the Cheroff Boud it follows that E η with probability smaller tha η 40 (sice η 40). By Lemma 3. the algorithm completes i at most O(log ) rouds, thus the total failure probability is bouded by O(log 40 ) usig the uio boud. Fially we show how to implemet this algorithm i MapReduce.!"#$%&'(%)#"*&+,' %#!" %!!" $#!" $!!" #!"!" '()(**+*",*-.)/01" 30)+(/4-""!&!!$"!&!!#"!&!$"!&!%"!&!#"!&$" $" -.%/0&'134.4)0)*5'(/,' COROLLARY 3.1. The Maximal Matchig algorithm ca be implemeted i three MapReduce rouds whe η = 1+c/3. Furthermore, whe η = 1+ɛ the the algorithm rus for 3 c/ɛ rouds ad O(log ) rouds whe η 40. PROOF. By Lemma 3. the algorithm rus for oe iteratio with high probability whe η = 1+c/3, c/ɛ iteratios whe η = 1+ɛ. Therefore it oly remais to describe how to compute the graph G[I]. For this we appeal to Lemma 6.1 i [13], where the set S i are the edges icidet o ode i, ad the fuctio f i drops the edge i if it is matched ad keeps it otherwise. Hece, each iteratio of the algorithm requires 3 MapReduce rouds. LEMMA 3.3. The maximal matchig algorithm preseted above is work efficiet whe η = 1+ɛ where 0 < ɛ < c is a fixed costat. PROOF. By Lemma 3. whe η = 1+ɛ there are at most a costat umber of iteratios of the algorithm. Thus it suffices to show that O(m) work is doe i a sigle iteratio. Samplig each edge with probability p requires a liear sca over the edges, which is O(m) work. Computig a maximal matchig o oe machie ca be doe usig a straightforward, greedy semi-streamig algorithm requirig E η m work. Computig G[I] ca be doe as follows. Load M oto m ɛ machies where 0 < ɛ < c ad partitio E amog those machies. The, if a edge i E is icidet o a edge i M the machies drop that edge, otherwise that edge is i G[I]. This results i O(m) work to load all of the data oto the machies ad O(m) work to compute G[I]. Sice G[I] has at most m edges, computig M o oe machie usig the best kow greedy semi-streamig algorithm also requires O(m) work. Sice the vertices i a maximal matchig provide a - approximatio to the vertex cover problem, we get the followig corollary. COROLLARY 3.. A -approximatio to the optimal vertex cover ca be computed i three MapReduce rouds whe η = 1+c/3. Further, whe η = 1+ɛ the the algorithm rus for 3 c/ɛ rouds ad O(log ) rouds whe η 40. This algorithm does O(m) total work whe η = 1+ɛ for a costat ɛ > Experimetal Validatio I this Sectio we experimetally validate the above algorithm ad demostrate that it leads to sigificat rutime improvemets Figure : The ruig time of the MapReduce matchig algorithm for differet values of p, as well as the baselie provided by the streamig implemetatio. i practice. Our data set cosists of a sample of a graph of the twitter follower etwork, previously used i [1]. The graph has 50,767,3 odes,,669,50,959 edges, ad takes about 44GB whe stored o disk. We implemeted the greedy streamig algorithm for maximum matchig as well as the three phase MapReduce algorithm described above. The streamig algorithm remais I/O bouded ad completes i 81 miutes. The total ruig times for the MapReduce algorithm with differet values for the samplig probability p are give i Figure. The MapReduce algorithm achieves a sigificat speedup (over 10x) over a large umber of values for p. The speed up is the result of the fact that a sigle machie ever scas the whole iput. Both the samplig i stage 1 ad the filterig i stage are performed i parallel. Note that the parallelizatio does ot come for free, ad the MapReduce system has o-trivial overhead over the straightforward streamig implemetatio. For example, whe p = 1, the MapReduce algorithm essetially implemets the streamig algorithm (sice all of the edges are mapped oto a sigle machie), however the ruig time is almost.5 times slower. Overall these results show that the algorithms proposed are ot oly iterestig from a theoretical viewpoit, but are viable ad useful i practice. 3. Maximum Weighted Matchig We preset a algorithm that computes a approximatio to the maximum weighted matchig problem usig a costat umber of MapReduce rouds. Our algorithm takes advatage of both the sequetial ad parallel power of MapReduce. Ideed, it will compute several matchigs i parallel ad the combie them o a sigle machie to compute the fial result. We assume that the maximum weight o a edge is polyomial i E ad we prove a 8- approximatio algorithm. Our aalysis is motivated by the work of Feigebaum et al. [4], but is techically differet sice o sigle machies sees all of the edges. The iput of the algorithm is a simple graph G(V, E) ad a weight fuctio w : E R. We assume that V = ad E = 1+c for a costat c > 0. Without loss of geerality, assume that mi{w(e) : e E} = 1 ad W = max{w(e) : e E}. The algorithm works as follows: 1. Split the graph G ito G 1, G,, G log W, where G i is

6 the graph o the set V of vertices ad cotais edges with weights i ( i 1, i ].. For 1 i log W ru the maximal matchig algorithm o G i. Let M i be the maximal matchig for the graph G i. 3. Set M =. Cosider the edge sets sequetially, i descedig order, M log W,..., M, M 1. Whe cosiderig a edge e M i, we add it to the matchig if ad oly if M {e} is a valid matchig. After all edges are cosidered, output M. LEMMA 3.4. The above algorithm outputs a 8-approximatio to the weighted maximum matchig problem. PROOF. Let OPT be the maximum weighted matchig i G ad deote by V (M i) the set of vertices icidet o the edges i M i. Cosider a edge (u, v) = e OPT, such that e G j for some j. Let i be the maximum i such that {u, v} V (M i ). Note that i must exist, ad i j else we could have added e to M j. Therefore, w(e) i. Now for every such edge (u, v) OPT we select oe vertex from {u, v} V (M i ). Without loss of geerality, let that v be the selected vertex. We say that v is a blockig vertex for e. For each blockig vertex v, we associate its icidet edge i M i ad call it the blockig edge for e. Let V b (i) be the set of blockig vertices i V (M i), we have that log W X i=1 i V b (i) X e OPT w(e). This follows from the fact that every vertex ca block at most oe e OPT ad that OPT is a valid matchig. Note also that from the defiitio of blockig vertex if (u, v) M M j the u, v / k<j V b (k). Now suppose that a edge (x, y) M k is discarded by step 3 of the algorithm. This ca happe if ad oly if there is a edge already preset i the matchig with a higher weight adjacet to x or y. Formally, there is a (u, v) M, (u, v) M j with j > k ad {u, v} {x, y}. Without loss of geerality assume that {u, v} {x, y} = x ad cosider such a edge (x, v). We say that (x, v) killed the edge (x, y) ad the vertex y. Notice that a edge (u, v) M ad (u, v) M j kills at most two edges for every M k with k < j ad kills at most two odes i V b (k). Fially we also defie (u, v) b as the set of blockig vertices associated with the blockig edge (u, v). Now cosider V b (k), each blockig vertex was either killed by oe of the edges i the matchig M, or is adjacet to oe of the edges i M k. Furthermore, the total weight of the edges i OPT with that were blocked by a blockig vertex killed by (u, v) is at most j 1 X i {V j 1 X b (i) killed by (u, v)} i+1 j+1 4w((u, v)). i=1 () To coclude, ote that each edge i OPT that is ot i M was either blocked directly by a edge i M, or was blocked by a vertex that was killed by a edge i M. To boud the former, cosider a edge (u, v) M j M. Note that this edge ca be icidet o at most edges i OPT, each of weight j w((u, v)), ad thus the weight i OPT icidet o a edge (u, v) is 4w((u, v)). Puttig this together with Equatio we coclude: X 8 w((u, v)) X w(e). (u,v) M i=1 e OPT Furthermore we ca show that the aalysis of our algorithm is essetially tight. Ideed there exists a family of graphs where our algorithm fids a solutio with weight w(op T ) with high probability. We prove the followig lemma i Appedix 8 o(1) A. LEMMA 3.5. There is a graph where our algorithm computes a solutio that has value w(op T ) with high probability. 8 o(1) Fially, suppose that the weight fuctio w : E R is such that e E, w(e) O(poly( E )) ad that each machie has memory at least η max{ log, V log W }. The we ca ru the above algorithm i MapReduce usig oly oe more roud tha the maximal matchig algorithm. I the first roud we split G ito G 1,..., G log W ; the we ru the maximal matchig algorithm of the previous subsectio i parallel o log W machies. I the last roud, we ru the last step o a sigle machie. The last step is always possible because we have at most V log W edges each with weights of size log W. THEOREM 3.. There is a algorithm that fids a 8- approximatio to the maximum weighted matchig problem o a c dese graph usig machies with memory η = 1+ɛ i 3 c/ɛ + 1 rouds with high probability. COROLLARY 3.3. There is a algorithm that, with high probability, fids a 8-approximatio to the maximum weighted matchig problem that rus i four MapReduce rouds whe η = 1+/3c. To coclude the aalysis of the algorithm we ow study the work amout of the maximum matchig algorithm. LEMMA 3.6. The amout of work performed by the maximum matchig algorithm preseted above is O(m) whe η = 1+ɛ where 0 < ɛ < c is a fixed costat. PROOF. The first step of the algorithm requires O(m) work as it ca be doe usig a liear sca over the edges. I the secod step, by Lemma 3.3 each machie performs work that is liear i the umber of edges that are assiged to the machie. Sice the edges are partitioed across the machies, the total work doe i the secod step is O(m). Fially we ca perform the third step by a semi-streamig algorithm that greedily adds edges i the order M log W,..., M, M 1, requirig O(m) work. 3.3 Miimum Edge Cover Next we tur to the miimum edge cover problem. A edge cover of a graph G(V, E) is a set of edges E E such that each vertex of V has at least oe edpoit i E. The miimum edge cover is a edge cover E of miimum size. Let G(V, E) be a simple graph. The algorithm to compute a edge cover is as follows: 1. Fid a maximal matchig M of G usig the procedure described i Sectio Let I be the set of ucovered vertices. For each ucovered vertex, take ay edge icidet o the vertex i I. Let this set of edges be U. 3. Output E = M U. Note that this procedure produces a feasible edge cover E. To boud the size of E let OPT deote the size of the miimum edge cover for the graph G ad let OPT m deote the size of the maximum matchig i G. It is kow that the miimum edge cover of a graph is equal to V OPT m. We also kow that U = V M. Therefore, E = V M V 1 OPTm

7 sice a maximal matchig has size at least 1 OPTm. Kowig that OPT m V / ad usig Corollary 3.1 to boud the umber of rouds we have the followig theorem. THEOREM 3.3. There is a algorithm that, with high probability, fids a 3 -approximatio to the miimum edge cover i MapReduce. If each machie has memory η 40 the the algorithm rus i O(log ) rouds. Further, if η = 1+ɛ, where 0 < ɛ < c is a fixed costat, the the algorithm rus i 3 c/ɛ + 1 rouds. COROLLARY 3.4. There is a algorithm that, with high probability, fids a 3 -approximatio to the miimum edge cover i four MapReduce rouds whe η = 1+/3c. Now we prove that the amout of work performed by the edge cover algorithm is O(m). LEMMA 3.7. The amout of work performed by the edge cover algorithm preseted above is O(m) whe η = 1+ɛ where 0 < ɛ < c is a fixed costat. PROOF. By Lemma 3.3 whe η = 1+ɛ the first step of the algorithm ca be doe performig O(m) operatios. The secod step ca be performed by a semi streamig algorithm that requires O(m) work. Thus the claim follows. 4. MINIMUM CUT Whereas i the previous algorithms the filterig was doe by droppig certai edges, this algorithm filters by cotractig edges. Cotractig a edge, will obviously reduce the umber of edges ad may either keep the umber of vertices the same (i the case we cotracted a self loop), or reduce it by oe. To compute the miimum cut of a graph we appeal to the cotractio algorithm itroduced by Karger [10]. The algorithm has a well kow property that the radom choices made i the early rouds succeed with high probability, whereas those made i the later rouds have a much lower probability of success. We exploit this property by showig how to filter the iput i the first phase (by cotractig edges) so that the remaiig graph is guarateed to be small eough to fit oto a sigle machie, yet large eough to esure that the failure probability remais bouded. Oce the filterig phase is complete, ad the problem istace is small eough to fit oto a sigle machie, we ca employ ay oe of the well kow methods to fid the miimum cut i the filtered graph. We the decrease the failure probability by ruig several executios of the algorithm i parallel, thus esurig that i oe of the copies the miimum cut survives this filterig phase. The complicatig factor i the scheme above is cotractig the right umber of edges so that the properties above hold. We proceed by labelig each edge with a radom umber betwee 0 ad 1 ad the searchig for a threshold t so that cotractig all of the edges with label less tha t results i the desired umber of vertices. Typically such a search would take logarithmic time, however, by doig the search i parallel across a large umber of machies, we ca reduce the depth of the recursio tree to be costat. Moreover, to compute the umber of vertices remaiig after the first t edges are cotracted, we refer to the coected compoets algorithm i Sectio.4. Sice the coected compoets algorithm uses a small umber of machies, we ca show that eve with may parallel ivocatios we will ot violate the machie budget. We preset the algorithm ad its aalysis below. Also, the algorithm uses two subrouties, Fid t ad Cotract which are defied i tur. Algorithm 1 MiCut(E) 1: for i = 1 to δ 1 (i parallel) do : tag e E with a umber r e chose uiformly at radom from [0, 1] 3: t Fid t(e, 0, 1) 4: E i Cotract(E, t) 5: C i mi cut of E i 6: ed for 7: retur miimum cut over all C i 4.1 Fid Algorithm The pseudocode for the algorithm to fid the correct threshold is give below. The algorithm performs a parallel search o the value t so that cotractig all edges with weight at most t results i a graph with δ 3 vertices. The algorithm ivokes δ copies of the coected compoets algorithm, each of which uses at most c ɛ machies, with 1+ɛ memory. Algorithm Fid t(e, mi, max) 1: {Uses δ+c/ɛ machies.} : γ max mi δ 3: for j = 1 to δ (i parallel) do 4: τ j mi +jγ 5: E j {e E r e τ j} 6: cc j umber of coected compoets i G = (V, E j) 7: ed for 8: if there exists a j such that cc j = δ 3 the 9: retur j 10: else 11: retur Fid t(e, τ j, τ j+1) where j is the smallest value s.t. cc j < δ 3, cc j+1 > δ 3 1: ed if 4. Cotractio Algorithm We state the cotractio algorithm ad prove bouds o its performace. Algorithm 3 Cotract(E, t) 1: CC coected compoets i {e E r e t} : let h : [] [ δ 4 ] be a uiversal hash fuctio 3: map each edge (u, v) to machie h(u) ad h(v) 4: map the assigmet of ode u to its coected compoet CC(u), to machie h(u) 5: o each reducer reame all istaces of u to CC(u) 6: map each edge (u, v) to machie h(u) + h(v) 7: Drop self loops (edges i same coected compoet) 8: Aggregate parallel edges LEMMA 4.1. The Cotract algorithm uses δ 4 machies with O( m ) space with high probability. δ 4 PROOF. Partitio V ito parts P j = {v V j 1 < deg(v) j }. Sice the degree of each ode is bouded by, there are at most log parts i the partitio. Defie the volume of part j as V j = P j j. Parts havig volume less tha m 1 ɛ could all be mapped to oe reducer without violatig its space restrictio. We ow focus o parts with V j > m 1 ɛ, ad so let P j be such a part. Thus P j cotais betwee m1 ɛ ad m1 ɛ vertices. Let ρ j j

8 be a arbitrary reducer. Sice h is uiversal, the probability that ay vertex v P j maps to ρ is exactly δ 4. Therefore, i expectatio, the umber of vertices of P j mappig to ρ is at most m1 ɛ. j δ 4 Sice each of these vertices has degree at most j, i expectatio the umber of edges that map to ρ is at most m1 ɛ. Let the radom variable X j deote the umber of vertices from P j that map δ 4 to ρ. Say that a bad evet happes if more tha 4m1 ɛ vertices of V j j map to ρ. Cheroff bouds tell us that the probability of such a evet happeig is O(1/ δ 4 ), Pr»X j > 10m1 ɛ δ 4 < «10m 1 ɛ δ 4 < 1. (3) δ 4 Takig a uio boud over all δ 4 reducers ad log parts, we ca coclude that the probability of ay reducer beig overloaded is bouded below by 1 o(1). 4.3 Aalysis of the MiCut Algorithm We proceed to boud the total umber of machies, maximum amout of memory, ad the total umber of rouds used by the MiCut algorithm. LEMMA 4.. The total umber of machies used by the MiCut algorithm is δ 1 ` δ +c ɛ + δ 4. PROOF. The algorithm begis by ruig δ 1 parallel copies of a simpler algorithm which first ivokes Fid t, to fid a threshold t for each istace. This algorithm uses δ parallel copies of a coected compoet algorithm, which itself uses c ɛ machies (see Sectio.4). After fidig the threshold, we ivoke the Cotract algorithm, which uses δ 4 machies per istace. Together this gives the desired umber of machies. LEMMA 4.3. The memory used by each machie durig the executio of MiCut is bouded by max{ δ 3, 1+c δ 4, 1+ɛ }. PROOF. There are three distict steps where we must boud the memory. The first is the the searchig phase of Fid t. Sice this algorithm executes istaces of the coected compoets algorithm i parallel, the results of Sectio.4 esure that each istace uses at most η = 1+ɛ memory. The secod is the cotractio algorithm. Lemma 4.1 assures us that the iput to each machie is of size at most O( m δ 4 ). Fially, the last step of MiCut requires that we load a istace with δ 3 vertices, ad hece at most δ 3 edges oto a sigle machie. LEMMA 4.4. Suppose the amout of memory available per machie is η = 1+ɛ. MiCut rus i O( 1 ɛδ ) umber of rouds. PROOF. The oly variable part of the ruig time is the umber of rouds ecessary to foud a threshold τ so that the umber of coected compoets i Fid t is exactly δ 3. Observe that after the k th recursive call, the umber of edges with threshold betwee m mi ad max is. Therefore the algorithm must termiate δ k after at most 1+c δ rouds, which is costat for costat δ. We are ow ready to prove the mai theorem. THEOREM 4.1. Algorithm MiCut returs the miimum cut i G with probability at least 1 o(1), uses at most η = 1+ɛ memory per machie ad completes i O( 1 ɛ ) rouds. PROOF. We first show that the success probability is at least «1 1 δ 3 δ 1. The algorithm ivokes δ 1 parallel copies of the followig approach: (1) simulate Karger s cotractio algorithm [10] for the first δ 3 steps resultig i a graph G t ad () Idetify the miimum cut o G t. By Corollary. of [1] step (1) succeeds with probability at least p = Ω( δ3 ). Sice the secod step ca be made to fail with 0 probability, each of the parallel copies succeeds with probability at least p. By ruig δ 1 idepedet copies of the algorithm, the probability that all of the copies fail i step (1) is at most 1 (1 p) δ1. To prove the theorem, we must fid a settig of the parameters δ 1, δ, δ 3, ad δ 4 so that the memory, machies, ad correctess costraits are satisfied. δ 1 max{ δ 3, 1+c δ 4 } η = 1+ɛ Memory δ+c ɛ + δ 4 = o(m) = o( 1+c ) Machies «1 δ 3 δ 1 = o(1) Correctess Settig δ 1 = 1 ɛ /, δ = ɛ, δ 3 = 1+ɛ, δ4 = c satisfies all of them. Ackowledgmets We would like to thak Ashish Goel, Jake Hofma, Joh Lagford, Ravi Kumar, Serge Plotki ad Cog Yu for may helpful discussios. 5. REFERENCES [1] E. Bakshy, J. Hofma, W. Maso, ad D. J. Watts. Everyoe s a ifluecer: Quatifyig ifluece o twitter. I Proceedigs of WSDM, 011. [] Berard Chazelle. A miimum spaig tree algorithm with iverse-ackerma type complexity. Joural of the ACM, 47(6): , November 000. [3] Jeffrey Dea ad Sajay Ghemawat. MapReduce: Simplified data processig o large clusters. I Proceedigs of OSDI, pages , 004. [4] Joa Feigebaum, Sampath Kaa, Adrew McGregor, Siddharth Suri, ad Jia Zhag. O graph problems i a semi-streamig model. Theoretical Computer Sciece, 348( 3):07 16, December 005. [5] Michael T. Goodrich. Simulatig parallel algorithms i the mapreduce framework with applicatios to parallel computatioal geometry. Secod Workshop o Massive Data Algorithmics (MASSIVE 010), Jue 010. [6] Hadoop Wiki - Powered By. [7] Blake Irvig. Big data ad the power of hadoop. Yahoo! Hadoop Summit, Jue 010. [8] Amos Israel ad A. Itai. A fast ad simple radomized parallel algorithm for maximal matchig. Iformatio Processig Letters, ():77 80, [9] U Kag, Charalampos Tsourakakis, Aa Paula Appel, Christos Faloutsos, ad Jure Leskovec. HADI: Fast diameter estimatio ad miig i massive graphs with hadoop. Techical Report CMU-ML , CMU, December 008. [10] David R. Karger. Global mi-cuts i RN C ad other ramificatios of a simple micut algorithm. I Proceedigs of SODA, pages 1 30, Jauary 1993.

9 [11] David R. Karger, Philip N. Klei, ad Robert E. Tarja. A radomized liear-time algorithm for fidig miimum spaig trees. I Proceedigs of the twety-sixth aual ACM symposium o Theory of computig, Proceedigs of STOC, pages 9 15, New York, NY, USA, ACM. [1] David R. Karger ad Clifford Stei. A Õ( ) algorithm for miimum cuts. I Proceedigs of STOC, pages , May [13] Howard Karloff, Siddharth Suri, ad Sergei Vassilvitskii. A model of computatio for MapReduce. I Proceedigs of SODA, pages , 010. [14] Jure Leskovec, Jo Kleiberg, ad Christos Faloutsos. Graphs over time: Desificatio laws, shrikig daimeters ad possible explaatios. I Proc. 11th ACM SIGKDD Iteratioal Coferece o Kowledge Discovery ad Data Miig, 005. [15] Jimmy Li ad Chris Dyer. Data-Itesive Text Processig with MapReduce. Number 7 i Sythesis Lectures o Huma Laguage Techologies. Morga ad Claypool, April 010. [16] Grzegorz Malewicz, Matthew H. Auster, Aart J.C. Bik, James C. Dehert, Ila Hor, Naty Leiser, ad Grzegorz Czajkowski. Pregel: A system for large-scale graph processig. I Proceedigs of SIGMOD, pages , Idiaapolis, IN, USA, Jue 010. ACM. [17] Mike Schroepfer. Iside large-scale aalytics at facebook. Yahoo! Hadoop Summit, Jue 010. [18] Daiel A. Spielma ad Nikhil Srivastava. Graph sparsificatio by effective resistaces. I Proceedigs of STOC, pages , New York, NY, USA, 008. ACM. [19] Mirjam Wattehofer ad Roger Wattehofer. Distributed weighted matchig. I Proceedigs of DISC, pages Spriger, 003. [0] Tom White. Hadoop: The Defiitive Guide. O Reilly Media, 009. [1] Yahoo! Ic Press Release. Yahoo! parters with four top uiversities to advace cloud computig systems ad applicatios research. April 009. APPENDIX A. WEIGHTED MATCHING LOWER BOUND LEMMA A.1. There is a graph where our algorithm compute a solutio that has value w(op T ) with high probability. 8 o(1) PROOF. Let G(V, E) a graph o odes ad m vertices, ad fix a W = log m. We say that a bipartite graph G(V, E 1, E ) is balaced if E 1 = E. Cosider the followig graph: there is a cetral balaced bipartite clique, B 1, o odes ad all the edges of the clique have weight W + 1. log W Every side of the cetral bipartite clique is also part of other log W 1 balaced bipartite cliques. We refer to those cliques has B E 1, BE 1 3,, BE 1 W, BE, BE 3,, BE W. I both BE 1 i ad B E i we have that the weight of the edges i them have weight W + 1. Furthermore every ode i B i 1 is also coected with a additioal ode of degree oe with a edge of weight W, ad every ode i B E 1 i \B 1 ad B E i \B 1 is coected to a ode of degree W oe with a edge of weight. Figure 3 shows the subgraph i 1 composed by B 1 ad the two graphs B E 1 i ad B E i. Figure 3: The subgraph(we have draw oly the edges of W weight W, + 1, W ad W + 1) of the graph of G for i+1 i which our algorithm fids a solutio with value w(op T ) with 8 o(1) high probability. Note that the optimal weighted maximum matchig for this graph is the oe that is composed by all the edges icidet to a ode of degree oe ad its total value is W + W log W + W + + = W P W 1 log W log W log W log W i=0 = i W +1 1 W = (4 o(1))w. log W 1 1 log W Now we will show a upper-boud o the performace of our algorithm that holds with high probability. Recall that i step oe our algorithm splits the graph i G 1,, G W subgraph where the edges i G i are i i ( i 1, i ] the it computes a maximal matchig o G i usig the techique show i the previous subsectio. I particular the algorithm works as follows: it samples the edges i 1 G i with probability E i ad the computes a maximal matchig ɛ o it, fially it tries to match the umatched odes usig the edges betwee the umatched odes i G i. To uderstad the value of the matchig retured by the algorithm, cosider the graph G i ote that this graph is composed oly by the edges i B E 1 i, B E i ad the edges coected to odes of W degree oe with weight. We refer to those last edges as the heavy edges i G i 1 i. Note that the heavy edges are all coected to vertices i B E 1 i \B 1 ad B E i \B 1. Let s be the umber of vertices i a side of B E 1 i, ote that G i, for i > 1 has 6s odes ad s + s edges. 1 Recall that we sample a edges with probability C V i = ɛ 1, so we have that i expectatio we sample Θ s 1 = C(6s) ɛ (s) ɛ Θ `(s) 1 ɛ heavy edges, thus usig the Cheroff boud we have that the probability that the umber of sampled heavy edges is bigger or equal tha Θ 3(6s) (1 ɛ) is e 3(6s)(1 ɛ). Further otice that by lemma 3.1 for every set of ode i B E 1 i or B E i with (6s) 1+ɛ edges we have at least a edge i it with probability e 6s( C 1 (6s)ɛ log(6s)). Thus the maximum umber of odes left umatched i B E 1 i or after step of the maximal matchig algorithm is smaller B E i the (6s) 1+ɛ + Θ 3(6s) (1 ɛ) so eve if we matched those odes with the heavy edges i G i we use at most (6s) 1+ɛ +

10 Θ 3(6s) (1 ɛ) of those edges. Thus we have that for every G i, for every i > 1 the maximal matchig algorithm uses at most Θ 6(6s) (1 ɛ) + (6s) 1+ɛ s = o heavy edges with probability 1 e 6s( C 1 (6s)ɛ log(6s)). log s With the same reasoig, just with differet costat, we otice that the same fact holds also for G 1. So we have that for every G i we use oly o Vi heavy edges with probability log V i 1 e Θ( V i ), further otice that every maximal matchig that the algorithm computes it always matches the odes i B 1, because it is alway possible to use the edges that coect those odes to the odes of degree 1. Kowig that we otice that the fial matchig that our algorithm outputs is composed by the maximal matchig of G 1 plus all the heavy edges i maximal matchig of G,, G W. Thus we have that with probability Q i 1 e Θ( V i ) = 1 o(1) 3 the total weight of the computed solutio is upper-bouded by ` W W log W o s = W + log W log s log W o(w ) ad so the ratio betwee the optimum ad the solutio is (4 o(1))w log W = 1 W log W +o(w ) 8 o(1). Thus the claim follows. 3 Note that every V i Θ log W

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

CHAPTER 3 THE TIME VALUE OF MONEY

CHAPTER 3 THE TIME VALUE OF MONEY CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Designing Incentives for Online Question and Answer Forums

Designing Incentives for Online Question and Answer Forums Desigig Icetives for Olie Questio ad Aswer Forums Shaili Jai School of Egieerig ad Applied Scieces Harvard Uiversity Cambridge, MA 0238 USA shailij@eecs.harvard.edu Yilig Che School of Egieerig ad Applied

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

Capacity of Wireless Networks with Heterogeneous Traffic

Capacity of Wireless Networks with Heterogeneous Traffic Capacity of Wireless Networks with Heterogeeous Traffic Migyue Ji, Zheg Wag, Hamid R. Sadjadpour, J.J. Garcia-Lua-Aceves Departmet of Electrical Egieerig ad Computer Egieerig Uiversity of Califoria, Sata

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms The Power of Free Brachig i a Geeral Model of Backtrackig ad Dyamic Programmig Algorithms SASHKA DAVIS IDA/Ceter for Computig Scieces Bowie, MD sashka.davis@gmail.com RUSSELL IMPAGLIAZZO Dept. of Computer

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing SIAM REVIEW Vol. 44, No. 1, pp. 95 108 c 2002 Society for Idustrial ad Applied Mathematics Perfect Packig Theorems ad the Average-Case Behavior of Optimal ad Olie Bi Packig E. G. Coffma, Jr. C. Courcoubetis

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem Lecture 4: Cauchy sequeces, Bolzao-Weierstrass, ad the Squeeze theorem The purpose of this lecture is more modest tha the previous oes. It is to state certai coditios uder which we are guarateed that limits

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation Effective Techiques for Message Reductio ad Load Balacig i Distributed Graph Computatio ABSTRACT Da Ya, James Cheg, Yi Lu Dept. of Computer Sciece ad Egieerig The Chiese Uiversity of Hog Kog {yada, jcheg,

More information

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff,

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff, NEW HIGH PERFORMNCE COMPUTTIONL METHODS FOR MORTGGES ND NNUITIES Yuri Shestopaloff, Geerally, mortgage ad auity equatios do ot have aalytical solutios for ukow iterest rate, which has to be foud usig umerical

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means) CHAPTER 7: Cetral Limit Theorem: CLT for Averages (Meas) X = the umber obtaied whe rollig oe six sided die oce. If we roll a six sided die oce, the mea of the probability distributio is X P(X = x) Simulatio:

More information

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows: Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network

More information

Present Value Factor To bring one dollar in the future back to present, one uses the Present Value Factor (PVF): Concept 9: Present Value

Present Value Factor To bring one dollar in the future back to present, one uses the Present Value Factor (PVF): Concept 9: Present Value Cocept 9: Preset Value Is the value of a dollar received today the same as received a year from today? A dollar today is worth more tha a dollar tomorrow because of iflatio, opportuity cost, ad risk Brigig

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized?

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized? 5.4 Amortizatio Questio 1: How do you fid the preset value of a auity? Questio 2: How is a loa amortized? Questio 3: How do you make a amortizatio table? Oe of the most commo fiacial istrumets a perso

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 8 GENE H GOLUB 1 Positive Defiite Matrices A matrix A is positive defiite if x Ax > 0 for all ozero x A positive defiite matrix has real ad positive

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

A Secure Implementation of Java Inner Classes

A Secure Implementation of Java Inner Classes A Secure Implemetatio of Java Ier Classes By Aasua Bhowmik ad William Pugh Departmet of Computer Sciece Uiversity of Marylad More ifo at: http://www.cs.umd.edu/~pugh/java Motivatio ad Overview Preset implemetatio

More information

Simple Annuities Present Value.

Simple Annuities Present Value. Simple Auities Preset Value. OBJECTIVES (i) To uderstad the uderlyig priciple of a preset value auity. (ii) To use a CASIO CFX-9850GB PLUS to efficietly compute values associated with preset value auities.

More information

Infinite Sequences and Series

Infinite Sequences and Series CHAPTER 4 Ifiite Sequeces ad Series 4.1. Sequeces A sequece is a ifiite ordered list of umbers, for example the sequece of odd positive itegers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29...

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

Elementary Theory of Russian Roulette

Elementary Theory of Russian Roulette Elemetary Theory of Russia Roulette -iterestig patters of fractios- Satoshi Hashiba Daisuke Miematsu Ryohei Miyadera Itroductio. Today we are goig to study mathematical theory of Russia roulette. If some

More information

How To Understand The Theory Of Coectedess

How To Understand The Theory Of Coectedess 35 Chapter 1: Fudametal Cocepts Sectio 1.3: Vertex Degrees ad Coutig 36 its eighbor o P. Note that P has at least three vertices. If G x v is coected, let y = v. Otherwise, a compoet cut off from P x v

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Irreducible polynomials with consecutive zero coefficients

Irreducible polynomials with consecutive zero coefficients Irreducible polyomials with cosecutive zero coefficiets Theodoulos Garefalakis Departmet of Mathematics, Uiversity of Crete, 71409 Heraklio, Greece Abstract Let q be a prime power. We cosider the problem

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Convention Paper 6764

Convention Paper 6764 Audio Egieerig Society Covetio Paper 6764 Preseted at the 10th Covetio 006 May 0 3 Paris, Frace This covetio paper has bee reproduced from the author's advace mauscript, without editig, correctios, or

More information

Quantitative Computer Architecture

Quantitative Computer Architecture Performace Measuremet ad Aalysis i Computer Quatitative Computer Measuremet Model Iovatio Proposed How to measure, aalyze, ad specify computer system performace or My computer is faster tha your computer!

More information

On the Capacity of Hybrid Wireless Networks

On the Capacity of Hybrid Wireless Networks O the Capacity of Hybrid ireless Networks Beyua Liu,ZheLiu +,DoTowsley Departmet of Computer Sciece Uiversity of Massachusetts Amherst, MA 0002 + IBM T.J. atso Research Ceter P.O. Box 704 Yorktow Heights,

More information

4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph.

4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph. 4. Trees Oe of the importat classes of graphs is the trees. The importace of trees is evidet from their applicatios i various areas, especially theoretical computer sciece ad molecular evolutio. 4.1 Basics

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

Tradigms of Astundithi and Toyota

Tradigms of Astundithi and Toyota Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation

Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation Effective Techiques for Message Reductio ad Load Balacig i Distributed Graph Computatio ABSTRACT Da Ya, James Cheg, Yi Lu Dept. of Computer Sciece ad Egieerig The Chiese Uiversity of Hog Kog {yada, jcheg,

More information

Integer Factorization Algorithms

Integer Factorization Algorithms Iteger Factorizatio Algorithms Coelly Bares Departmet of Physics, Orego State Uiversity December 7, 004 This documet has bee placed i the public domai. Cotets I. Itroductio 3 1. Termiology 3. Fudametal

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

International Journal on Emerging Technologies 1(2): 48-56(2010) ISSN : 0975-8364

International Journal on Emerging Technologies 1(2): 48-56(2010) ISSN : 0975-8364 e t Iteratioal Joural o Emergig Techologies (): 48-56(00) ISSN : 0975-864 Dyamic load balacig i distributed ad high performace parallel eterprise computig by embeddig MPI ad ope MP Sadip S. Chauha, Sadip

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

Lecture 4: Cheeger s Inequality

Lecture 4: Cheeger s Inequality Spectral Graph Theory ad Applicatios WS 0/0 Lecture 4: Cheeger s Iequality Lecturer: Thomas Sauerwald & He Su Statemet of Cheeger s Iequality I this lecture we assume for simplicity that G is a d-regular

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

Recovery time guaranteed heuristic routing for improving computation complexity in survivable WDM networks

Recovery time guaranteed heuristic routing for improving computation complexity in survivable WDM networks Computer Commuicatios 30 (2007) 1331 1336 wwwelseviercom/locate/comcom Recovery time guarateed heuristic routig for improvig computatio complexity i survivable WDM etworks Lei Guo * College of Iformatio

More information

MTO-MTS Production Systems in Supply Chains

MTO-MTS Production Systems in Supply Chains NSF GRANT #0092854 NSF PROGRAM NAME: MES/OR MTO-MTS Productio Systems i Supply Chais Philip M. Kamisky Uiversity of Califoria, Berkeley Our Kaya Uiversity of Califoria, Berkeley Abstract: Icreasig cost

More information

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii

More information

Trackless online algorithms for the server problem

Trackless online algorithms for the server problem Iformatio Processig Letters 74 (2000) 73 79 Trackless olie algorithms for the server problem Wolfgag W. Bei,LawreceL.Larmore 1 Departmet of Computer Sciece, Uiversity of Nevada, Las Vegas, NV 89154, USA

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

The Forgotten Middle. research readiness results. Executive Summary

The Forgotten Middle. research readiness results. Executive Summary The Forgotte Middle Esurig that All Studets Are o Target for College ad Career Readiess before High School Executive Summary Today, college readiess also meas career readiess. While ot every high school

More information

Domain 1 - Describe Cisco VoIP Implementations

Domain 1 - Describe Cisco VoIP Implementations Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

Firewall Modules and Modular Firewalls

Firewall Modules and Modular Firewalls Firewall Modules ad Modular Firewalls H. B. Acharya Uiversity of Texas at Austi acharya@cs.utexas.edu Aditya Joshi Uiversity of Texas at Austi adityaj@cs.utexas.edu M. G. Gouda Natioal Sciece Foudatio

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

7.1 Finding Rational Solutions of Polynomial Equations

7.1 Finding Rational Solutions of Polynomial Equations 4 Locker LESSON 7. Fidig Ratioal Solutios of Polyomial Equatios Name Class Date 7. Fidig Ratioal Solutios of Polyomial Equatios Essetial Questio: How do you fid the ratioal roots of a polyomial equatio?

More information

Concept: Types of algorithms

Concept: Types of algorithms Discrete Math for Bioiformatics WS 10/11:, by A. Bockmayr/K. Reiert, 18. Oktober 2010, 21:22 1001 Cocept: Types of algorithms The expositio is based o the followig sources, which are all required readig:

More information

Professional Networking

Professional Networking Professioal Networkig 1. Lear from people who ve bee where you are. Oe of your best resources for etworkig is alumi from your school. They ve take the classes you have take, they have bee o the job market

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

Dynamic House Allocation

Dynamic House Allocation Dyamic House Allocatio Sujit Gujar 1 ad James Zou 2 ad David C. Parkes 3 Abstract. We study a dyamic variat o the house allocatio problem. Each aget ows a distict object (a house) ad is able to trade its

More information

Reliability Analysis in HPC clusters

Reliability Analysis in HPC clusters Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab

More information

Solving Logarithms and Exponential Equations

Solving Logarithms and Exponential Equations Solvig Logarithms ad Epoetial Equatios Logarithmic Equatios There are two major ideas required whe solvig Logarithmic Equatios. The first is the Defiitio of a Logarithm. You may recall from a earlier topic:

More information

Convexity, Inequalities, and Norms

Convexity, Inequalities, and Norms Covexity, Iequalities, ad Norms Covex Fuctios You are probably familiar with the otio of cocavity of fuctios. Give a twicedifferetiable fuctio ϕ: R R, We say that ϕ is covex (or cocave up) if ϕ (x) 0 for

More information