The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

Size: px
Start display at page:

Download "The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines"

Transcription

1 The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii $5 # KTH Royal Istitute of Techology, Stockholm, Swede Yahoo Labs, Barceloa, Spai $ Qatar Computig Research Istitute, Doha, Qatar 1 aisu@kth.se, 2 gdfm@apache.org, 3 davidgs@yahoo-ic.com 4 kourtell@yahoo-ic.com, 5 mserafii@qf.org.qa arxiv: v1 [cs.dc] 3 Apr 2015 Abstract We study the problem of load balacig i distributed stream processig egies, which is exacerbated i the presece of skew. We itroduce PARTIAL KEY GROUPING (PKG), a ew stream partitioig scheme that adapts the classical power of two choices to a distributed streamig settig by leveragig two ovel techiques: key splittig ad local load estimatio. I so doig, it achieves better load balacig tha key groupig while beig more scalable tha shuffle groupig. We test PKG o several large datasets, both real-world ad sythetic. Compared to stadard hashig, PKG reduces the load imbalace by up to several orders of magitude, ad ofte achieves early-perfect load balace. This result traslates ito a improvemet of up to 60% i throughput ad up to 45% i latecy whe deployed o a real Storm cluster. I. INTRODUCTION Distributed stream processig egies (DSPEs) such as S4, 1 Storm, 2 ad Samza 3 have recetly gaied much attetio owig to their ability to process huge volumes of data with very low latecy o clusters of commodity hardware. Streamig applicatios are represeted by directed acyclic graphs (DAG) where vertices, called processig elemets (PEs), represet operators, ad edges, called streams, represet the data flow from oe PE to the ext. For scalability, streams are partitioed ito sub-streams ad processed i parallel o a replica of the PE called processig elemet istace (PEI). Applicatios of DSPEs, especially i data miig ad machie learig, typically require accumulatig state across the stream by groupig the data o commo fields [1, 2]. Aki to MapReduce, this groupig i DSPEs is usually implemeted by partitioig the stream o a key ad esurig that messages with the same key are processed by the same PEI. This partitioig scheme is called key groupig. Typically, it maps keys to sub-streams by usig a hash fuctio. Hash-based routig allows each source PEI to route each message solely via its key, without eedig to keep ay state or to coordiate amog PEIs. Alas, it also results i load imbalace as it represets a sigle-choice paradigm [3], ad because it disregards the popularity of a key, i.e., the umber of messages with the same key i the stream, as depicted i Figure Stream Source Source Worker Worker Worker Fig. 1: Load imbalace geerated by skew i the key distributio whe usig key groupig. The color of each message represets its key. Large web compaies ru massive deploymets of DSPEs i productio. Give their scale, good utilizatio of the resources is critical. However, the skewed distributio of may workloads causes a few PEIs to sustai a sigificatly higher load tha others. This suboptimal load balacig leads to poor resource utilizatio ad iefficiecy. Aother partitioig scheme called shuffle groupig achieves excellet load balacig by usig a roud-robi routig, i.e., by sedig a message to a ew PEI i cyclic order, irrespective of its key. However, this scheme is mostly suited for stateless computatios. Shuffle groupig may require a additioal aggregatio phase ad more memory to express stateful computatios (Sectio II). Additioally, it may cause a decrease i accuracy for data miig algorithms (Sectio VI). I this work, we focus o the problem of load balacig of stateful applicatios i DSPEs whe the iput stream follows a skewed key distributio. I this settig, load balacig is attaied by havig upstream PEIs create a balaced partitio of messages for dowstream PEIs, for each edge of the DAG. Ay practical solutio for this task eeds to be both streamig ad distributed: the former costrait eforces the use of a olie algorithm, as the distributio of keys is ot kow i advace, while the latter calls for a decetralized solutio with miimal coordiatio overhead i order to esure scalability.

2 To address this problem, we leverage the power of two choices [4] (PoTC), whereby the system picks the least loaded out of two cadidate PEIs for each key. However, to maitai the sematics of key groupig while usig PoTC (i.e., so that oe key is hadled by a sigle PEI), sources would eed to track which of the two possible choices has bee made for each key. This requiremet imposes a coordiatio overhead every time a ew key appears, so that all sources agree o the choice. I additio, sources should the store this choice i a routig table. Each edge i the DAG would thus require a routig table for every source, each with oe etry per key. Give that a typical stream may cotai billios of keys, this solutio is ot practical. Istead, we propose to relax the key groupig costrait ad allow each key to be hadled by both cadidate PEIs. We call this techique key splittig; it allows us to apply PoTC without the eed to agree o, or keep track of, the choices made. As show i Sectio V, key splittig guaratees good load balace eve i the presece of skew. A secod issue is how to estimate the load of a dowstream PEI. Traditioal work o PoTC assumes global kowledge of the curret load of each server, which is challegig i a distributed system. Additioally, it assumes that all messages origiate from a sigle source, whereas messages i a DSPE are geerated i parallel by multiple sources. I this paper we prove that, iterestigly, a simple local load estimatio techique, whereby each source idepedetly tracks the load of dowstream PEIs, performs very well i practice. This techique gives results that are almost idistiguishable from those give by a global load oracle. The combiatio of these two techiques (key splittig ad local load estimatio) eables a ew stream partitioig scheme amed PARTIAL KEY GROUPING. I summary, we make the followig cotributios. We study the problem of load balacig i moder distributed stream processig egies. We show how to apply PoTC to DSPEs i a pricipled ad practical way, ad propose two ovel techiques to do so: key splittig ad local load estimatio. We propose PARTIAL KEY GROUPING, a ovel ad simple stream partitioig scheme that applies to ay DSPE. Whe implemeted o top of Apache Storm, it requires a sigle fuctio ad less tha 20 lies of code. 4 We measure the impact of PKG o a real deploymet o Apache Storm. Compared to key groupig, it improves the throughput of a example applicatio o real-world datasets by up to 60%, ad the latecy by up to 45%. II. PRELIMINARIES AND MOTIVATION We cosider a DSPE ruig o a cluster of machies that commuicate by exchagig messages followig the flow of a DAG, as discussed. I this work, we focus o balacig the data trasmissio alog a sigle edge i a DAG. Load balacig across the whole DAG is achieved by balacig alog each 4 Available at edge idepedetly. Each edge represets a sigle stream of data, alog with its partitioig scheme. Give a stream uder cosideratio, let the set of upstream PEIs (sources) be S, ad the set of dowstream PEIs (workers) be W, ad their sizes be S = S ad W = W (see Figure 1). The iput to the egie is a sequece of messages m = t, k, v where t is the timestamp at which the message is received, k K, K = K is the message key, ad v is the value. The messages are preseted to the egie i ascedig order by timestamp. A stream partitioig fuctio P t : K N maps each key i the key space to a atural umber, at a give time t. This umber idetifies the worker resposible for processig the message. Each worker is associated to oe or more keys. We use a defiitio of load similar to others i the literature (e.g., Flux [5]). At time t, the load of a worker i is the umber of messages hadled by the worker up to t: L i (t) = { τ, k, v : P τ (k) = i τ t} I priciple, depedig o the applicatio, two differet messages might impose a differet load o workers. However, i most cases these differeces eve out ad modelig such applicatio-specific differeces is ot ecessary. We defie imbalace at time t as the differece betwee the maximum ad the average load of the workers: I(t) = max(l i (t)) avg(l i (t)), for i W i We tackle the problem of idetifyig a stream partitioig fuctio that miimizes the imbalace, while at the same time avoidig the dowsides of shuffle groupig. A. Existig Stream Partitioig Fuctios Data is set betwee PEs by exchagig messages over the etwork. Several primitives are offered by DSPEs for sources to partitio the stream, i.e., to route messages to differet workers. There are two mai primitives of iterest: key groupig (KG) ad shuffle groupig (SG). KG esures that messages with the same key are hadled by the same PEI (aalogous to MapReduce). It is usually implemeted through hashig. SG routes messages idepedetly, typically i a roudrobi fashio. SG provides excellet load balace by assigig a almost equal umber of messages to each PEI. However, o guaratee is made o the partitioig of the key space, as each occurrece of a key ca be assiged to ay PEIs. SG is the perfect choice for stateless operators. However, with stateful operators oe has to hadle, store ad aggregate multiple partial results for the same key, thus icurrig additioal costs. I geeral, whe the distributio of iput keys is skewed, the umber of messages that each PEI eeds to hadle ca vary greatly. While this problem is ot preset for stateless operators, which ca use SG to evely distribute messages, stateful operators implemeted via KG suffer from load imbalace. This issue geerates a degradatio of the service level, or reduces the utilizatio of the cluster which must be provisioed to hadle the peak load of the sigle most loaded server. i

3 Example. To make the discussio more cocrete, we itroduce a simple applicatio that will be our ruig example: streamig top-k word cout. This applicatio is a adaptatio of the classical MapReduce word cout to the streamig paradigm where we wat to geerate a list of top-k words by frequecy at periodic itervals (e.g., each T secods). It is also a commo applicatio i may domais, for example to idetify tredig topics i a stream of tweets. Implemetatio via key groupig. Followig the MapReduce paradigm, the implemetatio of word cout described by Neumeyer et al. [6] or Noll [7] uses KG o the source stream. The couter PE keeps a ruig couter for each word. KG esures that each word is hadled by a sigle PEI, which thus has the total cout for the word i the stream. At periodic itervals, the couter PEIs sed their top-k couters to a sigle dowstream aggregator to compute the top-k words. While this applicatio is clearly simplistic, it models quite well a geeral class of applicatios commo i data miig ad machie learig whose goal is to create a model by trackig aggregated statistics of the data. Clearly KG geerates load imbalace as, for istace, the PEI associated to the key the will receive may more messages tha the oe associated with Barceloa. This example captures the core of the problem we tackle: the distributio of word frequecies follows a Zipf law where few words are extremely commo while a large majority are rare. Therefore, a eve distributio of keys such as the oe geerated by KG results i a ueve distributio of messages. Implemetatio via shuffle groupig. A alterative implemetatio uses shuffle groupig o the source stream to get partial word couts. These couts are set dowstream to a aggregator every T secods via key groupig. The aggregator simply combies the couts for each key to get the total cout ad selects the top-k for the fial result. Usig SG requires a slightly more complex logic but it geerates a eve distributio of messages amog the couter PEIs. However, it suffers from other problems. Give that there is o guaratee which PEI will hadle a key, each PEI potetially eeds to keep a couter for every key i the stream. Therefore, the memory usage of the applicatio grows liearly with the parallelism level. Hece, it is ot possible to scale to a larger workload by addig more machies: the applicatio is ot scalable i terms of memory. Eve if we resort to approximatio algorithms, i geeral, the error depeds o the umber of aggregatios performed, thus it grows liearly with the parallelism level. We aalyze this case i further detail alog with other applicatio scearios i Sectio VI. B. Key groupig with rebalacig Oe commo solutio for load balacig i DSPEs is operator migratio [5, 8, 9, 10, 11, 12]. Oce a situatio of load imbalace is detected, the system activates a rebalacig routie that moves part of the keys, ad the state associated with them, away from a overloaded server. While this solutio is easy to uderstad, its applicatio i our cotext is ot straightforward for several reasos. Rebalacig requires settig a umber of parameters such as how ofte to check for imbalace ad how ofte to rebalace. These parameters are ofte applicatio-specific as they ivolve a trade-off betwee imbalace ad rebalacig cost that depeds o the size of the state to migrate. Further, implemetig a rebalacig mechaism usually requires major modificatios of the DSPE at had. This task may be hard, ad is usually see with suspicio by the commuity drivig ope source projects, as witessed by the may variats of Hadoop that were ever merged back ito the mai lie of developmet [13, 14, 15]. I our cotext, rebalacig implies migratig keys from oe sub-stream to aother. However, this migratio is ot directly supported by the programmig abstractios of some DSPEs. Storm ad Samza use a coarse-graied stream partitioig paradigm. Each stream is partitioed ito as may sub-streams as the umber of dowstream PEIs. Key migratio is ot compatible with this partitioig paradigm, as a key caot be ucoupled from its sub-stream. I cotrast, S4 employs a fie-graied paradigm where the stream is partitioed ito oe sub-stream per key value, ad there is a oe-to-oe mappig of a key to a PEI. The latter paradigm easily supports migratio, as each key is processed idepedetly. A major problem with mappig keys to PEIs explicitly is that the DSPE must maitai several routig tables: oe for each stream. Each routig table has oe etry for each key i the stream. Keepig these tables is impractical because the memory requiremets are staggerig. I a typical web miig applicatio, each routig table ca easily have billios of keys. For a moderately large DAG with tes of edges, each with tes of sources, the memory overhead easily becomes prohibitive. Fially, as already metioed, for each stream there are several sources sedig messages i parallel. Modificatios to the routig table must be cosistet across all sources, so they require coordiatio, which creates further overhead. For these reasos we cosider a alterative approach to load balacig. III. PARTIAL KEY GROUPING The problem described so far curretly lacks a satisfyig solutio. To solve this issue, we resort to a widely-used techique i the literature of load balacig: the so-called power of two choices (PoTC). While this techique is well-kow ad has bee aalyzed thoroughly both from a theoretical ad practical perspective [16, 17, 18, 19, 4, 20], its applicatio i the cotext of DSPEs is ot straightforward ad has ot bee previously studied. Itroduced by Azar et al. [17], PoTC is a simple ad elegat techique that allows to achieve load balace whe assigig uits of load to workers. It is best described i terms of balls ad bis. Imagie a process where a stream of balls (uits of work) is distributed to a set of bis (the workers) as evely as possible. The sigle-choice paradigm correspods to puttig each ball ito oe bi selected uiformly at radom. By cotrast, the power of two choices selects two bis uiformly at radom, ad puts the ball ito the least loaded oe. This

4 simple modificatio of the algorithm has powerful implicatios that are well kow i the literature (see Sectios IV, VII). Sigle choice. The curret solutio used by all DSPEs to partitio a stream with key groupig correspods to the siglechoice paradigm. The system has access to a sigle hash fuctio H 1 (k). The partitioig of keys ito sub-streams is determied by the fuctio P t (k) = H 1 (k) mod W, where mod is the modulo operator. The sigle-choice paradigm is attractive because of its simplicity: the routig does ot require to maitai ay state ad ca be doe idepedetly i parallel. However, it suffers from a problem of load imbalace [4]. This problem is exacerbated whe the distributio of iput keys is skewed. PoTC. Whe usig the power of two choices, we have two hash fuctios H 1 (k) ad H 2 (k). The algorithm maps each key to the sub-stream assiged to the least loaded worker betwee the two possible choices, that is: P t (k) = argmi i (L i (t) : H 1 (k) = i H 2 (k) = i). The theoretical gai i load balace with two choices is expoetial compared to a sigle choice. However, usig more tha two choices oly brigs costat factor improvemets [17]. Therefore, we restrict our study to two choices. PoTC itroduces two additioal complicatios. First, to maitai the sematics of key groupig, the system eeds to keep state ad track the choices made. Secod, the system has to kow the load of the workers i order to make the right choice. We discuss these two issues ext. A. Key Splittig A aïve applicatio of PoTC to key groupig requires the system to store a bit of iformatio for each key see, to keep track of which of the two choices eeds to be used thereafter. This variat is referred to as static PoTC. Static PoTC icurs some of the problems discussed for key groupig with rebalacig. Sice the actual worker to which a key is routed is determied dyamically, sources eed to keep a routig table with a etry per key. As already discussed, maitaiig this routig table is ofte impractical. I order to leverage PoTC ad make it viable for DSPEs, we relax the requiremet of key groupig. Rather tha mappig each key to oe of the two possible choices, we allow it to be mapped to both choices. Every time a source seds a message, it selects the worker with the lowest curret load amog the two cadidates associated to that key. This techique, called key splittig, itroduces several ew trade-offs. First, key splittig allows the system to operate i a decetralized maer, by allowig multiple sources to take decisios idepedetly i parallel. As i key groupig ad shuffle groupig, o state eeds to be kept by the system ad each message ca be routed idepedetly. Key splittig eables far better load balacig compared to key groupig. It allows usig PoTC to balace the load o the workers: by splittig each key o multiple workers, it hadles the skew i the key popularity. Moreover, give that all its decisios are dyamic ad based o the curret load of the system (as opposed to static PoTC), key splittig adapts to chages i the popularity of keys over time. Third, key splittig reduces the memory usage ad aggregatio overhead compared to shuffle groupig. Give that each key is assiged to exactly two PEIs, the memory to store its state is just a costat factor higher tha whe usig key groupig. Istead, with shuffle groupig the memory grows liearly with the umber of workers W. Additioally, state aggregatio eeds to happe oly oce for the two partial states, as opposed to W 1 times i shuffle groupig. This improvemet also allows to reduce the error icurred durig aggregatio for some algorithms, as discussed i Sectio VI. From the poit of view of the applicatio developer, key splittig gives rise to a ovel stream partitioig scheme called PARTIAL KEY GROUPING, which lies i-betwee key groupig ad shuffle groupig. Naturally, ot all algorithms ca be expressed via PKG. The fuctios that ca leverage PKG are the same oes that ca leverage a combier i MapReduce, i.e., associative fuctios ad mooids. Examples of applicatios iclude aïve Bayes, heavy hitters, ad streamig parallel decisio trees, as detailed i Sectio VI. O the cotrary, other fuctios such as computig the media caot be easily expressed via PKG. Example. Let us examie the streamig top-k word cout example usig PKG. I this case, each word is tracked by two couters o two differet PEIs. Each couter holds a partial cout for the word, while the total cout is the sum of the two partial couts. Therefore, the total memory usage is 2 K, i.e., O(K). Compare this result to SG where the memory is O(W K). Partial couts are set dowstream to a aggregator that computes the fial result. For each word, the applicatio seds two couters, ad the aggregator performs a costat time aggregatio. The total work for the aggregatio is O(K). Coversely, with SG the total work is agai O(W K). Compared to KG, the implemetatio with PKG requires additioal logic, some more memory ad has some aggregatio overhead. However, it also provides a much better load balace which maximizes the resource utilizatio of the cluster. The experimets i Sectio V prove that the beefits outweigh its cost. B. Local Load Estimatio PoTC requires kowledge of the load of each worker to take its routig decisio. A DSPE is a distributed system, ad, i geeral, sources ad workers are deployed o differet machies. Therefore, the load of each worker is ot readily available to each source. Iterestigly, we prove that o commuicatio betwee sources ad workers is eeded to effectively apply PoTC. We propose a local load estimatio techique, whereby each source idepedetly maitais a local load-estimate vector with oe elemet per worker. The load estimates are updated by usig oly local iformatio of the portio of stream set by each source. We argue that i order to achieve global load balace it is sufficiet that each source idepedetly balaces the load it geerates across all workers.

5 The correctess of local load estimatio directly follows from our stadard defiitio of load i Sectio II. The load o a worker L i is simply the sum of the loads that each source j imposes o the give worker: L i (t) = j S Lj i (t). Each source j ca keep a estimate of the load o each worker i based o the load it has geerated L j i. As log as each source keeps its ow portio of load balaced, the the overall load o the workers will also be balaced. Ideed, the maximum overall load is at most the sum of the maximum load that each source sees locally. It follows that the maximum imbalace is also at most the sum of the local imbalaces. IV. ANALYSIS We proceed to aalyze the coditios uder which PKG achieves good load balace. Recall from Sectio II that we have a set W of workers at our disposal ad receive a sequece of m messages k 1,..., k m with values from a key uiverse K. Upo receivig the i-th message with value k i K, we eed to decide its placemet amog the workers; decisios are irrevocable. We assume oe message arrives per uit of time. Our goal is to miimize the evetual maximum load L(m), which is the same as miimizig the imbalace I(m). A simple placemet scheme such as shuffle groupig provides a imbalace of at most oe, but we would like to limit the umber of workers processig each key to d N +. Chromatic balls ad bis. We model our problem i the framework of balls ad bis processes, where keys correspod to colors, messages to colored balls, ad workers to bis. Choose d idepedet hash fuctios H 1,..., H d : K [] uiformly at radom. Defie the Greedy-d scheme as follows: at time t, the t-th ball (whose color is k t ) is placed o the bi with miimum curret load amog H 1 (k t ),..., H d (k t ), i.e., P t (k t ) = argmi i {H1(k t),...,h d (k t)} L i (t). Recall that with key splittig there is o eed to remember the choice for the ext time a ball of the same color appears. Observe that whe d = 1, each ball color is assiged to a uique bi so o choice has to be made; this models hashbased key groupig. At the other extreme, whe d l, all bis are valid choices, ad we obtai shuffle groupig. Key distributio. Fially, we assume the existece of a uderlyig discrete distributio D supported o K from which ball colors are draw, i.e., k 1,..., k m is a sequece of m idepedet samples from D. Without loss of geerality, we idetify the set K of keys with N + or, if K is fiite of cardiality K = K, with [K] = {1,..., K}. We assume them ordered by decreasig probability: if p i is the probability of drawig key i from D, the p 1 p 2 p 3... ad i K p i = 1. We also idetify the set W of bis with []. A. Imbalace with PARTIAL KEY GROUPING Compariso with stadard problems. As log as we keep gettig balls of differet colors, our process is idetical to the stadard Greedy-d process of Azar et al. [17]. This occurs with high probability provided that m is small eough. But for sufficietly large m (e.g., whe m 1 p 1 ), repeated keys will start to arrive. Recall that for ay umber of choices d 2, the maximum imbalace after throwig m balls of differet colors l l ito bis with the stadard Greedy-d process is l d + m + O(1). Ufortuately, such strog bouds (idepedet of m) caot apply to our settig. To gai some ituitio o what may go wrog, cosider the followig examples where d=2. Note that for the maximum load ot to be much larger tha the average load, the umber of bis used must ot exceed O(1/p 1 ), where p 1 is the maximum key probability. Ideed, at ay time we expect the two bis h 1 (1), h 2 (1) to cotai together at least a p 1 fractio of all balls, just coutig the occurreces of a sigle key. Hece the expected maximum load amog the two grows at a rate of at least p 1 /2 per uit of time, while the overall average load icreases by exactly 1 per uit of time. Thus, if p 1 > 2/, the expected imbalace at time m will be lower bouded by ( p1 2 1 )m, which grows liearly with m. This holds irrespective of the placemet scheme used. However, requirig p 1 2/ is ot eough to prevet imbalace Ω(m). Cosider the uiform distributio over keys. Let B = i {H 1(i), H 2 (i)} be the set of all bis that belog to oe of the potetial choices for some key. As is well-kow, the expected size of B is ( 1 ) 1 2 (1 1 e ). So all 2 keys use oly a (1 1 e ) fractio of all bis, ad 2 roughly bis will remai uused. I fact the imbalace m m after m balls will be at least 0.156m. The problem is that most cocrete istatiatios of our two radom hash fuctios cause the existece of a overpopulated set B of bis iside which the average bi load must grow faster tha the average load across all bis. (I fact, this case subsumes our first example above, where B was {H 1 (1), H 2 (1)}.) Fially, eve i the absece of overpopulated bi subsets, some iheret imbalace is due to deviatios betwee the empirical ad true key distributios. For istace, suppose there are two keys 1, 2 with equal probability 1 2 ad = 4 bis. With costat probability, key 1 is assiged to bis 1, 2 ad key 2 to bis 3, 4. This situatio looks perfect because the Greedy-2 choice will sed each occurrece of key 1 to bis 1, 2 alterately so the loads of bis 1, 2 will always equal up to ±1. However, the umber of balls with key 1 see is likely to deviate from m/2 by roughly Θ( m), so either the top two or the bottom two bis will receive m/4 + Ω( m) balls, ad the imbalace will be Ω( m) with costat probability. I the remaider of this sectio we carry out our aalysis, which broadly costrued asserts that the above are the oly impedimets to achieve good balace. Statemet of results. We oted that oce the umber of bis exceeds 2/p 1 (where p 1 is the maximum key frequecy), the maximum load will be domiated by the loads of the bis to which the most frequet key is mapped. Hece the mai case of iterest is where p 1 = O( 1 ). We focus o the case where the umber of balls is large compared to the umber of bis. The followig results show that partial key groupig ca sigificatly reduce the maximum load (ad the imbalace), compared to key groupig.

6 Theorem 4.1: Suppose we use bis ad let m 2. Assume a key distributio D with maximum probability p The the imbalace after m steps of the Greedy-d process satisfies, with probability at least 1 1, { O ( m I(m) = l l l ), if d = 1 O ( ). m, if d 2 As the ext result shows, the bouds above are bestpossible. 5 Theorem 4.2: There is a distributio D satisfyig the hypothesis of Theorem 4.1 such that the imbalace after m steps of the Greedy-d process satisfies, with probability at least 1 1, { Ω ( m I(m) = l l l ), if d = 1 Ω ( ). m, if d 2 We omit the proof of Theorem 4.2 (it follows by cosiderig a uiform distributio over 5 keys). The ext sectio is devoted to the proof of the upper boud, Theorem 4.1. B. Proof Cocetratio iequalities. We recall the followig results, which we eed to prove our mai theorem. Theorem 4.3 (Cheroff bouds): Suppose {X i } is a fiite sequece of idepedet radom variables with X i [0, M] ad let Y = i X i, µ = i E[X i]. The for all β µ, Pr[Y β] C(µ, β, M), where ( C(µ, β, M) exp β l( β eµ ) + µ ). M Theorem 4.4 (McDiarmid s iequality): Let X 1,..., X be a vector of idepedet radom variables ad let f be a fuctio satisfyig f(a) f(a ) 1 wheever the vectors a ad a differ i just oe coordiate. The Pr[f(X 1,..., X ) > E[f(X 1,..., X )] + λ] exp( 2λ 2 ). The µ r measure of bi subsets. For every oempty set of bis S [] ad 1 r d, defie µ r (S) = {p i {H 1 (i),..., H r (i)} B}. We will be iterested i µ 1 (B) (which measures the probability that a radom key from D will have its choice iside B) ad µ d (B) (which measures the probability that a radom key from D will have all its choices iside B). Note that µ 1 (B) = j B µ 1({j}) ad µ d (B) µ 1 (B). Lemma 4.5: For every B [], E[µ 1 (B)] = B ad, if p 1 1, [ Pr µ 1 (B) B ] ( ) B 1 (eλ) λ λ. 5 However, the imbalace ca be much smaller tha the worst-case bouds from Theorem 4.1 if the probability of most keys is much smaller tha p 1, which is the case i may setups. Proof: The first claim follows from liearity of expectatio ad the fact that i p i = 1. For the secod, let B = k. Usig Theorem 4.3, Pr [ µ 1 (B) k (eλ)] is at most ( k C, k ) eλ, p 1 exp ( kp ) eλ l λ exp( kλ l λ), sice p 1 1. Lemma 4.6: For every B [], E[µ d (B)] = provided that p 1 1 5, [ Pr µ d (B) B ] ( ) 5 B e B. ( ) d B ad, Proof: Agai the first claim is easy. For the secod, let B = k. Usig Theorem 4.3, Pr [ µ d (B) ] k is at most ( ( k ) ) ( d, k C, p k(d 1) ( ) ) 1 exp l p 1 ek ( ( )) exp 5k l ek sice p Corollary 4.7: Assume p 1 1 4, d 2. The, with high probability, { µd (B) max B / B [], B } 1. 5 Proof: We use Lemma 4.5 ad the uio boud. The probability that the claim fails to hold is bouded by [ Pr µ d (B) k ] ( ) ( ) 5k ek k B /5 k /5 ( e ) ( ) k 5k ( ) ek 1 = o, k k /5 where we used ( ( k) e ) k, k valid for all k. For a schedulig algorithm A ad a set B [] of bis, write L A B (t) = max j B L j (t) for the maximum load amog the bis i B after t balls have bee processed by A. Lemma 4.8: Suppose there is a set A [] of bis such that for all T A, µ d (T ) T. The A = Greedy-d satisfies L A A (m) = O( m ) + LA []\A(m) with high probability. Proof: We use a couplig argumet. Cosider the followig two idepedet processes P ad Q: P proceeds as Greedy-d, while Q picks the bi for each ball idepedetly at radom from [] ad icreases its load. Cosider ay time t at which the load vector is ω t N ad M t = M(ω t ) is the set of bis with maximum load. After hadlig the t-th ball, let X t deote the evet that P icreases the maximum load i A because the ew ball has all choices i M t A, ad Y t deote the evet that Q icreases the maximum load i A. Fially, let Z t deote the evet that P icreases the maximum load i A because the ew ball has some choice i M t A ad some choice i M t \ A, but the load of oe of its choices i M t A is o larger. We idetify these evets with their idicator radom variables.

7 Note that the maximum load i A at the ed of Process P is L P A (m) = t [m] (X t + Z t ), while at the ed of Process Q is L Q A (m) = t [m] Y t. Coditioed o ay load vector ω t, the probability of X t is Pr[X t ω t ] = µ d (M t A) M t A M t = Pr[Y t ω t ], So Pr[X t ω t ] Pr[Y t ω t ], which implies that for ay b N, Pr[ t [m] X t b] Pr[ t [m] Y t b]. But with high probability, the maximum load of Process Q is b = O(m/), so t X t = O(m/) holds with at least the same probability. O the other had, t Z t L P []\A(m) because each occurrece of Z t icreases the maximum load o A, ad oce a time t is reached such that L P A (t) > LP []\A(m), evet Z t must cease to happe. Therefore L P A (m) = t [m] X t + t [m] Z t O(m/) + L P []\A(m), yieldig the result. Proof of Theorem 4.1: Let A = { j [] µ 1 ({j}) 3e }. Observe that every bi j / A has µ 1 ({j}) < 3e ad this implies that, coditioed o ay choice of hash fuctios, the maximum load of all bis outside A is at most 20m with high probability. 6 Therefore our task reduces to showig that the maximum load of the bis i A is O( m ). Cosider the sequece X 1,..., X K of radom variables give by X i = H 1 (i), ad let f(x 1, X 2,..., X K ) = A deote the umber of bis j with µ 1 ({j}) 3e. By Lemma 4.5, E[ A ] = E[f] Moreover, the fuctio f satisfies the hypothesis of Theorem 4.4.We coclude that, with high probability, A 5. Now assume that the thesis of Corollary 4.7 holds, which happes except with probability o(1/). The we have that for all B A, µ d (B) B. Thus Lemma 4.8 applies to A. This meas that after throwig m balls, the maximum load amog the bis i A is O( m ), as we wished to show. V. EVALUATION We assess the performace of our proposal by usig both simulatios ad a real deploymet. I so doig, we aswer the followig questios: Q1: What is the effect of key splittig o PoTC? Q2: How does local estimatio compare to a global oracle? Q3: How robust is PARTIAL KEY GROUPING? Q4: What is the overall effect of PARTIAL KEY GROUPING o applicatios deployed o a real DSPE? A. Experimetal Setup Datasets. Table I summarizes the datasets used. We use two mai real datasets, oe from Wikipedia ad oe from Twitter. These datasets were chose for their large size, their differet degree of skewess, ad because they are represetative of Web ad olie social etwork domais. The Wikipedia 6 This is by majorizatio with the process that just throws every ball to the first choice; see, e.g, Azar et al. [17]. TABLE I: Summary of the datasets used i the experimets: umber of messages, umber of keys ad percetage of messages havig the most frequet key (p 1 ). Dataset Symbol Messages Keys p 1(%) Wikipedia WP 22M 2.9M 9.32 Twitter TW 1.2G 31M 2.67 Cashtags CT 690k 2.9k 3.29 Sythetic 1 LN 1 10M 16k Sythetic 2 LN 2 10M 1.1k 7.01 LiveJoural LJ 69M 4.9M 0.29 Slashdot0811 SL 1 905k 77k 3.28 Slashdot0902 SL 2 948k 82k 3.11 dataset (WP) 7 is a log of the pages visited durig a day i Jauary Each visit is a message ad the page s URL represets its key. The Twitter dataset (TW) is a sample of tweets crawled durig July Each tweet is parsed ad split ito its words, which are used as the key for the message. A additioal Twitter dataset comprises a sample of tweets crawled i November The keys for the messages are the cashtags i these tweets. A cashtag is a ticker symbol used i the stock market to idetify a publicly traded compay preceded by the dollar sig (e.g., $AAPL for Apple). Popular cash tags chage from week to week. This dataset allows to study the effect of shift of skew i the key distributio. We also geerate two sythetic datasets (LN 1, LN 2 ) with keys followig a log-ormal distributio, a commoly used heavy-tailed skewed distributio [21]. The parameters of the distributio (µ 1 =1.789, σ 1 =2.366; µ 2 =2.245, σ 2 =1.133) come from a aalysis of Orkut, ad try to emulate workloads from the olie social etwork domai [22]. Fially, we experimet o three additioal datasets comprised of directed graphs 8 (LJ, SL 1, SL 2 ). We use the edges i the graph as messages ad the vertices as keys. These datasets are used to test the robustess of PKG to skew i partitioig the stream at the sources, as explaied ext. They also represet a differet kid of applicatio domai: streamig graph miig. Simulatio. We process the datasets by simulatig the DAG preseted i Figure 1. The stream is composed of timestamped keys that are read by multiple idepedet sources (S) via shuffle groupig, uless otherwise specified. The sources forward the received keys to the workers (W) dowstream. I our simulatios we assume that the sources perform data extractio ad trasformatio, while the workers perform data aggregatio, which is the most computatioally expesive part of the DAG. Thus, the workers are the bottleeck i the DAG ad the focus for the load balacig. B. Experimetal Results Q1. We measure the imbalace i the simulatios whe usig the followig techiques: 7 id=60 8

8 G L 5 L 10 L 15 L 20 H Fractio of Imbalace TW workers WP workers CT workers LN workers LN workers Fig. 2: Fractio of average imbalace with respect to total umber of messages for each dataset, for differet umber of workers ad umber of sources. TABLE II: Average imbalace whe varyig the umber of workers for the Wikipedia ad Twitter datasets. Dataset WP TW W PKG e5 8.0e e6 Off-Greedy e6 1.8e e6 2.0e7 O-Greedy e5 1.6e6 1.8e e7 2.0e7 PoTC e5 1.6e6 1.8e6 2.2e4 5.1e3 1.4e7 2.0e7 Hashig 1.4e6 1.7e6 2.0e6 2.0e6 4.1e7 3.7e7 2.4e7 3.3e7 H: Hashig, which represets stadard key groupig (KG) ad is our mai baselie. We use a 64-bit Murmur hash fuctio to miimize the probability of collisio. PoTC: Power of two choices without usig key splittig, i.e., traditioal PoTC applied to key groupig. O-Greedy: Olie greedy algorithm that picks the least loaded worker to hadle a ew key. Off-Greedy: Offlie greedy sorts the keys by decreasig frequecy ad executes O-Greedy. PKG: PoTC with key splittig. Note that PKG is the oly method that uses key splittig. Off-Greedy kows the whole distributio of keys so it represets a ufair compariso for olie algorithms. Table II shows the results of the compariso o the two mai datasets WP ad TW. Each value is the average imbalace measured throughout the simulatio. As expected, hashig performs the worst, creatig a large imbalace i all cases. While PoTC performs better tha hashig i all the experimets, it is outclassed by O-Greedy o TW. O-Greedy performs very close to Off-Greedy, which is a good result cosiderig that it is a olie algorithm. Iterestigly, PKG performs eve better tha Off-Greedy. Relaxig the costrait of KG allows to achieve a load balace comparable to offlie algorithms. We coclude that PoTC aloe is ot eough to guaratee good load balace, ad key splittig is fudametal ot oly to make the techique practical i a distributed system, but also to make it effective i a streamig settig. As expected, icreasig the umber of workers also icreases the average imbalace. The behavior of the system is biary: either well balaced or largely imbalaced. The trasitio betwee the two states happes whe the umber of workers surpasses the limit O(1/p 1 ) described i Sectio IV, which happes aroud 50 workers for WP ad 100 for TW. Q2. Give the aforemetioed results, we focus our attetio o PKG heceforth. So far, it still uses global iformatio about the load of the workers whe decidig which choice to make. Next, we experimet with local estimatio, i.e., each source performs its ow estimatio of the worker load, based o the sub-stream processed so far. We cosider the followig alteratives: G: PKG with global iformatio of worker load. L: PKG with local estimatio of worker load ad differet umber of sources, e.g., L 5 deotes S = 5. LP: PKG with local estimatio ad periodic probig of worker load every T p miutes. For istace, L 5 P 1 deotes S = 5 ad T p = 1. Whe probig is executed, the local estimate vector is set to the actual load of the workers. Figure 2 shows the average imbalace (ormalized to the size of the dataset) with differet techiques, for differet umber of sources ad workers, ad for several datasets. The baselie (H) always imposes very high load imbalace o the workers. Coversely, PKG with local estimatio (L) has always a lower imbalace. Furthermore, the differece from the global variat (G) is always less tha oe order of magitude. Fially, this result is robust to chages i the umber of sources. Figure 3 displays the imbalace of the system through time I(t) for TW, WP ad CT, 5 sources, ad for W = 10 ad 50. Results for W = 5 ad W = 100 are omitted as they are similar to W = 10 ad W = 50, respectively. PKG with global iformatio (G) ad its variat with local estimatio (L 5 ) perform best. Iterestigly, eve though both G ad L achieve very good load balace, their choices are quite differet. I a experimet measurig the agreemet o the destiatio of each message, G ad L have oly 47% Jaccard overlap. Hece, L reaches a local miimum which is very close i value to the oe obtaied by G, although differet. Also i this case, good balace ca oly be achieved up to a umber of workers that depeds o the dataset. Whe that umber is exceeded, the imbalace icreases rapidly, as see i the cases of WP ad partially for CT for W = 50, where all techiques lead to the same high load imbalace.

9 Fractio of Imbalace workers TW WP CT hours Fractio of Imbalace workers TW WP G L L 5 P hours Fig. 3: Fractio of imbalace through time for differet datasets, techiques, ad umber of workers, with S = 5. Fially, we compare our local estimatio strategy with a variat that makes use of periodic probig of workers load every miute (L 5 P 1 ). Probig removes ay icosistecy i the load estimates that the sources may have accumulated. However, iterestigly, this techique does ot improve the load balace, as show i Figure 3. Eve icreasig the frequecy of probig does ot reduce imbalace (results ot show i the figure for clarity). I coclusio, local iformatio is sufficiet to obtai good load balace, therefore it is ot ecessary to icur the overhead of probig. Q3. To operatioalize this questio, we use the directed graphs datasets. We use KG to distribute the messages to the sources to test the robustess of PKG to skew i the sources, i.e., whe each source forwards a ueve part of the stream. We simulate a simple applicatio that computes a fuctio of the icomig edges of a vertex (e.g., i-degree, PageRak). The iput keys for the source PE is the source vertex id, while the key set to the worker PE is the destiatio vertex id, that is, the source PE iverts the edge. This schema projects the outdegree distributio of the graph o sources, ad the i-degree distributio o workers, both of which are highly skewed. Figure 4 shows the average imbalace for the experimets with a skewed split of the keys to sources for the LJ social graph (results o SL 1 ad SL 2 are similar to LJ ad are omitted due to space costrait). For compariso, we iclude the results whe the split is performed uiformly usig shuffle groupig of keys o sources. O average, the imbalace geerated by the skew o sources is similar to the oe obtaied with uiform splittig. As expected, the imbalace slightly icreases as the umber of sources ad workers icrease, but, i geeral, it remais at very low absolute values. CT Fractio of Imbalace Uiform L 5 Skewed L 5 Uiform L 10 Skewed L 10 Uiform L 15 Skewed L 15 Uiform L 20 Skewed L workers Fig. 4: Fractio of average imbalace with uiform ad skewed splittig of the iput keys o the sources whe usig the LJ graph. To aswer Q3, we additioally experimet with drift i the skew distributio by usig the cashtag dataset (CT). The bottom row of Figure 3 demostrates that all techiques achieve a low imbalace, eve though the chage of key popularity through time geerates occasioal spikes. I coclusio, PKG is robust to skew o the sources, ad ca therefore be chaied to key groupig. It is also robust to the drift i key distributio commo of may real-world streams. Q4. We implemet ad test our techique o the streamig top-k word cout example, ad perform two experimets to compare PKG, KG, ad SG o WP. We choose word cout as it is oe of the simplest possible examples, thus limitig the umber of cofoudig factors. It is also represetative of may data miig algorithms as the oes described i Sectio VI (e.g., coutig frequet items or co-occurreces of feature-class pairs). Due to the requiremet of real-world deploymet o a DSPE, we igore techiques that require coordiatio (i.e., PoTC ad O-Greedy). We use a topology cofiguratio of a sigle source alog with 9 workers (couters) ruig o a storm cluster of 10 virtual servers. We report overall throughput, ed-to-ed latecy, ad memory usage. I the first experimet, we emulate differet levels of CPU cosumptio per key by addig a fixed delay to the processig. We prefer this solutio over implemetig a specific applicatio i order to be able to cotrol the load o the workers. We choose a rage that is able to brig our cofiguratio to a saturatio poit, although the raw umbers would vary for differet setups. Eve though real deploymets rarely operate at saturatio poit, PKG allows better resource utilizatio, therefore supportig the same workload o a smaller umber of machies. I this case, the miimum delay (0.1ms) correspods approximately to readig 400kB sequetially from memory, while the maximum delay (1ms) to 1 10-th of a disk seek. 9 Nevertheless, eve more expesive tasks exist: parsig a setece with NLP tools ca take up to 500ms. 10 The system does ot perform aggregatio i this setup, as we are oly iterested i the raw effect o the workers. Figure 5(a) shows the throughput achieved whe varyig the CPU delay for the three partitioig strategies. Regardless of the delay, SG ad PKG perform similarly, ad their throughput is higher tha KG. The throughput of KG is reduced by 60% 9 perf.html 10

10 Throughput (keys/s) PKG SG KG (a) CPU delay (ms) s 300s 600s 300s 600s 30s PKG SG s KG (b) Memory (couters) Fig. 5: (a) Throughput for PKG, SG ad KG for differet CPU delays. (b) Throughput for PKG ad SG vs. average memory for differet aggregatio periods. whe the CPU delay icreases tefold, while the impact o PKG ad SG is smaller ( 37% decrease). We deduce that reducig the imbalace is critical for clusters operatig close to their saturatio poit, ad that PKG is able to hadle bottleecks similarly to SG ad better tha KG. I additio, the imbalace geerated by KG traslates ito loger latecies for the applicatio. Whe the workers are heavily loaded, the average latecy with KG is up to 45% larger tha with PKG. Fially, the beefits of PKG over SG regardig memory are substatial. Overall, PKG (3.6M couters) requires about 30% more memory tha KG (2.9M couters), but about half the memory of SG (7.2M couters). I the secod experimet, we fix the CPU delay to 0.4ms per key, as it is the saturatio poit for KG i our setup. We activate the aggregatio of couters at differet time itervals T to emulate differet applicatio policies for whe to receive up-to-date top-k word couts. I this case, PKG ad SG eed additioal memory compared to KG to keep partial couters. Shorter aggregatio periods reduce the memory requiremets, as partial couters are flushed ofte, at the cost of a higher umber of aggregatio messages. Figure 5(b) shows the relatioship betwee throughput ad memory overhead for PKG ad SG. The throughput of KG is show for compariso. For all values of aggregatio period, PKG achieves higher throughput tha SG, with lower memory overhead ad similar average latecy per message. Whe the aggregatio period is above 30s, the beefits of PKG compesate its extra overhead ad its overall throughput is higher tha whe usig KG. VI. APPLICATIONS PKG is a ovel programmig primitive for stream partitioig ad ot every algorithm ca be expressed with it. I geeral, all algorithms that use shuffle groupig ca use PKG to reduce their memory footprit. I additio, may algorithms expressed via key groupig ca be rewritte to use PKG i order to get better load balacig. I this sectio we provide a few such examples of commo data miig algorithms, ad show the advatages of PKG. Heceforth, we assume that each message cotais a data poit for the applicatio, e.g., a feature vector i a high-dimesioal space. A. Naïve Bayes Classifier A aïve Bayes classifier is a probabilistic model that assumes idepedece of features. It estimates the probability of a class C give a feature vector X by usig Bayes theorem. I practice, the classifier works by coutig the frequecy of co-occurrece of each feature ad class values. The simplest way to parallelize this algorithm is to spread the couters across several workers via vertical parallelism, i.e., each feature is tracked idepedetly i parallel. Followig this desig, the algorithm ca be implemeted by the same patter used for the KG example i Sectio II-A. Sparse datasets ofte have a skewed distributio of features, e.g., for text classificatio. Therefore, this implemetatio suffers from the same load imbalace, which PKG solves. Horizotal parallelism ca also be used to parallelize the algorithm, i.e., by shufflig messages to separate workers. This implemetatio uses the same patter as the DAG i the SG example i Sectio II-A. The cout for a sigle featureclass pair is distributed across several workers, ad eeds to be combied at predictio (query) time. This combiatio requires broadcastig the query to all the workers, as a feature ca be tracked by ay worker. This implemetatio, while balacig the work better tha key groupig, requires a expesive query stage that may be affected by stragglers. PKG tracks each feature o two workers ad avoids replicatig couters o all workers. Furthermore, the two workers are determiistically assiged for each feature. Thus, at query time, the algorithm eeds to probe oly two workers for each feature, rather tha havig to broadcast it to all the workers. The resultig query phase is less expesive ad less sesitive to stragglers tha with shuffle groupig. B. Streamig Parallel Decisio Tree A decisio tree is a classificatio algorithm that uses a treelike model where odes are tests o features, braches are possible outcomes, ad leafs are class assigmets. Be-Haim ad Tom-Tov [1] propose a algorithm to build a streamig parallel decisio tree that uses approximated histograms to fid the test value for cotiuous features. Messages are shuffled amog W workers. Each worker geerates histograms idepedetly for its sub-stream, oe histogram for each feature-class-leaf triplet. These histograms are the periodically set to a sigle aggregator that merges them to get a approximated histogram for the whole stream. The aggregator uses this fial histogram to grow the model by takig split decisios for the curret leaves i the tree. Overall, the algorithm keeps W D C L histograms, where D is the umber of features, C is the umber of classes, ad L is the curret umber of leaves. The memory footprit of the algorithm depeds o W, so it is impossible to fit larger models by icreasig the parallelism. Moreover, the aggregator eeds to merge W D C histograms each time a split decisio is tried, ad mergig the histograms is oe of the most expesive operatios. Istead, PKG reduces both the space complexity ad aggregatio cost. If applied o the features of each message, a sigle

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines

The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines The Power of Both Choices: Practical Load Balacig for Distributed Stream Processig Egies Muhammad Ais Uddi Nasir #1, Giamarco De Fracisci Morales 2, David García-Soriao 3 Nicolas Kourtellis 4, Marco Serafii

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Tradigms of Astundithi and Toyota

Tradigms of Astundithi and Toyota Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows: Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

The Forgotten Middle. research readiness results. Executive Summary

The Forgotten Middle. research readiness results. Executive Summary The Forgotte Middle Esurig that All Studets Are o Target for College ad Career Readiess before High School Executive Summary Today, college readiess also meas career readiess. While ot every high school

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Designing Incentives for Online Question and Answer Forums

Designing Incentives for Online Question and Answer Forums Desigig Icetives for Olie Questio ad Aswer Forums Shaili Jai School of Egieerig ad Applied Scieces Harvard Uiversity Cambridge, MA 0238 USA shailij@eecs.harvard.edu Yilig Che School of Egieerig ad Applied

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

On the Capacity of Hybrid Wireless Networks

On the Capacity of Hybrid Wireless Networks O the Capacity of Hybrid ireless Networks Beyua Liu,ZheLiu +,DoTowsley Departmet of Computer Sciece Uiversity of Massachusetts Amherst, MA 0002 + IBM T.J. atso Research Ceter P.O. Box 704 Yorktow Heights,

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

ODBC. Getting Started With Sage Timberline Office ODBC

ODBC. Getting Started With Sage Timberline Office ODBC ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.

More information

Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco

Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco Engines Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco Stream Processing Engines Online Machine Learning Real Time Query Processing ConCnuous ComputaCon Distributed RPC 2 Stream Processing Engines

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY Optimize your Network I the Courier, Express ad Parcel market ADDING CREDIBILITY Meetig today s challeges ad tomorrow s demads Aswers to your key etwork challeges ORTEC kows the highly competitive Courier,

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

CHAPTER 3 THE TIME VALUE OF MONEY

CHAPTER 3 THE TIME VALUE OF MONEY CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all

More information

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms The Power of Free Brachig i a Geeral Model of Backtrackig ad Dyamic Programmig Algorithms SASHKA DAVIS IDA/Ceter for Computig Scieces Bowie, MD sashka.davis@gmail.com RUSSELL IMPAGLIAZZO Dept. of Computer

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing SIAM REVIEW Vol. 44, No. 1, pp. 95 108 c 2002 Society for Idustrial ad Applied Mathematics Perfect Packig Theorems ad the Average-Case Behavior of Optimal ad Olie Bi Packig E. G. Coffma, Jr. C. Courcoubetis

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

Domain 1 - Describe Cisco VoIP Implementations

Domain 1 - Describe Cisco VoIP Implementations Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.

More information

Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC 2011-13

Load Balancing for Distributed Stream Processing Engines. Muhammad Anis Uddin Nasir EMDC 2011-13 Load Balancing for Distributed Stream Processing Engines Muhammad Anis Uddin Nasir EMDC 011-13 About me Ex EMDC from Batch 011 (the party batch) Currently PhD Student at KTH Royal Institute of Technology

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology Adoptio Date: 4 March 2004 Effective Date: 1 Jue 2004 Retroactive Applicatio: No Public Commet Period: Aug Nov 2002 INVESTMENT PERFORMANCE COUNCIL (IPC) Preface Guidace Statemet o Calculatio Methodology

More information

MTO-MTS Production Systems in Supply Chains

MTO-MTS Production Systems in Supply Chains NSF GRANT #0092854 NSF PROGRAM NAME: MES/OR MTO-MTS Productio Systems i Supply Chais Philip M. Kamisky Uiversity of Califoria, Berkeley Our Kaya Uiversity of Califoria, Berkeley Abstract: Icreasig cost

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks Maual Security+ Domai 1 Network Security Every etwork is uique, ad architecturally defied physically by its equipmet ad coectios, ad logically through the applicatios, services, ad idustries it serves.

More information

Domain 1: Identifying Cause of and Resolving Desktop Application Issues Identifying and Resolving New Software Installation Issues

Domain 1: Identifying Cause of and Resolving Desktop Application Issues Identifying and Resolving New Software Installation Issues Maual Widows 7 Eterprise Desktop Support Techicia (70-685) 1-800-418-6789 Domai 1: Idetifyig Cause of ad Resolvig Desktop Applicatio Issues Idetifyig ad Resolvig New Software Istallatio Issues This sectio

More information

Infinite Sequences and Series

Infinite Sequences and Series CHAPTER 4 Ifiite Sequeces ad Series 4.1. Sequeces A sequece is a ifiite ordered list of umbers, for example the sequece of odd positive itegers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29...

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

The Fundamental Capacity-Delay Tradeoff in Large Mobile Ad Hoc Networks

The Fundamental Capacity-Delay Tradeoff in Large Mobile Ad Hoc Networks The Fudametal Capacity-Delay Tradeoff i Large Mobile Ad Hoc Networks Xiaoju Li ad Ness B. Shroff School of Electrical ad Computer Egieerig, Purdue Uiversity West Lafayette, IN 47907, U.S.A. {lix, shroff}@ec.purdue.edu

More information

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means) CHAPTER 7: Cetral Limit Theorem: CLT for Averages (Meas) X = the umber obtaied whe rollig oe six sided die oce. If we roll a six sided die oce, the mea of the probability distributio is X P(X = x) Simulatio:

More information

Capacity of Wireless Networks with Heterogeneous Traffic

Capacity of Wireless Networks with Heterogeneous Traffic Capacity of Wireless Networks with Heterogeeous Traffic Migyue Ji, Zheg Wag, Hamid R. Sadjadpour, J.J. Garcia-Lua-Aceves Departmet of Electrical Egieerig ad Computer Egieerig Uiversity of Califoria, Sata

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

Domain 1: Configuring Domain Name System (DNS) for Active Directory

Domain 1: Configuring Domain Name System (DNS) for Active Directory Maual Widows Domai 1: Cofigurig Domai Name System (DNS) for Active Directory Cofigure zoes I Domai Name System (DNS), a DNS amespace ca be divided ito zoes. The zoes store ame iformatio about oe or more

More information

Maximum Likelihood Estimators.

Maximum Likelihood Estimators. Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

France caters to innovative companies and offers the best research tax credit in Europe

France caters to innovative companies and offers the best research tax credit in Europe 1/5 The Frech Govermet has three objectives : > improve Frace s fiscal competitiveess > cosolidate R&D activities > make Frace a attractive coutry for iovatio Tax icetives have become a key elemet of public

More information

Professional Networking

Professional Networking Professioal Networkig 1. Lear from people who ve bee where you are. Oe of your best resources for etworkig is alumi from your school. They ve take the classes you have take, they have bee o the job market

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

MapReduce Based Implementation of Aggregate Functions on Cassandra

MapReduce Based Implementation of Aggregate Functions on Cassandra Iteratioal Joural of Electroics Commuicatio ad Couter Techology (IJECCT) MapReduce Based Ilemetatio of Aggregate Fuctios o Cassadra Aseh Daesh Arasteh Departmet of Couter ad IT Islamic Azad Uiversity of

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series utomatic Tuig for FOREX Tradig System Usig Fuzzy Time Series Kraimo Maeesilp ad Pitihate Soorasa bstract Efficiecy of the automatic currecy tradig system is time depedet due to usig fixed parameters which

More information

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection The aalysis of the Courot oligopoly model cosiderig the subjective motive i the strategy selectio Shigehito Furuyama Teruhisa Nakai Departmet of Systems Maagemet Egieerig Faculty of Egieerig Kasai Uiversity

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

Quadrat Sampling in Population Ecology

Quadrat Sampling in Population Ecology Quadrat Samplig i Populatio Ecology Backgroud Estimatig the abudace of orgaisms. Ecology is ofte referred to as the "study of distributio ad abudace". This beig true, we would ofte like to kow how may

More information

Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems

Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems Simple Efficiet Load Balacig Algorithms for Peer-to-Peer Systems David R. Karger MIT Computer Sciece ad Artificial Itelligece Laboratory Cambridge, MA 0139, USA karger@csail.mit.edu Matthias Ruhl IBM Almade

More information

A Secure Implementation of Java Inner Classes

A Secure Implementation of Java Inner Classes A Secure Implemetatio of Java Ier Classes By Aasua Bhowmik ad William Pugh Departmet of Computer Sciece Uiversity of Marylad More ifo at: http://www.cs.umd.edu/~pugh/java Motivatio ad Overview Preset implemetatio

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

Configuring Additional Active Directory Server Roles

Configuring Additional Active Directory Server Roles Maual Upgradig your MCSE o Server 2003 to Server 2008 (70-649) 1-800-418-6789 Cofigurig Additioal Active Directory Server Roles Active Directory Lightweight Directory Services Backgroud ad Cofiguratio

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

How to use what you OWN to reduce what you OWE

How to use what you OWN to reduce what you OWE How to use what you OWN to reduce what you OWE Maulife Oe A Overview Most Caadias maage their fiaces by doig two thigs: 1. Depositig their icome ad other short-term assets ito chequig ad savigs accouts.

More information