Sketching Sampled Data Streams

Size: px
Start display at page:

Download "Sketching Sampled Data Streams"

Transcription

1 Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA Abstract Samplng s used as a unversal method to reduce the runnng tme of computatons the computaton s performed on a much smaller sample and then the result s scaled to compensate for the dfference n sze. Sketches are a popular approxmaton method for data streams and they proved to be useful for estmatng frequency moments and aggregates over jons. A possblty to further mprove the tme performance of sketches s to compute the sketch over a sample of the stream rather than the entre data stream. In ths paper we analyze the behavor of the sketch estmator when computed over a sample of the stream, not the entre data stream, for the sze of jon and the self-jon sze problems. Our analyss s developed for a generc samplng process. We nstantate the results of the analyss for all three major types of samplng Bernoull samplng whch s used for load sheddng, samplng wth replacement whch s used to generate..d. samples from a dstrbuton, and samplng wthout replacement whch s used by onlne aggregaton engnes and compare these partcular results wth the results of the basc sketch estmator. Our expermental results show that the accuracy of the sketch computed over a small sample of the data s, n general, close to the accuracy of the sketch estmator computed over the entre data even when the sample sze s only 0% or less of the dataset sze. Ths s equvalent to a speed-up factor of at least 0 when updatng the sketch. I. INTRODUCTION Data streamng has receved a lot of attenton from the research communty n the last decade. The requrement to process fast data streams motvates the need for approxmaton methods that make use of both small space and small tme. AGMS sketches, and ther mproved varant F-AGMS sketches 3, 4 proved to be a vable soluton for estmatng aggregates over jons. The man strengths of the sketchng technques are the smple and fast update procedure, the small memory requrement, and provable error guarantees. When the data streams that need to be processed are extremely fast, for example n the case of networkng data or large datasets streamed over the Internet, t s desrable to further reduce the update tme of sketches n order to acheve the requred processng rates. Samplng s a unversal method for data reducton and, n prncple, t can be used to reduce the amount of data that needs to be sketched. If samples are sketched nstead of the orgnal data, an mmedate update tme reducton results. Ths s smlar to the exstng load sheddng technques employed n data stream processng engnes 5. The man concern when samples rather than the orgnal data are sketched s how to extend the error guarantees sketches provde to ths new stuaton. The formulas resultng from such an analyss could be used to determne how aggressve the load sheddng can be wthout a sgnfcant loss n the accuracy of the sketch over samples estmator. A seemngly unrelated, but, as shown n the paper, techncally related, problem s analyzng streams of samples from unknown dstrbutons. Samples from unknown dstrbutons the so called..d. samples are the nput to most of the onlne data-mnng algorthms 6. In ths case the samples are not used as a data reducton technque, but rather they are the only nformaton avalable about the unknown dstrbuton. A fundamental problem n ths context s how to characterze the unknown dstrbuton usng only the samples. Ths s one of the fundamental problems n statstcs 7. If the samples are streamed, as s the case n onlne data-mnng, the am s to characterze the unknown dstrbuton by usng small space only, thus makng sketches a natural canddate for computatons that nvolve aggregates. It s a smple matter to use sketches n order to estmate aggregates over the samples. If predctons about the unknown dstrbuton need to be made, the problem s sgnfcantly more dffcult. Interestngly, ths problem s mathematcally smlar to the load sheddng problem n whch samplng s used to reduce the update tme of sketchng. A thrd problem s sketchng tuples that are processed by an onlne aggregaton engne n order to compute statstcs useful for decson makng 8, 9. In ths paper we analyze the sketch over samples estmator for generc samplng. Then we nstantate the results for three dfferent types of samplng. Our techncal contrbutons are: We provde a generc analyss of the sketch over samples estmator. The analyss conssts n expressng the frst two frequency moments of the estmator n terms of the moments of the samplng frequency random varables. We nstantate the results for sketchng Bernoull samples. Ths mmedately ndcates how random load sheddng for sketchng data streams behaves. We nstantate the generc analyss for sketchng samples wth replacement from a large populaton. The analyss generalzes to sketchng..d. samples from an unknown dstrbuton. The ablty to sketch..d. data s mportant f sketches are to be used for data-mnng applcatons. We nstantate the generc analyss for sketchng samples wthout replacement. Such samples are processed by onlne aggregaton engnes. By sketchng the samples,

2 mportant statstcs can be derved wth lttle computatonal overhead. We present emprcal evdence that the analyss s necessary snce the error of the sketch over samples estmator s not smply the sum of the errors of the two ndvdual estmators. The nteracton, whch s predcted by the analyss, plays a major role. The experments also pont out that n the majorty of the cases a 0% sample results n mnmal error degradaton the sketchng of streams can thus be sped-up by a factor of 0. In the rest of the paper, we ntroduce the formal problem n Secton II. Secton III gves an overvew on samplng whle Secton IV ntroduces sketches. The formal analyss of the combned sketch over samples estmator s detaled n Secton V. We dscuss possble applcatons of sketchng sampled data n Secton VI. The emprcal evaluaton of the combned estmator s presented n Secton VII, whle Secton VIII concludes the paper. Related Work There exsts a large body of work on approxmate query processng methods. The dea of combnng two estmators to captalze on the strengths of both s not new. F-AGMS sketches 3 are essentally a combnaton of random hstograms and AGMS sketches. 0 presents a method to buld ncremental hstograms from samples. To the best of our knowledge, sketchng and samplng have not been combned n a prncpled fashon before. The man dffculty n characterzng sketches over samples s the fact that the samplng analyss 8, s performed n the tuple doman whle the sketch analyss s performed n the frequency doman. Ths s the frst obstacle we overcome n ths paper. The work on sketchng probablstc data streams, 3 s somehow smlar to our work. The mportant dfference s the fact that samplng s part of the estmate n our work whle t represents only a way to nterpret the probablstc data n the related work. The results n do not characterze the sketch over sample estmator but approxmate the probablstc aggregates usng sketches. The only overlap n terms of analyss seems to be the computaton of the expected value of sketch over samples for the second frequency moment computaton n 3. 4 presents an alternatve method to mprove the sketchng rate of a data stream by determnstcally skppng stream tems. In our work the tems that are sketched are randomly selected through a samplng process. II. PRELIMINARIES The general problem we dscuss throughout the paper s how to approxmate the sze of jon of two relatons and the self-jon sze or second frequency moment of a relaton. Let F and G be two streamng relatons, each havng a sngle attrbute A, wth doman I. Furthermore, let f and g be the frequency of the value n F and G, respectvely. Wth ths, the sze of jon of relatons F and G can be wrtten as the dot-product of ther frequency vectors: F A G = f g () When relatons F and G are dentcal, the quantty f s known as the self-jon sze of F. In order to compute the sze of jon exactly, the frequency vectors of the two relatons have to be stored n full. Ths requres space proportonal to the doman of the jonng attrbute A, whch s nfeasble for large domans, e.g., I = 64. Thus, randomzed solutons wth reduced space requrements and provable error guarantees have been proposed. The standard technques 7, 5 to derve error guarantees or confdence ntervals for an estmate s to compute the expected value and the varance and then to use ether dstrbuton-ndependent bounds gven by Chernoff s and Chebyshev s nequaltes, or to use dstrbuton-dependent bounds. In the latter case, usually the Central Lmt Theorem or one of ts generalzatons s used to argue that the dstrbuton of the estmate has a partcular shape, and then error bounds based on the assumed dstrbuton wth the same expected value and varance are computed. In order to smplfy the exposton, we provde results n the form of expected values and varances throughout the paper. Actual error guarantees can be obtaned straghtforwardly usng the mentoned technques. III. SAMPLING Samplng as an approxmaton technque conssts n obtanng samples F and G from relatons F and G, respectvely, computng the sze of jon aggregate over the samples, and applyng a correcton to ensure that the samplng estmator s unbased. Ths method s generc and apples to all types of samplng. To smplfy the theoretcal exposton, we keep the treatment of samplng as generc as possble. In Secton II we expressed the sze of jon aggregate as a functon of f and g, the frequences of value of the jon attrbute n relatons F and G, respectvely. If we defne f and g to be the frequences of n F and G, respectvely, the sze of jon of the sample relatons s: F A G = f g () f and g are random varables that depend on the type of samplng and the parameters of the samplng process. Interestngly, a large part of the characterzaton of samplng can be carred out wthout specfyng the type of samplng. Ths s also true for sketches over samples n Secton V. A. Generc Samplng In general, F A G s not an unbased estmator for the sze of jon F A G. Fortunately, n the majorty of the cases, a constant correcton that scales for the dfference n sze between the samples and the orgnal relatons can be made to obtan an unbased estmator. If we defne the estmator as X = C f g, where C s the scalng factor, we can determne the value of C such that X s unbased. In

3 order to derve error bounds for the estmator, the expectaton E X and the varance X have to be computed. It turns out that expressons for E X and X can be wrtten for generc samplng n terms of the moments of the frequency random varables f and g. There are two dstnct cases that need separate treatment. The frst case s when relatons F and G are dfferent and the samples are obtaned ndependently from the two relatons. The second case s when F and G are dentcal, thus only one sample s avalable. Ths stuaton arses n the case of self-jon sze. When F and G are obtaned ndependently, the random varables f and g are also ndependent. Proposton (Sze of Jon): Let X = C f g be the estmator for the sze of jon defned over the generc samples F and G. Then: E X = C E f E g X = C E ( ) f f j E g g j E f E g j I When F and G are dentcal and only the sample F s avalable, the random varables f and g are also dentcal. Proposton (Self-Jon Sze): Let X = C f be the estmator for the self-jon sze defned over the generc sample F. Then the expectaton and the varance of X are gven by: E X = C E f X = C E ( f f j E f ) (4) j I A specfc type of samplng determnes the values of the expectatons that appear n the formulas. These expectatons are moments of the frequency random varables. They can be derved from the moment generatng functon correspondng to the samplng process. For the types of samplng we consder n ths paper the moment generatng functon s well known (see for example 6). Dervng fnal formulas for E X and X and determnng the constant C mght seem just a matter of pluggng n these quanttes for a specfc type of samplng, but the actual process s ntrcate because the frequency moments have to be separately computed before the algebrac manpulatons are carred out. The advantage of analyzng samplng n the frequency doman, as we do n ths secton, s that t allows the analyss to be extended to sketches over samples. Samplng estmators lke the ones consdered here have been analyzed n the publshed lterature (the theory n 9 provdes the analyss for all types of smple unform samplng and for an arbtrary number of relatons), but the analyss s n the tuple doman not the frequency doman. Interestngly, the analyss n the frequency doman s smpler than the analyss n the tuple (3) doman snce, as we show n ths paper, the frequency random varables have easly dentfable dstrbutons for whch the moment generatng functons are avalable. Dervng formulas for expected value and varance becomes just a matter of carryng the necessary algebrac manpulatons. We consder three types of samplng: Bernoull samplng, samplng wth replacement, and samplng wthout replacement. We derve the formulas for expectaton and varance as a functon of the samplng frequences both for samplng and for sketchng over samples (Secton V). Ths parallel treatment smplfes the nterpretaton of the complex results obtaned for sketches over samples. B. Bernoull Samplng When the samplng process s Bernoull, each tuple n F and G s selected ndependently n the sample F and G wth probablty p or q, 0 p, 0 q, respectvely. Then, f and g are ndependent bnomal random varables 6, f = Bnomal(f, p) and g = Bnomal(g, q), respectvely, wth expected values: E f = pf, E g = qg (5) The scalng factor for the sze of jon estmator s n ths case C = pq. The expectaton and the varance for Bernoull samplng can be derved n a straghtforward manner usng the frequency moments of the bnomal random varables correspondng to the samplng frequences. Proposton 3 (Sze of Jon): Let X = pq f g be the estmator for the sze of jon defned over the Bernoull samples F and G. Then the expectaton and the varance of X are gven by: E X = f g X = p f g + q p q f g + ( p)( q) pq f g The stuaton s more complcated for self-jon sze because the generc estmator X = C f has a bas that cannot be corrected by smply multplyng wth a scalng factor. Nevertheless, the generc formula for the varance n Equaton (4) s stll applcable. Proposton 4 (Self-Jon Sze): Let X = p f p p f be the estmator for self-jon sze defned over the Bernoull sample F. Then: E X = f X = p p 3 4p f 3 + p( 3p) f p( 3p) f (7) (6)

4 C. Notaton for Samplng Coeffcents In order to wrte compact formulas for samplng wth and wthout replacement, we use the followng notaton throughout the rest of the paper: α = F F, α = F F, α = F F β = G G, β = G G, β = G G α and β are the samplng fractons from relatons F and G, respectvely. α, α, β, β are just small varatons that appear n formulas. D. Samplng wth replacement A sample of fxed sze can be generated by repeatedly choosng a random tuple from the base relaton for the specfed number of tmes. If the same tuple can appear n the sample multple tmes, the process s samplng wth replacement. In ths case the random varables correspondng to the frequences n the sample, f and g, respectvely, are the components of a multnomal random varable 6 wth parameters the sze of the sample and the probablty f g G (8) F and, respectvely, where F and G are the sze of F and G. Snce each component of a multnomal random varable s a bnomal random varable, the expectatons n Equaton 5 stll hold but wth dfferent probabltes: E f = αf, E g = βg (9) The exact formulas for expectaton and varance can be derved as for Bernoull samplng. The moments of a multnomal random varable have to be used nstead. Proposton 5 (Sze of Jon): Let X = αβ f g be the estmator for the sze of jon defned over the samples wth replacement F and G. Then the expectaton and the varance of X are gven by: E X = f g X = f g + F αβ f g αβ + G α β ( ) f g + (α β αβ) f g (0) An unbased estmator for self-jon sze defned over the sample wth replacement F s X = αα f α F. Notce that the estmator depends only on the sze of the base relaton and the sze of the sample. X can be derved from the formula of the varance for generc samplng n Equaton (4). We omt the actual formula here due to lack of space. E. Samplng wthout replacement A sample wthout replacement from a relaton conssts of a random subset of tuples selected from the relaton. The dfference between samplng wth replacement and samplng wthout replacement s that a tuple can appear at most once n a sample wthout replacement whle t can appear multple tmes n a sample wth replacement. In ths case the random varables correspondng to the frequences n the sample, f and g, respectvely, are the components of a multvarate hypergeometrc random varable 6. In order to derve the exact formulas for expectaton and varance, the actual moments of the multvarate hypergeometrc dstrbuton have to be plugged n. Proposton 6 (Sze of Jon): Let X = αβ f g be the estmator for the sze of jon defned over the samples wthout replacement F and G. Then the expectaton and the varance of X are gven by: E X = f g X = ( α )( β ) f g + ( α )β f g αβ +α ( β ) ( ) f g + (α β αβ) f g () The only dfference between samplng wth replacement and samplng wthout replacement s the coeffcents of the terms appearng n the varance formula. Whle the varance of samplng wthout replacement becomes 0 when the entre relaton s sampled, the varance of samplng wth replacement never becomes 0. An unbased estmator for self-jon sze defned over the sample wthout replacement F s X = α α αα f F. The varance of X can be derved from the formula of the varance for generc samplng n Equaton (4). We do not nclude the actual formula here. IV. SKETCHES Whle samplng technques select a random subset of tuples from the nput relaton, sketchng technques summarze all the tuples as a small number of random varables. Ths s accomplshed by projectng the doman of the nput relaton on a sgnfcantly smaller doman usng random functons. Multple sketchng technques are proposed n the lterature for estmatng the sze of jon and the second frequency moment (see 4 for detals). Although usng dfferent random functons,.e., {+, } or hashng, the exstng sketchng technques have smlar analytcal propertes,.e., the sketch estmators have the same varance. For ths reason we focus on the basc AGMS sketches, throughout the paper. The basc AGMS sketch of relaton F conssts of a sngle random varable S that summarzes all the tuples t from F. S s defned as: S = ξ t.a = f ξ () t F where ξ s a famly of {+, } random varables that are 4 wse ndependent. Essentally, a random value of ether + or s assocated to each pont n the doman of attrbute A.

5 Then, the correspondng random value s added to the sketch S for each tuple t n the relaton. We can defne a sketch T for relaton G n a smlar way and usng the same famly ξ. Proposton 7 (Sze of Jon): The sketch-based estmator X defned as: X = S T = f ξ g j ξ j (3) j I s an unbased estmator for the sze of jon F A G. The varance of the sketch estmator s gven by: X = ( ) f gj + f g f g (4) j I Proposton 8 (Self-Jon Sze): The unbased estmator for the self-jon sze s defned as: X = S = f f j ξ ξ j (5) j I The varance of the sketch estmator s gven by: ( ) X = (6) A common technque to reduce the varance of an estmator s to generate multple ndependent nstances of the basc estmator and then to buld a more complex estmator as the average of the basc estmators. Whle the expected value of the complex estmator s equal wth the expectaton of one basc estmator, the varance s reduced by a factor of = n snce n n k= X k = X k n, where n s the number of basc estmators beng averaged. Ths technque can be appled to reduce the varance of the sketch estmator f dfferent famles ξ are used for the basc estmators (see, for detals). n n k= X k f f 4 V. SKETCHES OVER SAMPLES Gven the ablty of samplng to make predctons about an entre dataset from a randomly selected subset and that sketches requre the entre dataset n order to determne any of ts propertes, an nterestng queston that mmedately arses s how to combne these two randomzed technques. Although the ntutve answer to ths queston seems to be smple the sketch s computed over a sample of the data nstead of the entre dataset the behavor of the combned estmator s not the smple composton of the ndvdual behavor of the ngredents. A careful analytcal characterzaton of the estmator needs to be carred out. Furthermore, the samplng process can be ether explct and executed as an ndvdual step before sketchng s done, or mplct, stuaton n whch the nput dataset s assumed to be a sample from a large populaton. In the frst case, a sgnfcant speedup n updatng the sketch structure can be obtaned snce only a random subset of the data s actually sketched. Ths process s essentally a load sheddng technque for sketchng extremely fast data streams that cannot be otherwse sketched. It can be mplemented as an explct Bernoull samplng that randomly flters the tuples that update the sketch structure. In the second case, the data s assumed to be a sample from a large populaton and the goal s to determne propertes of the populaton based on the sample. The sample tself s assumed to be large enough so t cannot be stored explctly, thus sketchng s requred. If the populaton s nfnte, the entre process can be seen as sketchng..d. samples from an unknown dstrbuton. We provde the analyss both for samplng wth replacement and samplng wthout replacement. The analyss straghtforwardly extends to..d. samples f all estmators are normalzed by the sze of the populaton and the lmt, when the populaton sze goes to nfnty, s taken. In such a crcumstance, the frequences n the orgnal unknown populaton become denstes of the unknown populaton, but everythng else remans the same. In ths secton, we provde a generc framework for sketchng sampled data streams n order to estmate the sze of jon and the self-jon sze. Then we compute the frst two frequency moments of the combned estmator for the most common types of samplng Bernoull samplng, samplng wth replacement, and samplng wthout replacement. Ths provdes suffcent nformaton to allow the dervaton of confdence bounds for the combned estmator. A. Sketches over Generc Samplng Consder F to be a generc sample obtaned from relaton F. Sketchng the sample F s smlar to sketchng the entre relaton F and conssts n summarzng the sampled tuples t as follows: S = ξ t.a = f ξ (7) t F where ξ s a famly of {+, } random varables that are 4 wse ndependent. A sample G from relaton G can be sketched n a smlar way usng the same famly ξ: T = ξ t.a = g ξ t G Sze of Jon We defne the estmator X for the sze of jon F A G based on the sketches computed over the samples as follows: X = C ST = C f ξ g jξ j (8) Notce that the estmator s smlar to the sketch estmator computed over the entre dataset n Proposton 7 multpled wth a constant scalng factor C that compensates for the dfference n sze. Self-Jon Sze The self-jon sze or second frequency moment of a relaton s the partcular case of sze of jon between two nstances of the same relaton. One way of analyzng the sketches over samples estmator for the self-jon sze problem s to buld two ndependent samples and two ndependent sketches from the same base relaton and then to apply the results correspondng to sze of jon. Although sound from an analytcal pont of j I

6 vew, ths soluton s neffcent n practce. In the followng we consder a practcal soluton that requres the constructon of only one sample and one sketch from the base relaton. A new estmator for the self-jon sze has to be defned nstead, but the analyss s closely related to the analyss of the sze of jon estmator. Wth S defned n Equaton 7, we defne the self-jon sze estmator X as follows: ( ) X = S = C f ξ = C f ξ f jξ j (9) where C s the same scalng factor compensatng for the dfference n sze. Notce that the dfference between the sze of jon estmator and the self-jon sze estmator s only at the samplng level snce the same famly of ξ random varables s used for sketchng n both cases. For ths reason we carry out the analyss for the two estmators n parallel and make the dstncton only when necessary. In order to derve confdence bounds for the estmator X, the frst two moments, expected value and varance, have to be computed. Intutvely, the scalng factor C should compensate for the dfference n sze and make the estmator unbased. Snce the two processes, samplng and sketchng, are ndependent and sequental, the nteracton between them s mnmal and the sum of the two varances should be a good estmator for the varance of the combned estmator. In the followng, we derve the exact formulas for the expectaton and the varance n the generc case. The ndependence of the famles of random varables correspondng to samplng and sketchng, f, g and ξ, respectvely, plays an mportant role n smplfyng the computaton. Ths ndependence s due to the ndependence of the two random processes. Whle the computaton of the expectaton s straghtforward, the computaton of the varance s more ntrcate snce the nteracton between sketchng and samplng s more complex and t can be characterzed only through a detaled analyss. The frst step n our analyss s to derve the formulas for the moments of the basc estmator. Proposton 9 and 0 characterze the behavor of the basc sketch over samples estmator. Proposton 9 (Sze of Jon): Let the sketch over samples estmator for the sze of jon to be defned as X = C f ξ j I g j ξ j. Then, the expectaton and the varance of X are gven by: E X = C E f E g X = C E f j I E g j + j I j I E ( f E g E f E g E f f j E g g j ) (0) Proposton 0 (Self-Jon Sze): Let the sketch over samples estmator for the self-jon sze to be defned as C f ξ j I f j ξ j. Then, the expectaton and the varance of X are gven by: E X = C E f X = C 3 E f f j j I ( E f ) E f 4 () Notce that the varance of sketchng over generc samplng s an expresson dependng only on the propertes of the samplng process. More precsely, n order to evaluate the varance, only expectatons of the form E f and E f f j have to be computed, where f and f j are random varables correspondng to the frequences n the sample. The averagng technque appled to reduce the varance of basc sketches n Secton IV cannot be used straghtforwardly n the case of sketches computed over samples. Ths s the case snce, although the basc sketch estmators are bult ndependently usng dfferent ξ famles of random varables, they are computed over the same sample and ths ntroduces correlatons between any two estmators. The varance of the average estmator s n ths case: n n X k = n X k + (n ) Cov k l X k, X l k= () where n s the number of basc estmators beng averaged and Cov X k, X l = E X k X l E X k E X l s the covarance between any two basc estmators. The next step n our analyss s to derve the formulas for the varance of the average sketch over samples estmator. Proposton and contan the derved formulas. Proposton (Sze of Jon): The varance of the average sketch over samples sze of jon estmator s gven by: n n X k = k= C E ( ) f f j E g g j E f E g j I + n E f j I E g j + j I E f f j E g g j E f ) E g (3) Essentally, the varance of the average estmator s the sum of the varance of the generc samplng estmator n Equaton 3, the varance of the sketch estmator n Equaton 4, and a

7 term correspondng to the nteracton between the two random processes. For the partcular types of samplng consdered n ths work, we derve the exact formula of the nteracton term. Notce that the mprovement obtaned by averagng s less sgnfcant than a factor of n obtaned n the case of ndependent estmators. Proposton (Self-Jon Sze): The varance of the average sketch over samples self-jon sze estmator s gven by: n n X k = C k= j I ( E f ) + n j I E f f j E f f j E f 4 (4) The ndependence of sketchng and samplng plays an mportant role n dervng the formulas for expectaton and varance. The ndependence has the effect of factorzng the expectatons over products of random varables correspondng to sketchng and samplng. Thus, only expectatons nvolvng the samplng frequency random varables appear n the fnal formulas snce the expectatons correspondng to sketches are ether 0 or,.e., E ξ ξ j = E ξ E ξ j = 0 whenever j due to the 4 wse ndependence of the famly ξ, and E ξ =. Usng these equaltes and some complex algebrac manpulatons, the gven formulas are obtaned. The domnant factor that smplfes the analyss s the modelng of samplng as frequency random varables. Ths s our man contrbuton. B. Bernoull Samplng We nstantate the formulas derved for generc samplng n Secton V-A wth the moments of the bnomal random varables correspondng to the Bernoull samplng frequences. Proposton 3 (Sze of Jon): Let the unbased sketch over Bernoull samples sze of jon estmator to be defned as X = pq f ξ j I g j ξ j. Then, the varance of the average estmator s gven by: n n X k = k= p f g + q f ( p)( q) g + f g p q pq + ( ) f gj + f g f g n j I + p f gj + q f g j n p q j I,j j I,j ( p)( q) + f g j pq j I,j (5) Proposton 4 (Self-Jon Sze): The varance of the average unbased self-jon sze estmator X = p ( f ξ ) s gven by: p p f n p p 3 n k= X k p p f = 4p f 3 + p( 3p) ( ) + f f 4 n + n ( p) p j I,j f f j + f p( 3p) ( p) p f j I,j f f j (6) The varance of the average estmator s, as derved for generc samplng, the sum of the average sketch estmator ndvdual varance, the Bernoull samplng estmator ndvdual varance, and an nteracton term. Snce dervng an exact analytcal relaton between these terms s a dauntng task, we consder some extreme scenaros that allow a partal characterzaton. Frst notce that n, the number of estmators beng averaged, can be gnored when comparng the sketch varance and the nteracton varance because t has the same effect on both (the varance s reduced by a factor of n). When the dstrbuton of the frequences s unform, the nteracton varance s the domnant term whenever the unque frequency has a smaller value than the sze of the doman I. At the other extreme, when the dstrbuton of the frequences s skewed, the sketch varance s the domnant term by far. These results suggest that the nteracton varance could represent a problem for unform-lke data. Ths s not necessarly the case because the value of the varance for unform dstrbutons s sgnfcantly smaller than for skewed data, thus, although the nteracton varance s the domnant term, ts absolute value s not large. To better understand the exact sgnfcance of each of the terms appearng n the varance and to confrm the analyss for the extreme cases, we desgned a set of smulatons to determne the relatve contrbuton of each of the terms. The expermental setup s descrbed n Secton VII. Fgure and depct the relatve contrbuton of each of the three terms appearng n the varance of the average estmator over Bernoull samples (Equaton 6 and 5). The relatve contrbuton s represented as a functon of the data skew for dfferent samplng probabltes. A common trend both for sze of jon and self-jon sze s that the nteracton term s hghly sgnfcant for low skew data. Ths completely justfes the analyss we develop throughout the paper snce an analyss assumng that the varance of the composed estmator s the sum of the varances of the basc estmators would be ncorrect. At the same tme, ths suggests that the accuracy of the sketch over samples estmator can be sgnfcantly

8 worse than the accuracy of the sketch estmator for nonskewed data. As already explaned, ths s not necessarly true. Moreover, the expermental results we provde n Secton VII show that ths s not the case for practcal scenaros. As expected, the mpact of the varance of the samplng estmator s more sgnfcant as the sze of the sample s smaller. For self-jon sze (Fgure ), the varance s domnated by the term correspondng to the samplng estmator, whle for sze of jon (Fgure ) the varance of the sketch estmator quantfes for almost the entre varance rrespectve of the samplng probablty. Ths s entrely supported by the exstng theoretcal results whch show that sketches are optmal for estmatng the second frequency moment whle samplng s optmal for the estmaton of sze of jon. C. Samplng wth replacement In a smlar way to Bernoull samplng, we nstantate the formula for the sze of jon estmator derved for generc samplng wth the moments of the multnomal random varables correspondng to the samplng frequences. The nteracton between two dfferent samplng frequences makes the dervaton of the formula more complcated. We do not provde the formula for self-jon sze varance due to space constrants. Proposton 5 (Sze of Jon): Let the unbased sketch over samples wth replacement sze of jon estmator to be defned as X = αβ f ξ j I g j ξ j. Then, the varance of the average estmator s gven by: n n X k k= = αβ f g + F αβ f g + G α β ( ) f g + (α β αβ) f g + α β ( ) f gj + f g f g n α β j I + f g j + F αβ f gj n αβ j I,j j I,j + G α β f g j j I,j (7) D. Samplng wthout replacement We apply the same procedure for samplng wthout replacement. We obtan smlar results to the other types of samplng the terms n the varance are the same, only the coeffcents are dependent on the samplng procedure. The formula for self-jon sze s not provded due to space lmtatons. Proposton 6 (Sze of Jon): Let the unbased sketch over samples wthout replacement sze of jon estmator to be defned as X = αβ f ξ j I g j ξ j. Then, the varance of the average estmator s gven by: n n X k = k= ( α )( β ) f g + ( α )β f g αβ +α ( β ) ( ) f g + (α β αβ) f g + α β ( ) f gj + f g f g n α β j I + ( α )( β ) f g j n αβ + ( α ) β E. Dscusson j I,j j I,j f g j + α ( β ) j I,j f g j (8) The result we derved n ths secton s somewhat surprsng: the varance of the combned sketch-samplng estmator can be wrtten as the sum of the varance of the sketch estmator, the varance of the samplng estmator, and an nteracton term. Ths separaton of the varance formula was accomplshed for all three types of samplng and both sze of jon and self-jon-sze problems. Although the sketch varance seems to be the domnant term, the exact sgnfcance of each of the terms s dependent on the actual dstrbuton of the data (see the expermental results for Bernoull samplng). When multple sketch estmators are averaged n order to decrease the varance, the covarance must also be consdered snce the sketch estmators are computed over the same sample, thus they are correlated. Our results show that the varance of the combned estmator does not decrease by a factor equal to the number of averages, only the sketch varance and the nteracton term do. VI. APPLICATIONS In ths secton, we dentfy applcatons for sketchng sampled data for each of the types of samplng dscussed n the paper. We also delve nto the algorthmc ssues correspondng to the combned randomzed process. A. Bernoull Samplng Even though sketchng can be mplemented for farly hghspeed data streams, the update tme could stll become the lmtng factor f all the tuples need to be sketched. In order to allevate ths problem, some of the tuples need to be dropped. Bernoull samplng provdes a prncpled approach to drop tuples and stll be able to characterze the result. Sketchng Bernoull samples s a method to further ncrease the rate of the data streams that can be sketched. Used along hashng, sketchng over Bernoull samples allows the processng of

9 Interacton Samplng Sketch Interacton Samplng Sketch ance terms dstrbuton ance terms dstrbuton Zpf coeffcent / p Zpf coeffcent / p Fg.. Sze of jon varance Fg.. Self-jon sze varance data streams havng arrval rates of bllons of tuples per second encountered n the exstng networkng equpment. The analyss n Secton V-B provdes a complete characterzaton of the error behavor for the overall process. The expermental results n Secton VII-A show that for a samplng fracton of % the decrease n accuracy over sketchng the entre data stream s nsgnfcant. The basc Bernoull samplng algorthm conssts n generatng a random number for each tuple n the dataset. The tuples for whch the random value s smaller than the samplng probablty are ncluded n the sample. The man drawback of Bernoull samplng s that the sze of the sample s unknown pror to runnng the process. Ths s not a problem anymore when the sample s sketched nstead of beng explctly stored. The algorthm for sketchng Bernoull samples s a smple extenson of the basc Bernoull samplng algorthm. The tuples that are selected n the sample are nserted nto the sketch data structure nstead of beng explctly stored. Snce updatng the sketch s an extremely fast operaton 7, t could be doubtful that the extra samplng s ndeed benefcal for speedng the entre process. Fortunately, there exst algorthms that avod the con tossng for each tuple by generatng the ntervals between the tuples that are selected n the sample 8. Ths way there s work to be done only for the tuples that are actually sampled and subsequently sketched. In ths stuaton the speedup over sketchng the entre data stream s clearly proportonal wth the samplng fracton. B. Samplng wth replacement Consder a generatve model that draws samples from a fnte populaton. The samples are draw wth replacement. A data stream contanng all the samples s generated. The objectve s to determne propertes, e.g., the second frequency moment, of the generatve model, or correlatons between two dfferent generatve models, e.g., the sze of jon, from the stream of samples. An addtonal requrement s that the stream s large enough that t cannot be stored n full. These knd of scenaros are frequent n data-mnng applcatons 6. Sketchng the stream of samples obtaned wth replacement from the fnte populaton whose sze s known represents a soluton to determnng propertes of the generatve model. The nput stream s the actual sample n ths case, so no explct samplng of the stream s requred. Thus, the standard updatng algorthm for sketches can be used n ths case. The estmaton algorthm s though dfferent because t has to take nto consderaton that the stream s only a sample. The formulas derved n Secton V-C have to be used for estmaton n ths case. The mportant queston n ths case s how accurate s the combned estmator. And how large has to be the sketched sample n order to obtan accurate estmatons. The expermental results n Secton VII-B show that for a samplng fracton of 0% the estmaton s accurate and stable. No sgnfcant ncrease n accuracy s obtaned f the sample sze s larger. C. Samplng wthout replacement The applcaton we have n mnd for sketchng samples obtaned wthout replacement s onlne aggregaton. In a tradtonal database engne, the exact answer to a query s provded only after the entre data s processed. The user does not get any clues about the result durng the query executon. Ths may take a long tme for complex queres over a datawarehouse. Onlne aggregaton has a dfferent strategy. Partal approxmate answers are provded to the user whle the query s processed by executng equvalent queres on a smaller fracton of the entre data and then scalng the result. As more data s processed, the accuracy of the approxmate result ncreases to the pont where the exact answer s returned (when the entre data s processed). The fundamental requrement for the partal results to be estmates of the fnal result s that the portons of the data the equvalent queres are executed on to represent random samples wthout replacement from the entre data. More detals on onlne aggregaton can be found n the lterature8,, 9. Sketchng samples obtaned wthout replacement represents a fast and nexpensve method to gather some of the statstcs

10 (second frequency moment, correlaton between attrbutes) used by an onlne aggregaton engne to take decsons and to compute the approxmate results. The dea s to buld sketches for the desred statstcs whle the relatons (materalzed or ntermedary results) are scanned. The fracton of the relaton seen at each pont durng the scan represents a sample wthout replacement of the entre relaton as long as the order of the tuples s random. More accurate estmates for the computed statstcs are avalable as the scannng advances. The goal s to obtan stable estmators as early as possble such that the onlne aggregaton engne takes the optmal decsons and provdes estmates as accurate as possble. The analyss n Secton V-D provdes a complete characterzaton of the combned estmators. In Secton VII-C we show expermental results for whch accurate estmates are avalable after only 0% of the relatons s scanned. From an algorthmc pont of vew, sketchng a relaton whle t s scanned ncurs almost no addtonal cost. The advantage over usng the samples to provde estmates s that the samples do not have to be explctly stored and processed. There s extra memory requred only to store the sketches. Sketchng can be done for arrval rates of tens of mllons of tuples per second wthout any tme penalty. Or t can be executed as a separate thread n parallel wth scannng, whch s necessary. On the modern mult-core processors, sketchng can be done essentally for free. VII. EXPERIMENTAL EVALUATION We pursue two man goals n the expermental evaluaton of the sketchng over samples estmators. Frst, we want to determne the behavor of the error of the sketch over samples estmator when compared wth the error of the sketch estmator. And second, we want to dentfy what s the behavor of the estmaton error as a functon of the sample sze. We desgn experments to determne these relatons for all three types of samplng presented n the paper. And both for the sze of jon and the self-jon sze problems. In order to accomplsh these goals, we desgned a seres of experments over both synthetc datasets and the TPC-H dataset. Synthetc datasets allow a better control of the mportant parameters that affect the results, whle the TPC-H dataset valdates the results for large scales. The synthetc datasets used n our experments contan ether 0 or 00 mllon tuples generated from a Zpfan dstrbuton wth the coeffcent rangng between 0 (unform) and 5 (skewed). The doman of the possble values s mllon. In the case of sze of jon, the tuples n the two relatons are generated completely ndependent. For the experments over the TPC-H dataset, we used the scale benchmark data. We used F-AGMS sketches 3 n all of the experments due to ther superor performance both n accuracy and update tme (see 4 for detals on sketchng technques). The number of buckets s ether 5, 000 or 0, 000. Ths s equvalent to averagng 5, 000 or 0, 000 basc estmators. In order to be statstcally sgnfcant, all the results presented n ths secton are the average of at least 00 ndependent experments. A. Bernoull Samplng estmaton true result The expermental relatve error,.e., true result, of the sketch over Bernoull samples estmator s depcted n Fgure 3 and 4 as a functon of the data skew for dfferent samplng probabltes. Probablty p =.0 corresponds to sketchng the entre dataset. These expermental results show that, wth some exceptons, the samplng rate does not sgnfcantly affect the accuracy of the sketch estmator. For Zpf coeffcents smaller than, n the case of self-jon sze, and smaller than 3, n the case of sze of jon, the error of the sketch estmator s almost the same both when the entre dataset s sketched or when only one tuple out of a thousand s sketched. The mpact of the samplng rate s sgnfcant only for hgh skew data n the case of self-jon sze. Ths s to be expected from the theoretcal analyss (Fgure ). What cannot be explaned from the theoretcal analyss s the effect of the samplng rate for skewed data n the case of sze of jon. As shown n 4, the expermental behavor of F-AGMS sketches s n some cases orders of magntude better than the theoretcal predctons, thus although the theoretcal varance s domnated by the varance of the sketch estmator, the emprcal absolute value s small when compared to the varance of the samplng estmator. In the lght of 4, the emprcal results for hgh samplng rates are much better than the theoretcal predctons, ncreasng thus the sgnfcance of the samplng rate for hghly skewed data. B. Samplng wth replacement In Fgure 5 and 6 we depct the expermental relatve error as a functon of the sample sze for samplng wth replacement. Snce the actual sze of the sample s dfferent for dfferent Zpf coeffcents, we represent on the x axs the sze of the sample as a fracton from the populaton sze, wth correspondng to a sample wth replacement of sze equal to the populaton sze. As expected, the error s decreasng as the sample sze becomes larger, but t stablzes after a certan sample sze (a fracton of the populaton sze for the ncluded fgures). Thus, sketchng more samples does not provde any ncrease n the accuracy after a certan pont. For the stuatons depcted n Fgure 5 and 6, the edge samplng fracton s around. C. Samplng wthout replacement We used the scale TPC-H data for our experments on sketchng samples wthout replacement. Fgure 7 depcts the error as a functon of the samplng rate for the sze of jon between the relatons lnetem and orders. In Fgure 8 we plot the error of the second frequency moment of relaton lnetem on the l orderkey attrbute. As expected, the error of the selfjon sze estmator decreases whle ncreasng the sample sze and t becomes stable for samplng rates larger than 0%. For sze of jon, the behavor of the error s somehow unexpected. The smallest error n Fgure 7 s obtaned for a samplng rate of 0%. Then, the error starts to ncrease whle ncreasng the samplng rate. Ths behavor s due to F-AGMS sketches.

11 0 p= p= p= p=.0 Relatve error (log scale) e-04 Relatve error (log scale) e-04 e-05 e-06 e-07 e-08 e-05 e ZIPF coeffcent e-09 p= e-0 p= p= p=.0 e ZIPF coeffcent Fg. 3. Sze of jon error (Bernoull) Fg. 4. Self-jon sze error (Bernoull) Relatve errorv (log scale) ZIPF=0.0 ZIPF=0.5 ZIPF=.0 ZIPF=.0 Relatve errorv (log scale) ZIPF=0.0 ZIPF=0.5 ZIPF=.0 ZIPF=.0 Samplng rate (log scale) e-04 Samplng rate (log scale) Fg. 5. Sze of jon sample sze (WR) Fg. 6. Self-jon sze sample sze (WR) 0 00 Relatve error (log scale) Relatve error (log scale) 0 Samplng rate (log scale) Samplng rate (log scale) Fg. 7. Sze of jon sample sze (WOR) Fg. 8. Self-jon sze sample sze (WOR)

12 D. Extreme behavor of F-AGMS sketches An nterestng trend n Fgure 7 s the ncrease of the error as the samplng rate ncreases over 0%. Ths s counter ntutve snce we would expect smaller error as more data s sketched. Ths behavor s not predcted by the theory and t must be due to the fact that the theoretcal formulas are derved for AGMS sketches, but F-AGMS sketches are used n the experments. To understand why ths s happenng we have to reffer to the analyss of F-AGMS sketches n 4. F-AGMS sketches use a combnaton of hashes and AGMS sketches wthn each hash bucket. As more data s sketched, the contenton n buckets ncreases and ths produces a wder varance of the estmates. Ths suggests that n some stuatons t s better to sketch only a sample of the data rather than the entre data for F-AGMS sketches. E. Dscusson We provde expermental results for each of the types of samplng presented n the paper. The goal s to compare the combned sketch over samples estmator wth the sketch estmator and to determne ther relaton. Our experments for Bernoull samplng show that a sgnfcant speed-up (a factor of 0 n general and a factor of up to 000 n some cases) can be obtaned by sketchng only a small sample of the data nstead of the entre data. The decrease n accuracy due to samplng seems to be nsgnfcant. For samplng wth replacement, the dfference n accuracy between sketchng only a small sample (a fracton of or less from the populaton sze) and sketchng the entre data s mnmal. Moreover, the error becomes stable and t does not decrease anymore f the sample sze s ncreased above a certan threshold (0% n our results). Although the stuaton s almost smlar for samplng wthout replacement, we also observed some unexpected results for ths type of samplng. The smallest error s obtaned when only a sample of the data (0%) s sketched, not the entre dataset. Ths s due to the extreme behavor of F- AGMS sketches n some partcular stuatons 4. Essentally, sketchng more data can actually decrease the accuracy of F- AGMS sketch estmators f the sketched sample captures well enough the dstrbuton of the entre dataset. Summarzng, the expermental results show that the sketch over samples estmator has almost smlar accuracy to the sketch estmator startng wth samplng rates of 0% or smaller. VIII. CONCLUSIONS In ths paper we ntroduce the sketch over samples estmator for sze of jon and self-jon sze. We provde a detaled analyss of the error of ths estmator for generc samplng based on the moment generatng functon of the samplng frequences. Then, we nstantate the general results for three dfferent types of samplng: Bernoull samplng, samplng wth replacement, and samplng wthout replacement. Our results show that t s possble to express the varance of the combned estmator as the sum of the varance of the samplng estmator, the varance of the sketch estmator, and an nteracton term. Although the sketch varance seems to be the domnant term, the exact mportance of each of the terms s hghly dependant on the exact dstrbuton of the actual data. We provde expermental results that show that the accuracy of the combned estmator s almost smlar (and sometmes better) to the accuracy of the sketch estmator even for small samplng rates of 0%. We also dentfy possble applcatons of the sketch over samples estmator for each type of samplng. In concluson, we beleve that the sketch over samples estmator can be used nstead of the sketch estmator wthout a sgnfcant degradaton n accuracy and wth a clear gan n processng tme as long as the sample rate s around 0%. IX. ACKNOWLEDGMENTS Materal n ths paper s based upon work supported by the Natonal Scence Foundaton under Grant No We would lke to thank the anonymous referees for ther useful revews that helped us to mprove the qualty of the paper. REFERENCES N. Alon, Y. Matas, and M. Szegedy, The space complexty of approxmatng the frequency moments, n STOC 996, pp N. Alon, P. Gbbons, Y. Matas, and M. Szegedy, Trackng jon and self-jon szes n lmted storage, J. Comput. Syst. Sc., vol. 64, no. 3, pp , G. Cormode and M. Garofalaks, Sketchng streams through the net: dstrbuted approxmate query trackng, n VLDB 005, pp F. Rusu and A. Dobra, Statstcal analyss of sketch estmators, n SIGMOD 007, pp N. Tatbul, U. Çetntemel, S. Zdonk, M. Chernack, and M. Stonebraker, Load sheddng n a data stream manager, n VLDB 003, pp P. Domngos and G. Hulten, Mnng hgh-speed data streams, n KDD 000, pp J. Shao, Mathematcal Statstcs. Sprnger-Verlag, P. Haas and J. Hellersten, Rpple jons for onlne aggregaton, n SIGMOD 999, pp C. Jermane, S. Arumugam, A. Pol, and A. Dobra, Scalable approxmate query processng wth the dbo engne, n SIGMOD 007, pp P. Gbbons, Y. Matas, and V. Poosala, Fast ncremental mantenance of approxmate hstograms, n VLDB 997, pp C. Jermane, A. Dobra, S. Arumugam, S. Josh, and A. Pol, The sortmerge-shrnk jon, ACM TODS, vol. 3, no. 4, pp G. Cormode and M. Garofalaks, Sketchng probablstc data streams, n SIGMOD 007, pp T. S. Jayram, A. McGregor, S. Muthukrshnan, and E. Vee, Estmatng statstcal aggregates on probablstc data streams, n PODS 007, pp S. Bhattacharyya, A. Madera, S. Muthukrshnan, and T. Ye, How to scalably and accurately skp past streams, n ICDE Workshops 007, pp R. Motwan and P. Raghavan, Randomzed Algorthms. Cambrdge Unversty Press, N. L. Johnson, S. Kotz, and N. Balakrshnan, Dscrete Multvarate Dstrbutons. John Wley & Sons, Inc., F. Rusu and A. Dobra, Pseudo-random number generaton for sketchbased estmatons, ACM TODS, vol. 3, no., p., F. Olken, Random Samplng from Databases Ph.D. Thess. UC Berkeley, 993.

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,

More information

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS Chrs Deeley* Last revsed: September 22, 200 * Chrs Deeley s a Senor Lecturer n the School of Accountng, Charles Sturt Unversty,

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany edmund.coersmeer@noka.com,

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

STATISTICAL DATA ANALYSIS IN EXCEL

STATISTICAL DATA ANALYSIS IN EXCEL Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 14-01-013 petr.nazarov@crp-sante.lu Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Portfolio Loss Distribution

Portfolio Loss Distribution Portfolo Loss Dstrbuton Rsky assets n loan ortfolo hghly llqud assets hold-to-maturty n the bank s balance sheet Outstandngs The orton of the bank asset that has already been extended to borrowers. Commtment

More information

Social Nfluence and Its Models

Social Nfluence and Its Models Influence and Correlaton n Socal Networks Ars Anagnostopoulos Rav Kumar Mohammad Mahdan Yahoo! Research 701 Frst Ave. Sunnyvale, CA 94089. {ars,ravkumar,mahdan}@yahoo-nc.com ABSTRACT In many onlne socal

More information

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS Tmothy J. Glbrde Assstant Professor of Marketng 315 Mendoza College of Busness Unversty of Notre Dame Notre Dame, IN 46556

More information

Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting

Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting Propertes of Indoor Receved Sgnal Strength for WLAN Locaton Fngerprntng Kamol Kaemarungs and Prashant Krshnamurthy Telecommuncatons Program, School of Informaton Scences, Unversty of Pttsburgh E-mal: kakst2,prashk@ptt.edu

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Statistical algorithms in Review Manager 5

Statistical algorithms in Review Manager 5 Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

Section 5.4 Annuities, Present Value, and Amortization

Section 5.4 Annuities, Present Value, and Amortization Secton 5.4 Annutes, Present Value, and Amortzaton Present Value In Secton 5.2, we saw that the present value of A dollars at nterest rate per perod for n perods s the amount that must be deposted today

More information

Traffic-light a stress test for life insurance provisions

Traffic-light a stress test for life insurance provisions MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers æ

Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers æ Effcent Strpng Technques for Varable Bt Rate Contnuous Meda Fle Servers æ Prashant J. Shenoy Harrck M. Vn Department of Computer Scence, Department of Computer Scences, Unversty of Massachusetts at Amherst

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Abstract. Clustering ensembles have emerged as a powerful method for improving both the

Abstract. Clustering ensembles have emerged as a powerful method for improving both the Clusterng Ensembles: {topchyal, Models jan, of punch}@cse.msu.edu Consensus and Weak Parttons * Alexander Topchy, Anl K. Jan, and Wllam Punch Department of Computer Scence and Engneerng, Mchgan State Unversty

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

RESEARCH DISCUSSION PAPER

RESEARCH DISCUSSION PAPER Reserve Bank of Australa RESEARCH DISCUSSION PAPER Competton Between Payment Systems George Gardner and Andrew Stone RDP 2009-02 COMPETITION BETWEEN PAYMENT SYSTEMS George Gardner and Andrew Stone Research

More information

Chapter XX More advanced approaches to the analysis of survey data. Gad Nathan Hebrew University Jerusalem, Israel. Abstract

Chapter XX More advanced approaches to the analysis of survey data. Gad Nathan Hebrew University Jerusalem, Israel. Abstract Household Sample Surveys n Developng and Transton Countres Chapter More advanced approaches to the analyss of survey data Gad Nathan Hebrew Unversty Jerusalem, Israel Abstract In the present chapter, we

More information

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000 Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Time Value of Money Module

Time Value of Money Module Tme Value of Money Module O BJECTIVES After readng ths Module, you wll be able to: Understand smple nterest and compound nterest. 2 Compute and use the future value of a sngle sum. 3 Compute and use the

More information

Fragility Based Rehabilitation Decision Analysis

Fragility Based Rehabilitation Decision Analysis .171. Fraglty Based Rehabltaton Decson Analyss Cagdas Kafal Graduate Student, School of Cvl and Envronmental Engneerng, Cornell Unversty Research Supervsor: rcea Grgoru, Professor Summary A method s presented

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM BARRIOT Jean-Perre, SARRAILH Mchel BGI/CNES 18.av.E.Beln 31401 TOULOUSE Cedex 4 (France) Emal: jean-perre.barrot@cnes.fr 1/Introducton The

More information

Customer Lifetime Value Modeling and Its Use for Customer Retention Planning

Customer Lifetime Value Modeling and Its Use for Customer Retention Planning Customer Lfetme Value Modelng and Its Use for Customer Retenton Plannng Saharon Rosset Enat Neumann Ur Eck Nurt Vatnk Yzhak Idan Amdocs Ltd. 8 Hapnna St. Ra anana 43, Israel {saharonr, enatn, ureck, nurtv,

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing

Use of Numerical Models as Data Proxies for Approximate Ad-Hoc Query Processing Preprnt UCRL-JC-?????? Use of Numercal Models as Data Proxes for Approxmate Ad-Hoc Query Processng R. Kammura, G. Abdulla, C. Baldwn, T. Crtchlow, B. Lee, I. Lozares, R. Musck, and N. Tang U.S. Department

More information

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING FORMAL ANALYSIS FOR REAL-TIME SCHEDULING Bruno Dutertre and Vctora Stavrdou, SRI Internatonal, Menlo Park, CA Introducton In modern avoncs archtectures, applcaton software ncreasngly reles on servces provded

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

SUPPLIER FINANCING AND STOCK MANAGEMENT. A JOINT VIEW.

SUPPLIER FINANCING AND STOCK MANAGEMENT. A JOINT VIEW. SUPPLIER FINANCING AND STOCK MANAGEMENT. A JOINT VIEW. Lucía Isabel García Cebrán Departamento de Economía y Dreccón de Empresas Unversdad de Zaragoza Gran Vía, 2 50.005 Zaragoza (Span) Phone: 976-76-10-00

More information

The Current Employment Statistics (CES) survey,

The Current Employment Statistics (CES) survey, Busness Brths and Deaths Impact of busness brths and deaths n the payroll survey The CES probablty-based sample redesgn accounts for most busness brth employment through the mputaton of busness deaths,

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable

More information

Joe Pimbley, unpublished, 2005. Yield Curve Calculations

Joe Pimbley, unpublished, 2005. Yield Curve Calculations Joe Pmbley, unpublshed, 005. Yeld Curve Calculatons Background: Everythng s dscount factors Yeld curve calculatons nclude valuaton of forward rate agreements (FRAs), swaps, nterest rate optons, and forward

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

An Inductive Fuzzy Classification Approach applied to Individual Marketing

An Inductive Fuzzy Classification Approach applied to Individual Marketing An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based

More information

Demographic and Health Surveys Methodology

Demographic and Health Surveys Methodology samplng and household lstng manual Demographc and Health Surveys Methodology Ths document s part of the Demographc and Health Survey s DHS Toolkt of methodology for the MEASURE DHS Phase III project, mplemented

More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center Dynamc Resource Allocaton and Power Management n Vrtualzed Data Centers Rahul Urgaonkar, Ulas C. Kozat, Ken Igarash, Mchael J. Neely urgaonka@usc.edu, {kozat, garash}@docomolabs-usa.com, mjneely@usc.edu

More information