Space-Efficient Estimation of Statistics over Sub-Sampled Streams


Andrew McGregor, A. Pavan, Srikanta Tirthapura, David Woodruff

Andrew McGregor: University of Massachusetts, E-mail: mcgregor@cs.umass.edu. Supported by NSF CAREER Award CCF.
A. Pavan: Iowa State University, E-mail: pavan@cs.iastate.edu. Supported in part by NSF CCF.
Srikanta Tirthapura: Iowa State University, E-mail: snt@iastate.edu. Supported in part by NSF CNS, CNS.
David P. Woodruff: IBM Almaden, E-mail: dpwoodru@us.ibm.com

Abstract

In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to subsample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency moments, support size, entropy, and heavy hitters of the original stream, through a single pass over the sampled stream.

Keywords: data streams, frequency moments, sub-sampling

1 Introduction

In many stream monitoring situations, the data arrival rate is so high that it is not possible to observe each element in the stream. The most common solution is to sub-sample the data stream and use the sample to infer properties of the original stream. For example, in an IP router, aggregated statistics of the packet stream are maintained through a protocol such as Netflow [9]. In high-end routers, the load due to statistics maintenance can be so high that a variant of Netflow called sampled Netflow has been developed. In randomly sampled netflow, the monitor gets to view only a random sample of the packet stream, and must maintain statistics on the original stream, using this view.

In such scenarios of extreme data deluge, we are faced with two constraints on data processing. First, the entire data set is not seen by the monitor; only a random sample is

seen. Second, even the random sample of the input is too large to be stored in main memory (or in secondary memory), and must be processed in a single pass through the data, as in the usual data stream model.

While there has been a large body of work that has dealt with data processing using a random sample (see for example, [3, 4]), and extensive work on the one-pass data stream model (see for example, [1, 9, 33]), there has been little work so far on data processing in the presence of both constraints, where only a random sample of the data set must be processed in a streaming fashion. We note that the estimation of frequency moments over a sampled stream is one of the open problems from [31], posed as Question 13, "Effects of Subsampling".

1.1 Problem Setting

We assume the setting of Bernoulli sampling, described as follows. Consider an input stream $P = \langle a_1, a_2, \ldots, a_n \rangle$ where $a_i \in \{1, 2, \ldots, m\}$. For a parameter $p$, $0 < p \le 1$, a sub-stream of $P$, denoted $L$, is constructed as follows. For $1 \le i \le n$, $a_i$ is included in $L$ with probability $p$. The stream processor is only allowed to see $L$, and cannot see $P$. The goal is to estimate properties of $P$ through processing stream $L$. In the following discussion, $L$ is called the sampled stream, and $P$ is called the original stream.

1.2 Our Results

We present algorithms and lower bounds for estimating key aggregates of a data stream by processing a randomly sampled substream. We consider the basic frequency related aggregates, including the number of distinct elements, the frequency moments, the empirical entropy of the frequency distribution, and the heavy hitters.

1. Frequency Moments: For the frequency moments $F_k$ for $k \ge 2$, we present $(1+\varepsilon, \delta)$-approximation algorithms with space complexity $\tilde{O}(p^{-1} m^{1-2/k})$, where the $\tilde{O}$ notation suppresses factors polynomial in $1/\varepsilon$ and $1/\delta$ and factors logarithmic in $m$ and $n$. This result yields an interesting tradeoff between the sampling probability and the space used by the algorithm. The smaller the sampling probability (up to a certain minimum probability), the greater is the streaming space complexity of our algorithm. The algorithm is presented in Section 3.

2. Distinct Elements: For the number of distinct elements, $F_0$, we show that the current best offline methods for estimating $F_0$ from a random sample can be implemented in a streaming fashion using very small space. While it is known that random sampling can significantly reduce the accuracy of an estimate for $F_0$ [7], we show that the need to process this stream using small space does not. The upper and lower bounds are presented in Section 4.

3. Entropy: For estimating entropy we first show that no multiplicative approximation is possible in general even when $p$ is constant. However, we show that estimating the empirical entropy on the sampled stream yields a constant factor approximation to the entropy of the original stream if the entropy is larger than some vanishingly small function of $p$ and $n$. These results are presented in Section 5.

4. Heavy Hitters: We show tight bounds for identifying a set of $O(1/\alpha)$ elements whose frequency exceeds $\alpha F_k^{1/k}$ for $k \in \{1, 2\}$. In the case of $k = 1$, we show that existing heavy hitter algorithms can be used if the stream is sufficiently long compared with $p$. In the case of $k = 2$, we show how to adapt ideas used in Section 3 to arrive at an algorithm that uses space $\tilde{O}(1/p)$.

Another way of interpreting our results is in terms of time-space tradeoffs for data stream problems. Almost every streaming algorithm has a time complexity of at least $n$, since the algorithm reads and processes each stream update. We show that for estimating $F_k$ (and other problems) it is unnecessary to process each update; instead, it suffices for the algorithm to read each item independently with probability $p$, and maintain a data structure of size $\tilde{O}(p^{-1} m^{1-2/k})$. Interestingly, the time to update the data structure per sampled stream item is still only $\tilde{O}(1)$. The time to output an estimate at the end of observation is $\tilde{O}(p^{-1} m^{1-2/k})$, i.e., roughly linear in the size of the data structure. As an example of the type of tradeoffs that are achievable, for estimating $F_2$ if $n = \Theta(m)$ we can set $p = \tilde{\Theta}(1/\sqrt{n})$ and obtain an algorithm using $\tilde{O}(\sqrt{n})$ total processing time and $\tilde{O}(\sqrt{n})$ workspace.

1.3 Related Work

There is a large body of prior work at the intersection of random sampling and data stream processing. Some of this work is along the lines of methods for random sampling from a data stream, including the reservoir sampling algorithm, attributed to Waterman (also see [37]). There has been much follow up on variants and generalizations of reservoir sampling, see for example [,16,0,30,36]. While this line of work focuses on how to efficiently sample from a stream, our work focuses on how to process a stream that has already been sampled.

Stream sampling is a well-researched method for managing the load on network monitors, while enabling accurate measurement. Packets are grouped into flows based on the values of certain attributes within the packet header.
One commonly used sampling method is the sampled netflow model (NF) [3], which is the same as the Bernoulli sampling that we consider here, where packets are sampled independent of each other. Other methods of sampling are also considered under the general umbrella of sampled netflow, such as deterministic sampling (one out of every $n$ packets). Another sampling method is the sample-and-hold model (SH) [], where, once a packet is sampled from a flow, all other packets belonging to that flow are also sampled. The priority sampling procedure [19] is a method for sampling from a weighted stream so that we can get unbiased estimators of individual weights with small variance. Szegedy [35] has shown that the priority sampling method of [19] essentially gets the smallest possible variance, given a fixed sample size. In addition, various combinations and enhancements to these sampling mechanisms have been proposed [10 1, 1]. In particular, [1] presents methods for better tuning sampling parameters and for exporting partial summaries to slower storage, [1] presents methods that dynamically adapt the sampling rate to achieve a desired level of accuracy, [10] presents structure-aware sampling methods that provide improved accuracy when compared with NF on specific range queries of interest, and [11] presents stream sampling schemes for variance-optimal estimation of the total weight of an arbitrary subset of the stream of a certain size.

There is much other work along the lines of optimizing sampling methods for accurate estimation of a specific class of aggregates on the original stream. Typical aggregates of interest include the distribution of the number of packets in different flows, and

aggregates over sub-populations of all flows. The above line of work tailors the sampling scheme towards specific goals, while we consider a simple but general sampling scheme, Bernoulli sampling, and explore how to efficiently process data under this sampling strategy. In many situations, including with sampled netflow, the sampling strategy is already decided by an external entity, such as the router, over which we may not have control.

Duffield et al. [17] consider the estimation of the sizes of IP flows and the number of IP flows in a packet stream through observing the sampled stream. In a follow up work [18], they provide methods for estimating the distribution of the sizes of the input flows by observing samples of the original stream; this can be viewed as constructing an approximate histogram. The techniques used here are maximum likelihood estimation, as well as protocol level detail at the IP and TCP level. Other work along these lines includes the work on inverting sampled traffic [6], which aims to recover the distribution of the original traffic through analyzing the sample, and work in [5, 13], which seeks to answer top-k queries and rank flows through analyzing the sample. While this line of work deals with inference from a random sample in detail, it does not consider the issue of processing the sample in a streaming manner using limited space, as we do here. Further, we consider aggregates such as frequency moments and entropy, which do not seem to have been investigated in detail on sampled streams in prior work on network monitoring. In particular, even when the space complexity of an algorithm is high, we present space lower bounds that help understand the extent to which these aggregates can be estimated.

Rusu and Dobra [34] consider the estimation of the second frequency moment of a stream (equivalently, the size of the self-join) through processing the sampled stream. Our work differs from theirs in the following ways. While [34] do not explicitly mention the space bound of their algorithm, we derived a $(1+\varepsilon, \delta)$ estimator for $F_2$ based on their algorithm and found that the estimator took $\tilde{O}(1/p^2)$ space.
We improve the dependence on the sampling probability and obtain an algorithm that only requires $\tilde{O}(1/p)$ space. This dependence on the sampling probability $p$ is optimal. Our technique is also different from theirs. Ours relies on counting the number of collisions in the sampled stream, while theirs relies on scaling an estimate of the second frequency moment of the sampled stream. We also consider higher frequency moments $F_k$, for $k > 2$, as well as the entropy, while they do not.

Bhattacharya et al. [6] consider stream processing in the model where the stream processor can adaptively "skip" past stream elements, and look at only a fraction of the input stream, thus speeding up stream computation. In their model, the stream processor has the power to decide which elements to see and which to skip past, hence it is "adaptive"; in our model, the stream processor does not have such power, and must deal with the randomly sampled stream that is presented to it. Our model reflects the setup in current network monitoring equipment, such as Randomly Sampled Netflow [9]. They present a constant factor approximation for $F_2$, while we present $(1+\varepsilon, \delta)$ approximations for all frequency moments $F_k$ for $k \ge 2$.

Bar-Yossef [3] presents lower bounds on the sampling probability, or equivalently, the number of samples needed to estimate certain properties of a data set, including the frequency moments. This yields a minimum sampling probability for the Bernoulli sampler that we consider, below which it is not possible to estimate aggregates accurately, whether streaming or otherwise. This is relevant to Theorem 1 in our paper, which assumes that the sampling probability must be at least a certain value.

There is work on probabilistic data streams [14,8], where the data stream itself consists of "probabilistic" data, and each element of the stream is a probability distribution over a

set of possible events. Unlike in our model, the stream processor gets to see the entire input in the probabilistic streams model.

Remark. The preliminary conference version of this paper claimed matching lower bounds for estimating $F_k$ and heavy hitters [3]. The claimed lower bounds crucially depend on lower bounds obtained in an earlier work of Guha and Huang [4]. However, a problem has been found with the bounds of [4]. Thus the lower bound proofs that were presented in [3] do not hold.

2 Notation and Preliminaries

Throughout this paper, we will denote the original length-$n$ stream by $P = \langle a_1, a_2, \ldots, a_n \rangle$ and will assume that each element $a_i \in \{1, 2, \ldots, m\}$. We denote the sampling probability with $p$. The sampled stream $L$ is constructed by including each $a_i$ in $L$ with probability $p$, independent of the other elements. It is assumed that the sampling probability $p$ is fixed in advance and is known to the algorithm. Throughout, let $f_i$ be the frequency of item $i$ in the original stream $P$. Let $g_i$ be the frequency in the sub-sampled stream and note that $g_i \sim \mathrm{Bin}(f_i, p)$. The streams $P$ and $L$ define frequency vectors $f = (f_1, f_2, \ldots, f_m)$ and $g = (g_1, g_2, \ldots, g_m)$ respectively. When considering a function $F$ on a stream (e.g., a frequency moment or the entropy) we will write $F(P)$ and $F(L)$ to indicate the value of the function on the original and sampled stream respectively. When the context is clear, we will also abuse notation and use $F$ to indicate $F(P)$.

We are primarily interested in randomized multiplicative approximations.

Definition 1 For $\alpha > 1$ and $\delta \in [0,1]$, we say $\tilde{X}$ is an $(\alpha, \delta)$-estimator for $X$ if $\Pr[\alpha^{-1} \le X/\tilde{X} \le \alpha] \ge 1 - \delta$.

We use the notation $\tilde{O}$ to suppress factors polynomial in $1/\varepsilon$, $1/\delta$ and logarithmic in $n$. More precisely, given two functions $f$ and $g$ and constants $\varepsilon > 0$ and $\delta > 0$, we write $f \le \tilde{O}(g)$ to denote $f \le O(\mathrm{poly}(1/\varepsilon, 1/\delta, \log n)\, g)$. Similarly we write $f \ge \tilde{\Omega}(g)$ to denote $f \ge \Omega(\mathrm{poly}(1/\varepsilon, 1/\delta, \log n)\, g)$.

3 Frequency Moments

In this section, we present an algorithm for estimating the $k$th frequency moment $F_k$. The main theorem of this section is as follows.
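Theorem 1 For $k \ge 2$, there is a one pass streaming algorithm which observes $L$ and outputs a $(1+\varepsilon, \delta)$-estimator for $F_k(P)$ using $\tilde{O}(p^{-1} m^{1-2/k})$ space, assuming $p = \tilde{\Omega}(\min(m,n)^{-1/k})$.

For $p = \tilde{o}(\min(m,n)^{-1/k})$ there is not enough information in the sampled stream to obtain a $(1+\varepsilon, \delta)$ approximation to $F_k(P)$ with any amount of space, see Theorem 4.33 of [3].

Definition 2 For $1 \le \ell \le k$ define the number of $\ell$-wise collisions to be
$$C_\ell(P) = \sum_{i=1}^{m} \binom{f_i}{\ell} \quad\text{and}\quad C_\ell(L) = \sum_{i=1}^{m} \binom{g_i}{\ell}.$$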
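To make the sampling model and Definition 2 concrete, the following is a minimal Python sketch. The function names `bernoulli_sample` and `collisions` and the toy stream are illustrative assumptions, not part of the paper; the code stores the whole stream, so it illustrates the definitions rather than a small-space algorithm.

```python
import random
from collections import Counter
from math import comb

def bernoulli_sample(stream, p, rng):
    """Build the sampled stream L: keep each element independently with probability p."""
    return [a for a in stream if rng.random() < p]

def collisions(stream, ell):
    """C_ell(stream): sum over distinct items of binom(frequency, ell)."""
    return sum(comb(count, ell) for count in Counter(stream).values())

rng = random.Random(0)            # fixed seed only so that runs are repeatable
P = [1, 1, 1, 2, 2, 3]            # toy original stream with f = (3, 2, 1)
L = bernoulli_sample(P, p=0.5, rng=rng)

print(collisions(P, 2))           # binom(3,2) + binom(2,2) + binom(1,2) = 3 + 1 + 0 = 4
print(collisions(P, 3))           # only item 1 has three copies, so the count is 1
print(collisions(L, 2))           # pairwise collisions that survived sampling
```

The last value depends on the random draw, which is why no expected output is given for it.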

Our algorithm is based on the following connection between the $\ell$th frequency moment of a stream and the $\ell$-wise collisions in the stream.

Lemma 1 For $1 \le \ell \le k$,
$$F_\ell(P) = \ell!\, C_\ell(P) + \sum_{l=1}^{\ell-1} \beta^\ell_l F_l(P) \qquad (1)$$
where
$$\beta^\ell_l = (-1)^{\ell-l+1} \sum_{1 \le j_1 < \cdots < j_{\ell-l} \le \ell-1} j_1 j_2 \cdots j_{\ell-l}.$$

Proof The relationship follows from
$$\ell!\, C_\ell(P) = \sum_{i=1}^{m} f_i (f_i - 1) \cdots (f_i - \ell + 1)$$
$$= \sum_{i=1}^{m} \Big( f_i^{\ell} - f_i^{\ell-1} \sum_{1 \le j_1 \le \ell-1} j_1 + f_i^{\ell-2} \sum_{1 \le j_1 < j_2 \le \ell-1} j_1 j_2 - \cdots \Big)$$
$$= F_\ell(P) - \sum_{l=1}^{\ell-1} \beta^\ell_l F_l(P).$$

The following lemma relates the expectation of $C_\ell(L)$ to $C_\ell(P)$ and bounds the variance.

Lemma 2 For $1 \le \ell \le k$,
$$E[C_\ell(L)] = p^\ell C_\ell(P) \quad\text{and}\quad V[C_\ell(L)] = O(p^{2\ell-1} F_\ell^{2-1/\ell}).$$

Proof Let $C$ denote $C_\ell(L)$. Since each $\ell$-wise collision in $P$ appears in $L$ with probability $p^\ell$, we have $E[C] = p^\ell C_\ell(P)$. For each $i \in [m]$, let $C_i$ be the number of $\ell$-wise collisions in $L$ among items that equal $i$. Then $C = \sum_{i \in [m]} C_i$. By independence of the $C_i$,
$$V[C] = \sum_{i \in [m]} V[C_i].$$
Fix an $i \in [m]$. Let $S_i$ be the set of indices in the original stream equal to $i$. For each $J \subseteq S_i$ with $|J| = \ell$, let $X_J$ be an indicator random variable for the event that each of the stream elements in $J$ appears in the sampled stream. Then $C_i = \sum_J X_J$. Hence,
$$V[C_i] = \sum_{J, J'} \big( E[X_J X_{J'}] - E[X_J] E[X_{J'}] \big) = \sum_{J, J'} \big( p^{|J \cup J'|} - p^{2\ell} \big)$$
$$= \sum_{j=1}^{\ell} \binom{f_i}{2\ell - j} \binom{2\ell - j}{j,\ \ell - j,\ \ell - j} \big( p^{2\ell - j} - p^{2\ell} \big) = \sum_{j=1}^{\ell} O\big( f_i^{2\ell - j} p^{2\ell - j} \big),$$
where $j = |J \cap J'|$ and pairs with $j = 0$ contribute nothing.

Since $F_{2\ell-j}^{1/(2\ell-j)} \le F_\ell^{1/\ell}$ for all $j = 1, \ldots, \ell$, we have
$$V[C] = O(1) \sum_{j=1}^{\ell} F_{2\ell-j}\, p^{2\ell-j} = O(1) \sum_{j=1}^{\ell} F_\ell^{(2\ell-j)/\ell}\, p^{2\ell-j}.$$
If we can show that the first term of this sum dominates, the desired variance bound follows. This is the case if $p F_\ell^{1/\ell} \ge 1$, since this is the ratio of two consecutive summands. Note that $F_\ell$ is minimized for a fixed $F_0$ and $F_1$ when there are $F_0$ frequencies each of value $F_1/F_0$. In this case,
$$F_\ell^{1/\ell} = \big( F_0 (F_1/F_0)^\ell \big)^{1/\ell} = F_1 / F_0^{1-1/\ell}.$$
Hence, $p \ge 1/F_\ell^{1/\ell}$ if $p \ge F_0^{1-1/\ell}/F_1$, which holds by assumption.

We next describe the intuition behind our algorithm. To estimate $F_k(P)$, by Eq. 1, it suffices to obtain estimates for $F_1(P), F_2(P), \ldots, F_{k-1}(P)$ and $C_k(P)$ (one of the caveats is that some of the coefficients of $F_i(P)$ are negative, which we handle as explained below). Our algorithm attempts to estimate $F_\ell(P)$ for $\ell = 1, 2, \ldots$ inductively. Since, by Chernoff bounds, $F_1(P)$ is very close to $F_1(L)/p$, $F_1(P)$ can be estimated easily. Thus our problem reduces to estimating $C_k(P)$ by observing the sub-sampled stream $L$. Since the expected number of collisions in $L$ equals $p^k C_k(P)$, our algorithm will attempt to estimate $C_k(L)$, the number of $k$-wise collisions in the sub-sampled stream. However, it is not possible to find a good relative approximation of $C_k(L)$ in small space if $C_k(L)$ is small. Fortunately, when $C_k(L)$ is small, it does not contribute significantly to the final answer and we do not need a good relative error approximation. We only need that our estimator does not grossly overestimate $C_k(L)$. Our algorithm to estimate $C_k(L)$ will have the following property: if $C_k(L)$ is large, then it outputs a good relative error approximation, and if $C_k(L)$ is small then it outputs a value that is at most $3 C_k(L)$.

Another caveat is that some of the $\beta^\ell_i$'s could be negative. Thus a priori it is not clear that our strategy of estimating $F_k(P)$ by estimating $F_1(P), F_2(P), \ldots, F_{k-1}(P)$, $C_k(P)$, and applying Equation 1 works.
However, by using a careful choice of approximation errors and the fact that $F_i(P) \ge F_j(P)$ when $i > j$, we argue that this approach succeeds in obtaining a good approximation of $F_k(P)$.

3.1 The Algorithm

Define a sequence of random variables $\phi_\ell$:
$$\phi_1 = \frac{F_1(L)}{p}, \quad\text{and}\quad \phi_\ell = \frac{C_\ell(L)\, \ell!}{p^\ell} + \sum_{i=1}^{\ell-1} \beta^\ell_i \phi_i \ \text{ for } \ell > 1.$$
Algorithm 1 inductively computes an estimate $\tilde{\phi}_i$ for each $\phi_i$. Note that if $C_\ell(L)/p^\ell$ takes its expected value of $C_\ell(P)$ and we could compute $C_\ell(L)$ exactly, then Eq. 1 implies that the algorithm would return $F_k(P)$ exactly. While this is excessively optimistic, we will show that $C_\ell(L)/p^\ell$ is sufficiently close to $C_\ell(P)$ with high probability and that we can construct an estimate $\tilde{C}_\ell(L)$ for $C_\ell(L)$ such that the final result returned is still a $(1+\varepsilon)$ approximation for $F_k(P)$ with probability at least $1 - \delta$.
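The recursion for $\phi_\ell$ can be sketched in Python as follows. This is an illustration under a loud assumption: the collision counts $C_\ell(L)$ are computed exactly from the full sampled frequency vector, whereas the paper replaces them with a small-space estimate. The names `beta` and `phi` are hypothetical helpers introduced here.

```python
from itertools import combinations
from math import comb, factorial, prod

def beta(ell, l):
    """beta^ell_l = (-1)^(ell-l+1) * sum of products of (ell-l) distinct values from {1..ell-1}."""
    e = sum(prod(c) for c in combinations(range(1, ell), ell - l))
    return (-1) ** (ell - l + 1) * e

def phi(sample_freqs, p, k):
    """phi_k from the recursion of Section 3.1.

    Assumption for illustration: C_ell(L) is computed exactly from the
    sampled frequencies g_i, not estimated in small space.
    """
    est = {1: sum(sample_freqs) / p}          # phi_1 = F_1(L) / p
    for ell in range(2, k + 1):
        c_ell = sum(comb(g, ell) for g in sample_freqs)
        est[ell] = (factorial(ell) * c_ell / p ** ell
                    + sum(beta(ell, i) * est[i] for i in range(1, ell)))
    return est[k]

f = [3, 2, 1]
# With p = 1 the sampled stream equals the original one, so the recursion
# must reproduce F_3 = 27 + 8 + 1 = 36 by Eq. 1.
print(phi(f, p=1.0, k=3))   # 36.0
```

With $p < 1$ and $g_i \sim \mathrm{Bin}(f_i, p)$, the same function gives a random estimate whose expectation is $F_k(P)$, since $E[C_\ell(L)] = p^\ell C_\ell(P)$.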

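Algorithm 1: $F_k(P)$
Compute $F_1(L)$ exactly and set $\tilde{\phi}_1 = F_1(L)/p$.
for $\ell = 2$ to $k$ do
    Let $\tilde{C}_\ell(L)$ be an estimate for $C_\ell(L)$, computed as described in the text.
    Compute $\tilde{\phi}_\ell = \dfrac{\tilde{C}_\ell(L)\, \ell!}{p^\ell} + \sum_{i=1}^{\ell-1} \beta^\ell_i \tilde{\phi}_i$
end
Return $\tilde{\phi}_k$.

We compute our estimate of $C_\ell(L)$ via an algorithm by Indyk and Woodruff [7]. This algorithm attempts to obtain a $(1+\varepsilon_{\ell-1})$ approximation of $C_\ell(L)$ for some value of $\varepsilon_{\ell-1}$ to be determined. The estimator is as follows. For $i = 0, 1, 2, \ldots$ define
$$S_i = \{ j \in [m] : \eta (1+\varepsilon')^i \le g_j < \eta (1+\varepsilon')^{i+1} \}$$
where $\eta$ is randomly chosen between 0 and 1 and $\varepsilon' = \varepsilon_{\ell-1}/4$. The algorithm of Indyk and Woodruff [7] returns an estimate $\tilde{s}_i$ for $|S_i|$ and our estimate for $C_\ell(L)$ is defined as
$$\tilde{C}_\ell(L) := \sum_i \tilde{s}_i \binom{\eta (1+\varepsilon')^i}{\ell}.$$
The space used by the algorithm is $\tilde{O}(p^{-1} m^{1-2/\ell})$. We defer the details to Section 3.2.

We next define an event $E$ that corresponds to our collision estimates being sufficiently accurate and the sampled stream being "well-behaved". The next lemma establishes that $\Pr[E] \ge 1 - \delta$. We will defer the proof until Section 3.2.

Lemma 3 Define the event $E = E_1 \cap E_2 \cap \ldots \cap E_k$ where
$$E_1 : \tilde{\phi}_1 \in (1 \pm \varepsilon_1) F_1(P)$$
$$E_\ell : |\tilde{C}_\ell(L)/p^\ell - C_\ell(P)| \le \varepsilon_{\ell-1} F_\ell(P)/\ell! \quad\text{for } \ell \ge 2,$$
where $\varepsilon_k = \varepsilon$, $\varepsilon_{\ell-1} = \dfrac{\varepsilon_\ell}{A_\ell + 1}$, and $A_\ell = \sum_{i=1}^{\ell-1} |\beta^\ell_i|$. Then $\Pr[E] \ge 1 - \delta$.

The next lemma establishes that, conditioned on the event $E$, the algorithm returns a $(1 \pm \varepsilon)$ approximation of $F_k(P)$ as required.

Lemma 4 Conditioned on $E$, we have $\tilde{\phi}_\ell \in (1 \pm \varepsilon_\ell) F_\ell(P)$ for all $\ell \in [k]$.

Proof The proof is by induction on $\ell$. Since we are conditioning on event $E$ and thus event $E_1$, we have that $\tilde{\phi}_1$ is a $(1 \pm \varepsilon_1)$ approximation of $F_1(P)$. Thus the induction hypothesis ensures that $\tilde{\phi}_i$, $1 \le i \le \ell-1$, is a $(1 \pm \varepsilon_i)$ approximation of $F_i(P)$. Therefore,
$$|\tilde{\phi}_\ell - F_\ell(P)| = \Big| \frac{\tilde{C}_\ell(L)\, \ell!}{p^\ell} + \sum_{i=1}^{\ell-1} \beta^\ell_i \tilde{\phi}_i - F_\ell(P) \Big|$$
$$\le \Big| \ell!\, C_\ell(P) + \sum_{i=1}^{\ell-1} \beta^\ell_i F_i(P) - F_\ell(P) \Big| + \varepsilon_{\ell-1} F_\ell(P) + \sum_{i=1}^{\ell-1} |\beta^\ell_i|\, \varepsilon_i F_i(P)$$
$$= \varepsilon_{\ell-1} F_\ell(P) + \sum_{i=1}^{\ell-1} |\beta^\ell_i|\, \varepsilon_i F_i(P)$$
where the inequality follows since we are conditioning on event $E_\ell$, which ensures that
$$\Big| \frac{\tilde{C}_\ell(L)\, \ell!}{p^\ell} - \ell!\, C_\ell(P) \Big| \le \varepsilon_{\ell-1} F_\ell(P),$$
and the induction hypothesis ensures that
$$\Big| \sum_{i=1}^{\ell-1} \beta^\ell_i \tilde{\phi}_i - \sum_{i=1}^{\ell-1} \beta^\ell_i F_i(P) \Big| \le \sum_{i=1}^{\ell-1} |\beta^\ell_i|\, \varepsilon_i F_i(P).$$
The final equality follows due to Equation 1. Note that $i \le j$ implies $\varepsilon_i \le \varepsilon_j$ and $F_i(P) \le F_j(P)$. Hence, by the definition of $\varepsilon_{\ell-1}$,
$$\varepsilon_{\ell-1} F_\ell(P) + \sum_{i=1}^{\ell-1} |\beta^\ell_i|\, \varepsilon_i F_i(P) \le \varepsilon_{\ell-1} F_\ell(P) \Big( 1 + \sum_{i=1}^{\ell-1} |\beta^\ell_i| \Big) = \varepsilon_\ell F_\ell(P).$$
Therefore $\tilde{\phi}_\ell \in (1 \pm \varepsilon_\ell) F_\ell(P)$ as required.

3.2 Proof of Lemma 3

Our goal is to show that $\Pr[E_1 \cap E_2 \cap \ldots \cap E_k] \ge 1 - \delta$. To do this it will suffice to show that for each $\ell \in [k]$, $\Pr[E_\ell] \ge 1 - \delta/k$ and appeal to the union bound.

We first observe that, by Chernoff bounds, the event $E_1$ happens with probability at least $1 - \delta/k$. Let $X_i$ denote the 0-1 random variable whose value is 1 if the $i$th item of the original stream appears in the sampled stream. Note that $E[X_i] = p$, $1 \le i \le n$, and $F_1(L) = \sum_i X_i$. Since $\tilde{\phi}_1 = F_1(L)/p$, we have $\tilde{\phi}_1 = \sum_i X_i / p$. Recall that $n = F_1(P)$.
$$\Pr[\neg E_1] = \Pr\big[ |\tilde{\phi}_1 - F_1(P)| \ge \varepsilon_1 F_1(P) \big]$$
$$= \Pr\Big[ \Big| \frac{\sum_i X_i}{p} - F_1(P) \Big| \ge \varepsilon_1 F_1(P) \Big]$$
$$= \Pr\Big[ \Big| \sum_i X_i - p F_1(P) \Big| \ge \varepsilon_1 p F_1(P) \Big]$$
$$\le \exp\big( -\Omega(\varepsilon_1^2 F_1(P)\, p) \big) \quad\text{(by the Chernoff bound)}$$
$$\le \delta/k.$$
The last inequality follows because our condition on $p$ implies $p > \mathrm{poly}(1/\varepsilon) \log(1/\delta) / F_1(P)$.

To analyze $\Pr[E_\ell]$ for $2 \le \ell \le k$ we consider the events:
$$E^1_\ell : |C_\ell(L)/p^\ell - C_\ell(P)| \le \frac{\varepsilon_{\ell-1} F_\ell(P)}{2\, \ell!}$$
$$E^2_\ell : |\tilde{C}_\ell(L)/p^\ell - C_\ell(L)/p^\ell| \le \frac{\varepsilon_{\ell-1} F_\ell(P)}{2\, \ell!}.$$
By the triangle inequality it is easy to see that $\Pr[E^1_\ell \cap E^2_\ell] \le \Pr[E_\ell]$ and hence it suffices to show that $\Pr[E^1_\ell] \ge 1 - \delta/2k$ and $\Pr[E^2_\ell] \ge 1 - \delta/2k$. The first part follows easily from the variance bound in Lemma 2.

Lemma 5 $\Pr[E^1_\ell] \ge 1 - \dfrac{\delta}{4k}$.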

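Proof There are two cases depending on the value of $E[C_\ell(L)]$.

Case I: First assume $E[C_\ell(L)] \le \dfrac{\delta \varepsilon_{\ell-1} p^\ell F_\ell}{8k\, \ell!}$. Therefore, by Lemma 2, we also know that
$$C_\ell(P) \le \frac{\delta \varepsilon_{\ell-1} F_\ell}{8k\, \ell!}. \qquad (2)$$
By Markov's bound
$$\Pr\Big[ C_\ell(L) \le \frac{\varepsilon_{\ell-1} p^\ell F_\ell}{2\, \ell!} \Big] \ge 1 - \frac{\delta}{4k}. \qquad (3)$$
Eq. 2 and Eq. 3 together imply that with probability at least $1 - \frac{\delta}{4k}$,
$$|C_\ell(L)/p^\ell - C_\ell(P)| \le \max\big( C_\ell(L)/p^\ell,\ C_\ell(P) \big) \le \frac{\varepsilon_{\ell-1} F_\ell}{2\, \ell!}.$$

Case II: Next assume $E[C_\ell(L)] > \dfrac{\delta \varepsilon_{\ell-1} p^\ell F_\ell}{8k\, \ell!}$. By Chebyshev's bound, and using Lemma 2, we get:
$$\Pr\Big[ |C_\ell(L) - E[C_\ell(L)]| \ge \frac{\varepsilon_{\ell-1} E[C_\ell(L)]}{2} \Big] \le \frac{4 V[C_\ell(L)]}{\varepsilon_{\ell-1}^2 E[C_\ell(L)]^2}$$
$$\le \frac{D k^2 (\ell!)^2}{\delta^2 \varepsilon_{\ell-1}^4\, p\, F_\ell^{1/\ell}}$$
$$\le \frac{D k^2 (\ell!)^2 F_0^{1-1/\ell}}{\delta^2 \varepsilon_{\ell-1}^4\, p\, F_1}$$
$$\le \frac{D k^2 (\ell!)^2}{\delta^2 \varepsilon_{\ell-1}^4\, p\, \min(F_0^{1/\ell}, F_1^{1/\ell})}$$
$$= \frac{D k^2 (\ell!)^2 H^4}{\delta^2 \varepsilon^4\, p\, \min(F_0^{1/\ell}, F_1^{1/\ell})}$$
$$\le \frac{\delta}{4k}$$
where $D$ and $H$ are sufficiently large constants. The third inequality follows because $F_\ell^{1/\ell} \ge F_1/F_0^{1-1/\ell}$. The equality follows because $\varepsilon = H \varepsilon_{\ell-1}$. The last inequality follows because our assumption on $p$ implies that $p \ge \mathrm{poly}(1/\varepsilon, 1/\delta) \min(F_0, F_1)^{-1/k}$.

Since $E[C_\ell(L)] = p^\ell C_\ell(P)$ and $C_\ell(P) \le F_\ell(P)/\ell!$, we have that
$$\Pr\Big[ |C_\ell(L)/p^\ell - C_\ell(P)| \le \frac{\varepsilon_{\ell-1} F_\ell(P)}{2\, \ell!} \Big] \ge 1 - \frac{\delta}{4k}$$
as required.

We will now show that $E^2_\ell$ happens with high probability by analyzing the algorithm that computes $\tilde{C}_\ell(L)$. We need the following result due to Indyk and Woodruff [7]. Recall that $\varepsilon' = \varepsilon_{\ell-1}/4$.

Theorem 2 (Indyk and Woodruff [7]) Let $G$ be the set of indices $i$ for which
$$|S_i| (1+\varepsilon')^{2i} \ge \frac{\gamma F_2(L)}{\mathrm{poly}(\varepsilon'^{-1} \log n)}, \qquad (4)$$
then
$$\Pr\big[ \forall i \in G,\ \tilde{s}_i \in (1 \pm \varepsilon') |S_i| \big] \ge 1 - \frac{\delta}{8k}.$$
For every $i$ (whether it is in $G$ or not) $\tilde{s}_i \le 3 |S_i|$. Moreover, the algorithm runs in space $\tilde{O}(1/\gamma)$.

We say that a set $S_i$ contributes if
$$\binom{(1+\varepsilon')^i}{\ell} |S_i| > \frac{C_\ell(L)}{B}$$
where $B = \mathrm{poly}(\varepsilon'^{-1} \log n)$. Given $i$, the event that $S_i$ contributes holds with a certain (conceivably 0) probability. We first show that if $S_i$ contributes, then $S_i$ is a good set with high probability. More precisely, we show that for every $S_i$ that contributes, Eq. 4 holds with high probability with $\gamma = p\, m^{-1+2/\ell}$.

Lemma 6 Suppose that $C_\ell(L) > \dfrac{\varepsilon_{\ell-1} p^\ell F_\ell(P)}{4\, \ell!}$, and also suppose that the event "$S_i$ contributes" happened. Then
$$\Pr\Big[ |S_i| (1+\varepsilon')^{2i} \ge \frac{\delta p F_2(L)}{m^{1-2/\ell}\, \mathrm{poly}(\varepsilon'^{-1} \log n)} \Big] \ge 1 - \frac{\delta}{8k}.$$

Proof Consider a set $S_i$ that contributes. Note that the probability that $\eta < 1/\mathrm{poly}(\delta^{-1} \varepsilon^{-1} \log n)$ is at most $1/\mathrm{poly}(\delta^{-1} \varepsilon^{-1} \log n)$. Without loss of generality we can take this probability to be less than $\delta/16k$. By our assumption on $C_\ell(L)$ and the fact that $S_i$ contributes,
$$|S_i| (1+\varepsilon')^{i\ell} \ge \frac{\varepsilon_{\ell-1} p^\ell F_\ell(P)}{B\, \ell!}$$
holds with probability at least $1 - \delta/8k$. Thus
$$|S_i| (1+\varepsilon')^{2i} \ge \Big( \frac{\varepsilon_{\ell-1} p^\ell F_\ell(P)}{B\, \ell!} \Big)^{2/\ell} \ge \frac{p^2 F_2(P)}{m^{1-2/\ell}\, \mathrm{poly}(\varepsilon^{-1} \log n)}$$
where the second inequality is an application of Hölder's inequality. Note that
$$E[F_2(L)] = p^2 F_2(P) + p(1-p) F_1(P) \le p F_2(P).$$
Thus, by an application of the Markov bound,
$$\Pr\Big[ F_2(L) \le \frac{16 k\, p F_2(P)}{\delta} \Big] \ge 1 - \frac{\delta}{16k}. \qquad (5)$$
The lemma follows as the following inequalities hold with probability at least $1 - \delta/8k$:
$$|S_i| (1+\varepsilon')^{2i} \ge \frac{p^2 F_2(P)}{m^{1-2/\ell}\, \mathrm{poly}(\varepsilon^{-1} \log n)} = \frac{\delta p \cdot 16 k\, p F_2(P)/\delta}{16 k\, m^{1-2/\ell}\, \mathrm{poly}(\varepsilon^{-1} \log n)} \ge \frac{\delta p F_2(L)}{m^{1-2/\ell}\, \mathrm{poly}(\varepsilon^{-1} \log n)} \quad\text{(by Eq. 5)},$$
where the factor $16k$ is absorbed into the polynomial.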

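Now we are ready to prove that the event $E^2_\ell$ holds with high probability.

Lemma 7 $\Pr[E^2_\ell] \ge 1 - \dfrac{\delta}{2k}$.

Proof There are two cases depending on the size of $C_\ell(L)$.

Case I: Assume $C_\ell(L) \le \dfrac{\varepsilon_{\ell-1} p^\ell F_\ell(P)}{4\, \ell!}$. By Theorem 2, it follows that $\tilde{C}_\ell(L) \le 3 C_\ell(L)$. Thus
$$|\tilde{C}_\ell(L) - C_\ell(L)| \le 2 C_\ell(L) \le \frac{\varepsilon_{\ell-1} p^\ell F_\ell(P)}{2\, \ell!}.$$

Case II: Assume $C_\ell(L) > \dfrac{\varepsilon_{\ell-1} p^\ell F_\ell}{4\, \ell!}$. By Lemma 6, for every $S_i$ that contributes,
$$\Pr\Big[ |S_i| (1+\varepsilon')^{2i} \ge \frac{\delta p F_2(L)}{m^{1-2/\ell}\, \mathrm{poly}(\varepsilon'^{-1} \log n)} \Big] \ge 1 - \frac{\delta}{8k}.$$
Now by Theorem 2, for each $S_i$ that contributes, $\tilde{s}_i \in (1 \pm \varepsilon') |S_i|$ with probability at least $1 - \frac{\delta}{8k}$. Therefore,
$$\Pr\big[ |\tilde{C}_\ell(L) - C_\ell(L)| \le \varepsilon' C_\ell(L) \big] \ge 1 - \frac{\delta}{4k}.$$
If $E^1_\ell$ is true, then:
$$C_\ell(L) \in C_\ell(P)\, p^\ell \pm \frac{\varepsilon_{\ell-1} F_\ell(P)\, p^\ell}{2\, \ell!}.$$
Since $E^1_\ell$ holds with probability at least $1 - \frac{\delta}{4k}$, the following inequalities hold with probability at least $1 - \frac{\delta}{2k}$:
$$|\tilde{C}_\ell(L) - C_\ell(L)| \le \varepsilon' C_\ell(L)$$
$$\le \varepsilon' C_\ell(P)\, p^\ell + \frac{\varepsilon_{\ell-1} \varepsilon' F_\ell(P)\, p^\ell}{2\, \ell!}$$
$$\le \frac{\varepsilon' F_\ell(P)\, p^\ell}{\ell!} + \frac{\varepsilon_{\ell-1} \varepsilon' F_\ell(P)\, p^\ell}{2\, \ell!}$$
$$\le F_\ell(P)\, p^\ell\, \frac{\varepsilon_{\ell-1} + \varepsilon_{\ell-1} \varepsilon_{\ell-1}}{4\, \ell!}$$
$$\le \frac{F_\ell(P)\, p^\ell\, \varepsilon_{\ell-1}}{2\, \ell!}.$$

4 Distinct Elements

There are strong lower bounds for the accuracy of estimating the number of distinct values through random sampling. The following theorem is from Charikar et al. [7], which we have restated slightly to fit our notation (the original theorem is about database tables). Let $F_0$ be the number of distinct elements in a data set $T$ of total size $n$. Note that $T$ may be a stored data set, and need not be processed in a one-pass streaming manner.

Theorem 3 (Charikar et al. [7]) Consider any (randomized) estimator $\hat{F}_0$ for the number of distinct values $F_0$ of $T$, that examines at most $r$ out of the $n$ elements in $T$. For any $\gamma > e^{-r}$, there exists a choice of the input $T$ such that with probability at least $\gamma$, the multiplicative error is at least $\sqrt{\dfrac{n-r}{2r} \ln \gamma^{-1}}$.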

The above theorem implies that if we observe $o(n)$ elements of $P$, then it is not possible to get even an estimate with a constant multiplicative error. This lower bound for the non-streaming model leads to the following lower bound for sampled streams.

Theorem 4 ($F_0$ Lower Bound) For sampling probability $p \in (0, 1/12]$ and any algorithm that estimates $F_0$ by observing $L$, there is an input stream such that the algorithm will have a multiplicative error of $\Omega(1/\sqrt{p})$ with probability at least $(1 - e^{-np})/2$.

Proof Let $E_1$ denote the event $|L| \le 6np$. Let $\beta$ denote the multiplicative error of any algorithm (perhaps non-streaming) that estimates $F_0(P)$ by observing $L$. Let $\alpha = \sqrt{\dfrac{1-6p}{12p} \ln 2}$, which is $\Omega(1/\sqrt{p})$ for $p \le 1/12$. Let $E_2$ denote the event $\beta \ge \alpha$.

Note that $|L|$ is a binomial random variable. The expected size of the sampled stream is $E[|L|] = np$. By using a Chernoff bound:
$$\Pr[E_1] = 1 - \Pr\big[ |L| > 6 E[|L|] \big] \ge 1 - 2^{-6 E[|L|]} > 1 - e^{-np}.$$
If $E_1$ is true, then the number of elements in the sampled stream is no more than $6np$. Substituting $r = 6np$ and $\gamma = 1/2$ in Theorem 3, we get:
$$\Pr[E_2 \mid E_1] \ge \Pr\Big[ \beta > \sqrt{\frac{n - 6np}{12np} \ln 2} \ \Big|\ E_1 \Big] \ge \frac{1}{2}.$$
Simplifying, and using $p \le 1/12$, we get:
$$\Pr[E_2] \ge \Pr[E_1 \wedge E_2] = \Pr[E_1] \Pr[E_2 \mid E_1] \ge \frac{1}{2} \big( 1 - e^{-np} \big).$$

We now describe a simple streaming algorithm for estimating $F_0(P)$ by observing $L(P, p)$, which has an error of $O(1/\sqrt{p})$ with high probability.

Algorithm 2: $F_0(P)$
Let $X$ denote a $(1/2, \delta)$-estimate of $F_0(L)$, derived using any streaming algorithm for $F_0$ (such as [9]).
Return $X/\sqrt{p}$.

Lemma 8 ($F_0$ Upper Bound) Algorithm 2 returns an estimate $Y$ for $F_0(P)$ such that the multiplicative error of $Y$ is no more than $4/\sqrt{p}$ with probability at least $1 - (\delta + e^{-p F_0(P)/8})$.

Proof Let $D = F_0(P)$, and $D_L = F_0(L)$. Let $E_1$ denote the event $D_L \ge pD/2$, $E_2$ denote $X \ge D_L/2$, and $E_3$ denote the event $X \le 3 D_L/2$. Let $E = \cap_{i=1}^{3} E_i$.

Without loss of generality, let $1, 2, \ldots, D$ denote the distinct items that occurred in stream $P$. Define $X_i = 1$ if at least one copy of item $i$ appeared in $L$, and 0 otherwise. The different $X_i$'s are all independent. Thus $D_L = \sum_{i=1}^{D} X_i$ is the sum of independent Bernoulli random variables and
$$E[D_L] = \sum_{i=1}^{D} \Pr[X_i = 1].$$

Since each copy of item $i$ is included in $L$ with probability $p$, we have $\Pr[X_i = 1] \ge p$. Thus, $E[D_L] \ge pD$. Applying a Chernoff bound,
$$\Pr[\neg E_1] = \Pr\Big[ D_L < \frac{pD}{2} \Big] \le \Pr\Big[ D_L < \frac{E[D_L]}{2} \Big] \le e^{-E[D_L]/8} \le e^{-pD/8}. \qquad (6)$$
Suppose $E$ is true. Then we have the following:
$$\frac{pD}{4} \le \frac{D_L}{2} \le X \le \frac{3 D_L}{2} \le \frac{3D}{2}.$$
The last inequality is because $D_L$ is at most $D$. Therefore $X/\sqrt{p}$ has a multiplicative error of no more than $4/\sqrt{p}$. We now bound the probability that $E$ is false:
$$\Pr[\neg E] \le \sum_{i=1}^{3} \Pr[\neg E_i] \le \delta + e^{-pD/8}$$
where we have used the union bound, Eq. 6, and the fact that $X$ is a $(1/2, \delta)$-estimator of $D_L$.

5 Entropy

In this section we consider approximating the entropy of a stream.

Definition 3 The entropy of a frequency vector $f = (f_1, f_2, \ldots, f_m)$ is defined as $H(f) = \sum_{i=1}^{m} \frac{f_i}{n} \lg \frac{n}{f_i}$ where $n = \sum_{i=1}^{m} f_i$.

Unfortunately, in contrast to $F_0$ and $F_k$, it is not possible to multiplicatively approximate $H(f)$ even if $p$ is constant.

Lemma 9 No multiplicative error approximation is possible with probability 9/10 even with $p > 1/2$. Furthermore,
1. There exists $f$ such that $H(f) = \Theta(\log n/(pn))$ but $H(g) = 0$ with probability at least 9/10.
2. There exists $f$ such that $|H(f) - H(g)| \ge |\lg(2p)|$ with probability at least 9/10.

Proof First consider the following two scenarios for the contents of the stream. In Scenario 1, $f_1 = n$ and in Scenario 2, $f_1 = n - k$ and $f_2 = f_3 = \ldots = f_{k+1} = 1$. In the first case the entropy $H(f) = 0$ whereas in the second,
$$H(f) = \frac{n-k}{n} \lg \frac{n}{n-k} + \frac{k}{n} \lg n = \frac{n-k}{n}\, \Theta\Big( \frac{k}{n-k} \Big) + \frac{k}{n} \lg n = \Theta(1 + \lg n)\, \frac{k}{n}.$$

Distinguishing these streams requires that at least one value other than 1 is present in the subsampled stream. No such value is present with probability $(1-p)^k > 1 - pk$, and hence with $k = p^{-1}/10$ the two scenarios are indistinguishable with probability greater than 9/10.

For the second part of the lemma consider the stream with $f_1 = f_2 = \ldots = f_m = 1$ and hence $H(f) = \lg m$. But $H(g) = \lg |L|$ where $|L|$ is the number of elements in the sampled stream. By an application of the Chernoff bound, $|L|$ is at most $2pm$ with probability at least 9/10 and the result follows.

Instead we will show that it is possible to approximate $H(f)$ up to a constant factor with an additional additive error term that tends to zero if $p = \omega(n^{-1/3})$. It will also be convenient to consider the following quantity:
$$H_{pn}(g) = \sum_{i=1}^{m} \frac{g_i}{pn} \lg \frac{pn}{g_i}.$$
The following proposition establishes that $H_{pn}(g)$ is a very good approximation to $H(g)$.

Proposition 1 With probability 199/200, $|H_{pn}(g) - H(g)| = O(\log m / \sqrt{pn})$.

Proof By an application of the Chernoff bound, with probability 199/200,
$$\Big| pn - \sum_{i=1}^{m} g_i \Big| \le c \sqrt{pn}$$
for some constant $c > 0$. Hence, if $n' = \sum_{i=1}^{m} g_i$ and $\gamma = n'/(pn)$ it follows that $\gamma = 1 \pm O(1/\sqrt{pn})$. Then
$$H_{pn}(g) = \sum_{i=1}^{m} \frac{g_i}{pn} \lg \frac{pn}{g_i} = \sum_{i=1}^{m} \frac{\gamma g_i}{n'} \lg \frac{n'}{\gamma g_i} = H(g) + O(1/\sqrt{pn}) + O(H(g)/\sqrt{pn}).$$

The next lemma establishes that the entropy of $g$ is within a constant factor of the entropy of $f$ plus a small additive term.

Lemma 10 With probability 99/100, if $p = \omega(n^{-1/3})$,
1. $H_{pn}(g) \le O(H(f))$.
2. $H_{pn}(g) \ge H(f)/2 - O\Big( \dfrac{1}{p^{1/2} n^{1/6}} \Big)$.

Proof For the first part of the lemma, first note that
$$E[H_{pn}(g)] = \sum_{i=1}^{m} E\Big[ \frac{g_i}{pn} \lg \frac{pn}{g_i} \Big] \le \sum_{i=1}^{m} \frac{E[g_i]}{pn} \lg \frac{pn}{E[g_i]} = \sum_{i=1}^{m} \frac{p f_i}{pn} \lg \frac{pn}{p f_i} = H(f)$$
where the inequality follows from Jensen's inequality since the function $x \lg x^{-1}$ is concave. Hence, by Markov's inequality,
$$\Pr[H_{pn}(g) \le 100 H(f)] \ge 99/100.$$

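To prove the second part of the lemma, define $f^* = c p^{-1} \varepsilon^{-2} \log n$ for some sufficiently large constant $c$ and $\varepsilon \in (0,1)$. We then partition $[m]$ into $A = \{i : f_i < f^*\}$ and $B = \{i : f_i \ge f^*\}$ and consider $H(f) = H^A(f) + H^B(f)$ where
$$H^A(f) = \sum_{i \in A} \frac{f_i}{n} \lg \frac{n}{f_i} \quad\text{and}\quad H^B(f) = \sum_{i \in B} \frac{f_i}{n} \lg \frac{n}{f_i}.$$
By applications of the Chernoff and union bounds, with probability at least 299/300,
$$|g_i - p f_i| \le \begin{cases} \varepsilon p f^* & \text{if } i \in A \\ \varepsilon p f_i & \text{if } i \in B. \end{cases}$$
Hence,
$$H^B_{pn}(g) = \sum_{i \in B} \frac{g_i}{pn} \lg \frac{pn}{g_i} = \sum_{i \in B} \frac{(1 \pm \varepsilon) f_i}{n} \lg \frac{n}{(1 \pm \varepsilon) f_i} = (1 \pm \varepsilon) H^B(f) + O(\varepsilon).$$
For $H^A_{pn}(g)$ we have two cases depending on whether $\sum_{i \in A} f_i$ is smaller or larger than $\theta := c p^{-1} \varepsilon^{-2}$. If $\sum_{i \in A} f_i \le \theta$ then
$$H^A(f) = \sum_{i \in A} \frac{f_i}{n} \lg \frac{n}{f_i} \le \frac{\theta}{n} \lg n.$$
On the other hand if $\sum_{i \in A} f_i \ge \theta$ then by an application of the Chernoff bound,
$$\Big| \sum_{i \in A} g_i - p \sum_{i \in A} f_i \Big| \le \varepsilon p \sum_{i \in A} f_i$$
and hence
$$H^A_{pn}(g) = \sum_{i \in A} \frac{g_i}{pn} \lg \frac{pn}{g_i} \ge \sum_{i \in A} \frac{g_i}{pn} \lg \frac{n}{(1+\varepsilon) f^*} \ge (1-\varepsilon) \sum_{i \in A} \frac{f_i}{n} \lg \frac{n}{(1+\varepsilon) f^*} \ge \Big( 1 - \varepsilon - \frac{\lg((1+\varepsilon) f^*)}{\lg n} \Big) H^A(f).$$
Combining the above cases we deduce that
$$H_{pn}(g) \ge \Big( 1 - \varepsilon - \frac{\lg(c p^{-1} \varepsilon^{-2} \log n)}{\lg n} \Big) H(f) - O(\varepsilon) - O\Big( \frac{\lg n}{\varepsilon^2 p n} \Big).$$
Setting $\varepsilon = p^{-1/2} n^{-1/6}$ we get
$$H_{pn}(g) \ge \Big( 1 - p^{-1/2} n^{-1/6} - \frac{\lg(c\, n^{1/3} \log n)}{\lg n} \Big) H(f) - O(p^{-1/2} n^{-1/6}) - O\Big( \frac{\lg n}{n^{2/3}} \Big) \ge H(f)/2 - O(p^{-1/2} n^{-1/6}).$$

Therefore, by using an existing entropy estimation algorithm (e.g., [5]) to multiplicatively estimate $H(g)$ we have a constant factor approximation to $H(f)$ if $H(f) = \omega(p^{-1/2} n^{-1/6})$. The next theorem follows directly from Proposition 1 and Lemma 10.

Theorem 5 It is possible to approximate $H(f)$ up to a constant factor in $O(\mathrm{polylog}(m, n))$ space if $H(f) = \omega(p^{-1/2} n^{-1/6})$.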
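The quantities $H(f)$ and $H_{pn}(g)$ used in this section can be computed directly, as in the minimal Python sketch below. The function names and the toy frequency vector are illustrative assumptions; the code keeps explicit frequency vectors and is not the small-space entropy estimator that Theorem 5 relies on.

```python
import random
from math import log2

def entropy(freqs):
    """H(f) = sum_i (f_i/n) lg(n/f_i) with n = sum_i f_i; zero entries contribute 0."""
    n = sum(freqs)
    return sum((f / n) * log2(n / f) for f in freqs if f > 0)

def entropy_pn(g, p, n):
    """H_pn(g) = sum_i (g_i/(pn)) lg(pn/g_i): normalise by the expected sample size pn."""
    pn = p * n
    return sum((gi / pn) * log2(pn / gi) for gi in g if gi > 0)

def binomial_freqs(freqs, p, rng):
    """Draw g_i ~ Bin(f_i, p) independently for every item i."""
    return [sum(rng.random() < p for _ in range(f)) for f in freqs]

f = [1000] * 8                       # eight equally frequent items, so H(f) = lg 8 = 3
rng = random.Random(0)               # fixed seed only for repeatability
g = binomial_freqs(f, p=0.1, rng=rng)

print(entropy(f))                    # entropy of the original stream
print(entropy_pn(g, 0.1, sum(f)))    # sampled-stream proxy H_pn(g)
print(entropy(g))                    # H(g), normalised by the actual sample size
```

For a stream this regular, the three values should be close to one another; Lemma 9 describes inputs where sampling destroys that agreement.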

6 Heavy Hitters

There are two common notions for finding heavy hitters in a stream: the $F_1$-heavy hitters, and the $F_2$-heavy hitters.

Definition 4 In the $F_k$-heavy hitters problem, $k \in \{1, 2\}$, we are given a stream of updates to an underlying frequency vector $f$ and parameters $\alpha$, $\varepsilon$, and $\delta$. The algorithm is required to output a set $S$ of $O(1/\alpha)$ items such that: (1) every item $i$ for which $f_i \ge \alpha (F_k)^{1/k}$ is included in $S$, and (2) any item $i$ for which $f_i < (1-\varepsilon) \alpha (F_k)^{1/k}$ is not included in $S$. The algorithm is additionally required to output approximations $f'_i$ with
$$\forall i \in S, \quad f'_i \in [(1-\varepsilon) f_i,\ (1+\varepsilon) f_i].$$
The overall success probability should be at least $1 - \delta$.

The intuition behind the algorithm for heavy hitters is as follows. Suppose an item $i$ was an $F_k$ heavy hitter in the original stream $P$, i.e. $f_i \ge \alpha (F_k)^{1/k}$. Then, by a Chernoff bound, it can be argued that with high probability, $g_i$, the frequency of $i$ in the sampled stream, is close to $p f_i$. In such a case, it can be shown that $i$ is also a heavy hitter in the sampled stream and will be detected by an algorithm that identifies heavy hitters on the sampled stream (with the right choice of parameters). Similarly, it can be argued that an item $i$ such that $f_i < (1-\varepsilon) \alpha (F_k)^{1/k}$ cannot reach the required frequency threshold on the sampled stream, and will not be returned by the algorithm. We present the analysis below assuming that the heavy hitter algorithm on the sampled stream is the CountMin sketch. Other algorithms for heavy hitters can be used too, such as the Misra-Gries algorithm [33]; note that the Misra-Gries algorithm works on insert-only streams, while the CountMin sketch works on general update streams, with additions as well as deletions.

Theorem 6 Suppose that $F_1(P) \ge C p^{-1} \alpha^{-1} \varepsilon^{-2} \log(n/\delta)$ for a sufficiently large constant $C > 0$. There is a one pass streaming algorithm which observes the sampled stream $L$ and computes the $F_1$ heavy hitters of the original stream $P$ with probability at least $1 - \delta$. This algorithm uses $O(\varepsilon^{-1} \log^2(n/(\alpha\delta)))$ bits of space.
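Before the proof, the intuition for the $F_1$ case can be illustrated with a short Python sketch. It is a simplification under stated assumptions: an exact counter stands in for the CountMin sketch, the function name `sampled_heavy_hitters` and its `threshold` parameter are hypothetical, and the threshold is left to the caller instead of being derived from $\alpha$ and $\varepsilon$.

```python
from collections import Counter

def sampled_heavy_hitters(sample, p, threshold):
    """Report items whose sampled frequency g_i is at least threshold * F_1(L).

    Each reported item comes with the rescaled estimate g_i / p of its
    original frequency f_i. Exact counting replaces the CountMin sketch
    here (an assumption made purely for illustration).
    """
    g = Counter(sample)
    f1_sampled = len(sample)                  # F_1(L), the length of the sampled stream
    return {item: count / p
            for item, count in g.items()
            if count >= threshold * f1_sampled}

# Toy sampled stream: item 7 appears 30 times among 100 sampled elements.
L = [7] * 30 + list(range(100, 170))
print(sampled_heavy_hitters(L, p=0.5, threshold=0.2))   # {7: 60.0}
```

In this toy input the sampled count 30 is rescaled by $1/p = 2$, giving the estimate 60 for the original frequency of item 7; every other item has sampled frequency 1 and falls below the threshold.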
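Proof The algorithm runs the CountMin$(\alpha', \varepsilon', \delta')$ algorithm of [15] for finding the $F_1$-heavy hitters on the sampled stream, for $\alpha' = (1 - 2\varepsilon/5) \cdot \alpha$, $\varepsilon' = \varepsilon/2$, and $\delta' = \delta/4$. We return the set $S$ of items $i$ found by CountMin, and we scale each of the $f'_i$ by $1/p$. Recall that $g_i$ is the frequency of item $i$ in the sampled stream $L$. Then for sufficiently large $C > 0$ given in the theorem statement, for any $i$, by a Chernoff bound,

$$\Pr\Bigl[ g_i > \max\Bigl\{ p \Bigl(1 + \frac{\varepsilon}{5}\Bigr) f_i,\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\} \Bigr] \le \frac{\delta}{4n}.$$

By a union bound, with probability at least $1 - \delta/4$, for all $i \in [n]$,

$$g_i \le \max\Bigl\{ p \Bigl(1 + \frac{\varepsilon}{5}\Bigr) f_i,\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\}. \qquad (7)$$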
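For readers unfamiliar with the sketch used in this proof, here is a minimal Count-Min sketch in Python, intended to be fed with the sampled stream and queried with the $1/p$ scaling described above. It is a simplified illustration, not the implementation of [15]: the width $\lceil e/\epsilon \rceil$ and depth $\lceil \ln(1/\delta) \rceil$ are the standard textbook choices, the hash family and the class and function names are assumptions, and items are assumed to be non-negative integers.

```python
import math
import random


class CountMin:
    """Minimal Count-Min sketch for non-negative integer items (illustrative)."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the hash family

    def __init__(self, eps, delta, rng):
        self.width = math.ceil(math.e / eps)
        self.depth = max(1, math.ceil(math.log(1 / delta)))
        # One hash h(x) = ((a * x + b) mod PRIME) mod width per row.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.depth)]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _cells(self, x):
        for row, (a, b) in enumerate(self.hashes):
            yield row, ((a * x + b) % self.PRIME) % self.width

    def update(self, x, count=1):
        """Add `count` (possibly negative) occurrences of item x."""
        for row, col in self._cells(x):
            self.table[row][col] += count

    def estimate(self, x):
        """Point query; never underestimates when all net counts are non-negative."""
        return min(self.table[row][col] for row, col in self._cells(x))


def scaled_estimate(sketch, x, p):
    """Estimate f_x in the original stream from a sketch of the sampled stream."""
    return sketch.estimate(x) / p
```

In the setting of the proof, the sketch would be updated once per element of $L$, and an item $x$ would be reported when `sketch.estimate(x)` reaches the relaxed threshold on $F_1(L)$.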

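We also need the property that if $f_i \ge (1-\varepsilon)\alpha F_1(P)$, then $g_i \ge p(1 - \varepsilon/5) f_i$. For such $i$, by the premise of the theorem we have $E[g_i] \ge p(1-\varepsilon)\alpha F_1(P) \ge C(1-\varepsilon)\varepsilon^{-2} \log(n/\delta)$. Hence, for sufficiently large $C$, applying a Chernoff and a union bound is enough to conclude that with probability at least $1 - \delta/4$, for all such $i$, $g_i \ge p(1 - \varepsilon/5) f_i$.

We set the parameter $\delta'$ of CountMin to equal $\delta/4$, and so CountMin succeeds with probability at least $1 - \delta/4$. Also, $E[F_1(L)] = pF_1(P) \ge C\alpha^{-1}\varepsilon^{-2} \log(n/\delta)$, the inequality following from the premise of the theorem. By a Chernoff bound,

$$\Pr\Bigl[ \Bigl(1 - \frac{\varepsilon}{5}\Bigr) pF_1(P) \le F_1(L) \le \Bigl(1 + \frac{\varepsilon}{5}\Bigr) pF_1(P) \Bigr] \ge 1 - \frac{\delta}{4}.$$

By a union bound, all events discussed thus far jointly occur with probability at least $1 - \delta$, and we condition on their joint occurrence in the remainder of the proof.

Lemma 11 If $f_i \ge \alpha F_1(P)$, then $g_i \ge (1 - 2\varepsilon/5)\alpha F_1(L)$. If $f_i < (1-\varepsilon)\alpha F_1(P)$, then $g_i \le (1 - \varepsilon/2)\alpha F_1(L)$.

Proof For the first part, we have $g_i \ge p(1 - \varepsilon/5) f_i$ and also $F_1(L) \le p(1 + \varepsilon/5)F_1(P)$. Hence,

$$g_i \ge \frac{1 - \varepsilon/5}{1 + \varepsilon/5} \alpha F_1(L) \ge (1 - 2\varepsilon/5) \alpha F_1(L).$$

Next consider any $i$ for which $f_i < (1-\varepsilon)\alpha F_1(P)$. Then

$$g_i \le \max\Bigl\{ p \Bigl(1 + \frac{\varepsilon}{5}\Bigr)(1-\varepsilon)\alpha F_1(P),\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\}$$
$$\le \max\Bigl\{ \Bigl(1 - \frac{3\varepsilon}{5}\Bigr) \alpha F_1(L),\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\}$$
$$\le \max\Bigl\{ \Bigl(1 - \frac{\varepsilon}{2}\Bigr) \alpha F_1(L),\ \frac{\alpha}{2} E[F_1(L)] \Bigr\}$$
$$\le \max\Bigl\{ \Bigl(1 - \frac{\varepsilon}{2}\Bigr) \alpha F_1(L),\ \Bigl(1 + \frac{2\varepsilon}{5}\Bigr) \frac{\alpha}{2} F_1(L) \Bigr\}$$
$$\le \Bigl(1 - \frac{\varepsilon}{2}\Bigr) \alpha F_1(L).$$

It follows that by setting $\alpha' = (1 - 2\varepsilon/5)\alpha$ and $\varepsilon' = \varepsilon/2$, CountMin$(\alpha', \varepsilon', \delta')$ does not return any $i$ for which $f_i < (1-\varepsilon)\alpha F_1(P)$, since for such $i$ we have $g_i \le (1 - \varepsilon/2)\alpha F_1(L)$, and so $g_i < (1 - \varepsilon/10)\alpha' F_1(L)$. On the other hand, for every $i$ for which $f_i \ge \alpha F_1(P)$, we have $i \in S$, since for such $i$ we have $g_i \ge \alpha' F_1(L)$.

It remains to show that for every $i \in S$, we have $f'_i \in [(1-\varepsilon) f_i, (1+\varepsilon) f_i]$. By the previous paragraph, for such $i$ we have $f_i \ge (1-\varepsilon)\alpha F_1(P)$. By the above conditioning, this means that $g_i \ge p(1 - \varepsilon/5) f_i$. We will also have $g_i \le p(1 + \varepsilon/5) f_i$ if

$$p \Bigl(1 + \frac{\varepsilon}{5}\Bigr) f_i \ge \frac{C}{2\varepsilon^2} \log \frac{n}{\delta}.$$

Since $f_i \ge (1-\varepsilon)\alpha F_1(P)$, this in turn holds if

$$F_1(P) \ge \frac{1}{2(1-\varepsilon)(1 + \varepsilon/5)} \, C p^{-1} \alpha^{-1} \varepsilon^{-2} \log \frac{n}{\delta},$$

which holds by the theorem premise provided $\varepsilon$ is less than a sufficiently small constant. This completes the proof.

Theorem 7 Suppose that $F_2^{1/2} \ge C p^{-3/2} \alpha^{-1} \varepsilon^{-2} \log(n/\delta)$ and $p = \Omega(m^{-1/2})$. There is a one pass streaming algorithm which observes the sampled stream $L$ and computes the $(\alpha, 1 - p^{1/2}(1-\varepsilon))$ $F_2$-heavy hitters of the original stream with high probability.

Proof The algorithm runs the CountSketch$(\alpha', \varepsilon', \delta')$ algorithm [8] for finding the $F_2$-heavy hitters on the sampled stream, for appropriate $\alpha'$, $\varepsilon'$, and $\delta'$ specified below. We return the set $S$ of items $i$ found by CountSketch. As before we can show that if $f_i \ge (1-\varepsilon)\alpha F_2^{1/2}$, then with probability at least $1 - \delta/4$, $g_i \ge p(1 - \varepsilon/5) f_i$.

Next we bound the variance of $F_2(L)$. Since each $g_i$ is drawn from a binomial distribution $\mathrm{Bin}(f_i, p)$ on $f_i$ items with probability $p$,

$$E[F_2(L)] = \sum_i E[g_i^2] = \sum_i \bigl( p^2 f_i^2 + p(1-p) f_i \bigr) = p^2 F_2(P) + p(1-p) F_1(P).$$

Moreover,

$$\mathrm{Var}[F_2(L)] = \sum_i \mathrm{Var}[g_i^2] = \sum_i \Bigl( E[g_i^4] - \bigl( p^2 f_i^2 + p(1-p) f_i \bigr)^2 \Bigr) \le \sum_i \bigl( E[g_i^4] - p^4 f_i^4 \bigr).$$

It is known that the 4-th moment of $\mathrm{Bin}(f_i, p)$ is

$$f_i p \bigl( 1 - 7p + 7 f_i p + 12 p^2 - 18 f_i p^2 + 6 f_i^2 p^2 - 6 p^3 + 11 f_i p^3 - 6 f_i^2 p^3 + f_i^3 p^3 \bigr),$$

and subtracting $p^4 f_i^4$ from this, we obtain

$$f_i p - 7 f_i p^2 + 7 f_i^2 p^2 + 12 f_i p^3 - 18 f_i^2 p^3 + 6 f_i^3 p^3 - 6 f_i p^4 + 11 f_i^2 p^4 - 6 f_i^3 p^4 + f_i^4 p^4 - f_i^4 p^4,$$

which is $O(f_i p + f_i^2 p^2 + f_i^3 p^3)$. Hence,

$$\mathrm{Var}[F_2(L)] = O\bigl( pF_1 + p^2 F_2(P) + p^3 F_3(P) \bigr).$$

By Chebyshev's inequality,

$$\Pr\Bigl[ \bigl| F_2(L) - E[F_2(L)] \bigr| \ge \frac{\varepsilon}{5} p^2 F_2 \Bigr] = O\Bigl( \frac{pF_1 + p^2 F_2 + p^3 F_3}{\varepsilon^2 p^4 F_2^2} \Bigr)$$
$$= O\Bigl( \frac{F_1}{\varepsilon^2 p^3 F_2^2} + \frac{1}{\varepsilon^2 p^2 F_2} + \frac{F_3}{\varepsilon^2 p F_2^2} \Bigr)$$
$$= O\Bigl( \frac{F_1}{\varepsilon^2 p^3 F_2^2} + \frac{1}{\varepsilon^2 p^2 F_2} + \frac{1}{\varepsilon^2 p F_2^{1/2}} \Bigr),$$

where the last step uses $F_3 \le F_2^{3/2}$. Thus with probability at least $1 - \delta/4$,

$$\Bigl(1 - \frac{\varepsilon}{5}\Bigr) p F_2^{1/2} \le F_2(L)^{1/2} \le p^{1/2} F_2^{1/2}.$$

By a union bound, all events discussed so far jointly occur with probability at least $1 - \delta$, and we condition on them occurring in the remainder of the analysis.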
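The closed form for the fourth moment of a binomial is easy to mistype, so the following Python sketch checks it with exact rational arithmetic against the defining sum. It also checks the per-item variance bound $E[g_i^4] - p^4 f_i^4 = O(f_i p + f_i^2 p^2 + f_i^3 p^3)$ with an explicit constant. The helper names and the constant 18 are choices made here for illustration and are not taken from the paper.

```python
from fractions import Fraction
from math import comb


def binomial_raw_moment(f, p, r):
    """E[g^r] for g ~ Bin(f, p), computed exactly from the definition."""
    return sum(Fraction(k) ** r * comb(f, k) * p ** k * (1 - p) ** (f - k)
               for k in range(f + 1))


def fourth_moment_closed_form(f, p):
    """Closed form for E[g^4] with g ~ Bin(f, p)."""
    return f * p * (1 - 7 * p + 7 * f * p + 12 * p ** 2 - 18 * f * p ** 2
                    + 6 * f ** 2 * p ** 2 - 6 * p ** 3 + 11 * f * p ** 3
                    - 6 * f ** 2 * p ** 3 + f ** 3 * p ** 3)


def check(max_f=12):
    """Verify the closed form and the variance bound on a small grid of (f, p)."""
    for f in range(max_f + 1):
        for p in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
            m4 = binomial_raw_moment(f, p, 4)
            m2 = binomial_raw_moment(f, p, 2)
            assert m4 == fourth_moment_closed_form(f, p)
            # Var[g^2] = E[g^4] - E[g^2]^2 <= E[g^4] - p^4 f^4
            assert m4 - m2 ** 2 <= m4 - (p * f) ** 4
            # E[g^4] - p^4 f^4 <= 18 (f p + f^2 p^2 + f^3 p^3)
            assert m4 - (p * f) ** 4 <= 18 * (f * p + (f * p) ** 2 + (f * p) ** 3)
    return True
```

Calling `check()` raises an `AssertionError` if any identity fails on the tested grid; it is a sanity check on small values, not a proof.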

Suppose that $f_i \ge \alpha F_2^{1/2}$ in the original stream. Then

$$g_i \ge p(1 - \varepsilon/5) f_i \ge \alpha F_2^{1/2} \, p (1 - \varepsilon/5) \ge \alpha p^{1/2} (1 - \varepsilon/5) F_2(L)^{1/2}.$$

Next consider any $i$ for which $f_i < (1-\varepsilon) p^{1/2} \alpha F_2^{1/2}$. Then

$$g_i \le \max\Bigl\{ p \Bigl(1 + \frac{\varepsilon}{5}\Bigr)(1-\varepsilon) p^{1/2} \alpha F_2^{1/2}(P),\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\}$$
$$\le \max\Bigl\{ \Bigl(1 + \frac{\varepsilon}{5}\Bigr)(1-\varepsilon) p^{3/2} \alpha F_2^{1/2}(P),\ \frac{C}{2\varepsilon^2} \log \frac{n}{\delta} \Bigr\}$$
$$\le \Bigl(1 - \frac{4\varepsilon}{5}\Bigr) p^{3/2} \alpha F_2^{1/2}(P)$$
$$\le \Bigl(1 - \frac{3\varepsilon}{5}\Bigr) p^{1/2} \alpha F_2(L)^{1/2}$$
$$\le \Bigl(1 - \frac{\varepsilon}{2}\Bigr) p^{1/2} \alpha F_2(L)^{1/2}.$$

It follows that by setting $\alpha' = (1 - \varepsilon/5) \cdot \alpha \cdot p^{1/2}$, $\delta' = \delta/4$, and $\varepsilon' = \varepsilon/10$, CountSketch$(\alpha', \varepsilon', \delta')$ does not return any $i$ for which $f_i < (1-\varepsilon) p^{1/2} \alpha F_2^{1/2}(P)$, since for such $i$ we have $g_i \le (1 - \varepsilon/2) p^{1/2} \alpha F_2(L)^{1/2}$. On the other hand, for every $i$ for which $f_i \ge \alpha F_2^{1/2}$, we have $i \in S$, since for such $i$ we have $g_i \ge \alpha' F_2(L)^{1/2}$.

7 Conclusion

We presented small-space stream algorithms and lower bounds for estimating functions of interest when observing a random sample of the original stream. There are numerous directions for future work, and we mention some of them. As we have seen, our results imply time/space tradeoffs for several natural streaming problems. What other data stream problems have interesting time/space tradeoffs? Also, we have so far assumed that the sampling probability $p$ is fixed, and that the algorithm has no control over it. Suppose this was not the case, and the algorithm can change the sampling probability in an adaptive manner, depending on the current state of the stream. Is it possible to get algorithms that can observe fewer elements overall and get the same accuracy as our algorithms? For which precise models and problems is adaptivity useful? It is also interesting to obtain matching space lower bounds for the case of estimating frequency moments.

References

1. Alon, N., Matias, Y., Szegedy, M.: The Space Complexity of Approximating the Frequency Moments. Journal of Computer and System Sciences 58(1)
2. Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA)
3. Bar-Yossef, Z.: The complexity of massive dataset computations. Ph.D. thesis, University of California at Berkeley (2002)

4. Bar-Yossef, Z.: Sampling lower bounds via information theory. In: Proc. 35th Annual ACM Symposium on Theory of Computing (STOC)
5. Barakat, C., Iannaccone, G., Diot, C.: Ranking flows from sampled traffic. In: Proc. ACM Conference on Emerging Network Experiment and Technology (CoNEXT)
6. Bhattacharyya, S., Madeira, A., Muthukrishnan, S., Ye, T.: How to scalably and accurately skip past streams. In: Proc. 23rd International Conference on Data Engineering (ICDE) Workshops
7. Charikar, M., Chaudhuri, S., Motwani, R., Narasayya, V.R.: Towards estimation error guarantees for distinct values. In: Proc. 19th ACM Symposium on Principles of Database Systems (PODS)
8. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. Theoretical Computer Science 312(1)
9. Cisco Systems: Random Sampled NetFlow. feature/guide/nfstatsa.htm
10. Cohen, E., Cormode, G., Duffield, N.G.: Structure-aware sampling: Flexible and accurate summarization. Proceedings of the VLDB Endowment 4(11)
11. Cohen, E., Duffield, N.G., Kaplan, H., Lund, C., Thorup, M.: Efficient stream sampling for variance-optimal estimation of subset sums. SIAM J. Comput. 40(5)
12. Cohen, E., Duffield, N.G., Kaplan, H., Lund, C., Thorup, M.: Algorithms and estimators for summarization of unaggregated data streams. Journal of Computer and System Sciences 80(7)
13. Cohen, E., Grossaug, N., Kaplan, H.: Processing top-k queries from samples. Computer Networks 52(14)
14. Cormode, G., Garofalakis, M.: Sketching probabilistic data streams. In: Proc. 26th ACM International Conference on Management of Data (SIGMOD)
15. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55(1)
16. Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proc. ACM Symposium on Principles of Database Systems (PODS)
17. Duffield, N.G., Lund, C., Thorup, M.: Properties and prediction of flow statistics from sampled packet streams. In: Proc. Internet Measurement Workshop
18. Duffield, N.G., Lund, C., Thorup, M.: Estimating flow distributions from sampled flow statistics. IEEE/ACM Transactions on Networking 13(5)
19. Duffield, N.G., Lund, C., Thorup, M.: Priority sampling for estimation of arbitrary subset sums. Journal of the ACM
20. Efraimidis, P., Spirakis, P.G.: Weighted random sampling with a reservoir. Information Processing Letters 97(5)
21. Estan, C., Keys, K., Moore, D., Varghese, G.: Building a better NetFlow. In: Proc. ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM)
22. Estan, C., Varghese, G.: New directions in traffic measurement and accounting. In: Proc. ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM)
23. Gibbons, P.B., Matias, Y.: New sampling-based summary statistics for improving approximate query answers. In: Proc. ACM SIGMOD International Conference on Management of Data
24. Guha, S., Huang, Z.: Revisiting the direct sum theorem and space lower bounds in random order streams. In: Automata, Languages and Programming, 36th International Colloquium, ICALP (1)
25. Harvey, N.J.A., Nelson, J., Onak, K.: Sketching and streaming entropy via approximation theory. In: Proc. 49th IEEE Conference on Foundations of Computer Science (FOCS)
26. Hohn, N., Veitch, D.: Inverting sampled traffic. IEEE/ACM Transactions on Networking 14(1)
27. Indyk, P., Woodruff, D.P.: Optimal approximations of the frequency moments of data streams. In: Proc. 37th Annual ACM Symposium on Theory of Computing (STOC)
28. Jayram, T.S., McGregor, A., Muthukrishnan, S., Vee, E.: Estimating statistical aggregates on probabilistic data streams. ACM Transactions on Database Systems 33
29. Kane, D.M., Nelson, J., Woodruff, D.P.: On the exact space complexity of sketching and streaming small norms. In: Proc. 21st ACM-SIAM Symposium on Discrete Algorithms (SODA)
30. Lahiri, B., Tirthapura, S.: Stream sampling. In: L. Liu, M.T. Özsu (eds.) Encyclopedia of Database Systems. Springer US (2009)
31. McGregor, A. (ed.): Open Problems in Data Streams and Related Topics. iitk.ac.in/users/sganguly/data-stream-probs.pdf
32. McGregor, A., Pavan, A., Tirthapura, S., Woodruff, D.: Space-efficient estimation of statistics over subsampled streams. In: Proc. 31st ACM Symposium on Principles of Database Systems (PODS)
33. Misra, J., Gries, D.: Finding repeated elements. Science of Computer Programming
34. Rusu, F., Dobra, A.: Sketching sampled data streams. In: Proc. 25th IEEE International Conference on Data Engineering (ICDE)
35. Szegedy, M.: The DLT priority sampling is essentially optimal. In: Proc. Annual ACM Symposium on Theory of Computing (STOC)
36. Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Proc. International Symposium on Distributed Computing (DISC)
37. Vitter, J.S.: Random sampling with a reservoir. ACM Transactions on Mathematical Software 11(1)

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem Joura of Machie Learig Research 5 004) 63-648 Submitted 1/04; Pubished 6/04 The Sampe Compexity of Exporatio i the Muti-Armed Badit Probem Shie Maor Joh N. Tsitsikis Laboratory for Iformatio ad Decisio

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas: Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries

More information

CHAPTER FIVE Network Hydraulics

CHAPTER FIVE Network Hydraulics . ETWOR YDRAULICSE CATER IVE Network ydrauics The fudameta reatioships of coservatio of mass ad eergy mathematicay describe the fow ad pressure distributio withi a pipe etwork uder steady state coditios.

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Irreducible polynomials with consecutive zero coefficients

Irreducible polynomials with consecutive zero coefficients Irreducible polyomials with cosecutive zero coefficiets Theodoulos Garefalakis Departmet of Mathematics, Uiversity of Crete, 71409 Heraklio, Greece Abstract Let q be a prime power. We cosider the problem

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

A short note on quantile and expectile estimation in unequal probability samples

A short note on quantile and expectile estimation in unequal probability samples Cataogue o. 2-00-X ISS 492-092 Survey Methodoogy A short ote o quatie ad expectie estimatio i uequa probabiity sampes by Lida Schuze Watrup ad Göra Kauerma eease date: Jue 22, 206 How to obtai more iformatio

More information

Chapter 5 O A Cojecture Of Erdíos Proceedigs NCUR VIII è1994è, Vol II, pp 794í798 Jeærey F Gold Departmet of Mathematics, Departmet of Physics Uiversity of Utah Do H Tucker Departmet of Mathematics Uiversity

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Overview of some probability distributions.

Overview of some probability distributions. Lecture Overview of some probability distributios. I this lecture we will review several commo distributios that will be used ofte throughtout the class. Each distributio is usually described by its probability

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

CHAPTER 3 THE TIME VALUE OF MONEY

CHAPTER 3 THE TIME VALUE OF MONEY CHAPTER 3 THE TIME VALUE OF MONEY OVERVIEW A dollar i the had today is worth more tha a dollar to be received i the future because, if you had it ow, you could ivest that dollar ad ear iterest. Of all

More information

How To Solve The Homewor Problem Beautifully

How To Solve The Homewor Problem Beautifully Egieerig 33 eautiful Homewor et 3 of 7 Kuszmar roblem.5.5 large departmet store sells sport shirts i three sizes small, medium, ad large, three patters plaid, prit, ad stripe, ad two sleeve legths log

More information

A Recursive Formula for Moments of a Binomial Distribution

A Recursive Formula for Moments of a Binomial Distribution A Recursive Formula for Momets of a Biomial Distributio Árpád Béyi beyi@mathumassedu, Uiversity of Massachusetts, Amherst, MA 01003 ad Saverio M Maago smmaago@psavymil Naval Postgraduate School, Moterey,

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing

Perfect Packing Theorems and the Average-Case Behavior of Optimal and Online Bin Packing SIAM REVIEW Vol. 44, No. 1, pp. 95 108 c 2002 Society for Idustrial ad Applied Mathematics Perfect Packig Theorems ad the Average-Case Behavior of Optimal ad Olie Bi Packig E. G. Coffma, Jr. C. Courcoubetis

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

Maximum Likelihood Estimators.

Maximum Likelihood Estimators. Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

More information

Estimating Probability Distributions by Observing Betting Practices

Estimating Probability Distributions by Observing Betting Practices 5th Iteratioal Symposium o Imprecise Probability: Theories ad Applicatios, Prague, Czech Republic, 007 Estimatig Probability Distributios by Observig Bettig Practices Dr C Lych Natioal Uiversity of Irelad,

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006 Exam format UC Bereley Departmet of Electrical Egieerig ad Computer Sciece EE 6: Probablity ad Radom Processes Solutios 9 Sprig 006 The secod midterm will be held o Wedesday May 7; CHECK the fial exam

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

3 Basic Definitions of Probability Theory

3 Basic Definitions of Probability Theory 3 Basic Defiitios of Probability Theory 3defprob.tex: Feb 10, 2003 Classical probability Frequecy probability axiomatic probability Historical developemet: Classical Frequecy Axiomatic The Axiomatic defiitio

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

Multi-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu

Multi-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu Multi-server Optimal Badwidth Moitorig for QoS based Multimedia Delivery Aup Basu, Iree Cheg ad Yizhe Yu Departmet of Computig Sciece U. of Alberta Architecture Applicatio Layer Request receptio -coectio

More information

Infinite Sequences and Series

Infinite Sequences and Series CHAPTER 4 Ifiite Sequeces ad Series 4.1. Sequeces A sequece is a ifiite ordered list of umbers, for example the sequece of odd positive itegers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29...

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis
