Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation


PIOTR INDYK
MIT, Cambridge, Massachusetts

Abstract. In this article, we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular:

- We show that, for any $p \in (0, 2]$, one can maintain (using only $O(\log n/\epsilon^2)$ words of storage) a sketch $C(q)$ of a point $q \in l_p^n$ under dynamic updates of its coordinates. The sketch has the property that, given $C(q)$ and $C(s)$, one can estimate $\|q - s\|_p$ up to a factor of $(1+\epsilon)$ with large probability. This solves the main open problem of Feigenbaum et al. [1999].
- We show that the aforementioned sketching approach directly translates into an approximate algorithm that, for a fixed linear mapping $A$, and given $x \in R^n$ and $y \in R^m$, estimates $\|Ax - y\|_p$ in $O(n + m)$ time, for any $p \in (0, 2]$. This generalizes an earlier algorithm of Wasserman and Blum [1997] which worked for the case $p = 2$.
- We obtain another sketch function $C'$ which probabilistically embeds $l_1^n$ into a normed space $l_1^m$. The embedding guarantees that, if we set $m = \log(1/\delta)^{O(1/\epsilon)}$, then for any pair of points $q, s \in l_1^n$, the distance between $q$ and $s$ does not increase by more than $(1 + \epsilon)$ with constant probability, and it does not decrease by more than $(1 - \epsilon)$ with probability $1 - \delta$. This is the only known dimensionality reduction theorem for the $l_1$ norm. In fact, stronger theorems of this type (i.e., that guarantee very low probability of expansion as well as of contraction) cannot exist [Brinkman and Charikar 2003].
- We give an explicit embedding of $l_2^n$ into $l_1^{n^{O(\log n)}}$ with distortion $(1 + 1/n^{\Theta(1)})$.

Categories and Subject Descriptors: F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms: Algorithms, Theory

Additional Key Words and Phrases: sketching, dimensionality reduction, embeddings, data streams, norms

1. Introduction

Stable distributions [Zolotarev 1986] are defined as limits of normalized sums of independent identically distributed variables.
Part of this work was done while the author was at Stanford University and visiting AT&T Shannon Labs. P. Indyk was supported in part by National Science Foundation (NSF) ITR grant CCR, a David and Lucille Packard Fellowship, and an Alfred P. Sloan Fellowship.
Author's address: 32 Vassar Street, Cambridge, MA 02139, e-mail: indyk@mit.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY USA, fax: +1 (212), or permissions@acm.org.
© 2006 ACM /06/ $5.00
Journal of the ACM, Vol. 53, No. 3, May 2006.

In particular, a stable distribution

with parameter $p$ has the property that for any three independent random variables $X, Y, Z$ drawn from that distribution, and any $a, b \in R$, the variables $aX + bY$ and $(|a|^p + |b|^p)^{1/p} Z$ are identically distributed. The best-known example of a stable distribution is the Gaussian (or normal) distribution. However, the class is much wider; for example, it includes heavy-tailed distributions.

Stable distributions have found numerous applications in many areas. They are particularly useful in the local theory of Banach spaces [Lindenstrauss and Milman], where, for example, they have been used to show a low-distortion embedding of $l_p$ into $l_1$ for $p \in (1, 2]$ [Johnson and Schechtman 1982]. However, prior to this work, few applications to theoretical computer science were known.

In this article, we show that the combination of stable distributions and bounded space pseudorandom generators [Nisan 1990] forms a powerful tool for proving a variety of algorithmic results. The high-level idea behind this approach is as follows. Assume that we would like to construct a compact representation of a vector $u \in l_p^n$. It is known (e.g., see Johnson and Schechtman [1982]) that an inner product of $u$ with a sequence $r$ of $n$ i.i.d. random variables drawn from a $p$-stable distribution has magnitude proportional to $\|u\|_p$. This implies that the dot product can be used to recover an approximate value of $\|u\|_p$. Since the inner product $u \cdot r$ can be computed in small space, one can use pseudorandom generators to reduce the number of required truly random bits. This in turn translates into a reduction of storage or dimensionality or other parameters of interest, depending on the application.

In the following, we describe in more detail applications of this technique to computing with data streams, space-efficient dimensionality reduction in $l_1$ and $l_2$, and explicit embeddings of $l_2$ into $l_1$. Further applications are sketched in Section 6.

1.1. STREAM COMPUTATION. The first problem we address is defined as follows (see Henzinger et al. [1998], and Muthukrishnan [2003] for a background on stream computation).
Assume that we have access to a stream $S$ of data, where each chunk of data is of the form $(i, a)$, where $i \in [n] = \{0, \ldots, n-1\}$ and $a \in \{-M, \ldots, M\}$. We see the elements of the stream one by one. Our goal is to approximate (up to the multiplicative factor $(1 \pm \epsilon)$) the $l_p$ norm of the stream $S$, that is, the quantity $L_p(S) = \|V(S)\|_p$, where

$$V(S)_i = \sum_{(i,a) \in S} a.$$

Estimating the norm of a stream is a fundamental primitive in the growing area of data stream computation, and is used as a subroutine in many streaming algorithms (see Section 6.2 for examples), for instance in computing statistics of Net-Flow data [Feigenbaum et al. 1999].

An obvious solution to this problem is to maintain a counter $c_i$ for each $i$ and compute the sum of the $|c_i|^p$'s at the end. Unfortunately, this solution requires $\Omega(n)$ words of storage.

In their seminal paper, Alon et al. [1996] proposed a randomized scheme for approximating $L_2(S)$ using $O(1/\epsilon^2)$ integers, each $O(\log(n + M))$ bits long. Feigenbaum et al. [1999] proposed a different algorithm for estimating $L_1(S)$. Their algorithm works in a restricted setting where (roughly) for each $i$, the stream $S$ contains at most two pairs $(i, a)$. An alternative way to view their result is to assume two streams, one ($S_r$) containing red pairs and another one ($S_b$) containing blue pairs; for each $i$ there is at most one pair $(i, a)$ of each color. The goal

is to compute sketches $C(S_r)$ and $C(S_b)$ of small size, such that the approximate value

$$L_1(S_r, S_b) = \sum_i \Big| \sum_{(i,a) \in S_r} a - \sum_{(i,a) \in S_b} a \Big|$$

can be quickly evaluated from $C(S_r)$ and $C(S_b)$ by applying some function $F$ (see Feigenbaum et al. [1999] for more details of the model). Computing sketches of normed vectors enables us to compress the data and speed up computation; for example, see Indyk et al. [2000], where this approach was shown to give up to an order of magnitude speed-up for various data-mining problems; see also Broder et al. [1997], Broder [1998], and Cohen et al. [2000] (where a somewhat different similarity measure has been used).

In this article, we propose a unified framework for approximating $L_p(S)$ for $p \in (0, 2]$ in small space. As indicated earlier, our algorithm proceeds by maintaining a dot product of the vector $V(S)$ with a vector $r$ of $n$ independent random variables, each drawn from a $p$-stable distribution. Since the dot product can be computed in small space, we can generate the random variables using only a small number of truly random bits. In this way, we make sure that the total storage use is low. Our algorithm does not have the aforementioned restrictions of Feigenbaum et al. [1999]; thus, it solves the main open problem from that article. Moreover, our algorithm maintains only linear combinations of the input values, and therefore extends also to the sketch model.

We note that the algorithms of Alon et al. [1996] also maintained a dot product $r \cdot V(S)$. However, in their case, the vector $r$ had entries from $\{-1, 1\}$ and was instead drawn from a 4-wise independent family. In this case, the distribution of the dot product $s = V(S) \cdot r$ is not easy to predict. However, it can be shown [Alon et al. 1996] that the second moment of $s$ is equal to $\|V(S)\|_2^2$, so $L_2(S)$ can be estimated (roughly) from the median of squares of several dot products. The advantage of that approach is that 4-wise independent random variables can be generated from only $O(\log(n + M))$ random bits.
The disadvantage is that it is not known how to generalize their technique to other $L_p(S)$'s for $p < 2$.

1.2. DIMENSIONALITY REDUCTION. Dimensionality reduction is a technique that maps a set of high-dimensional points into a set of points in low dimension, such that both sets have similar distance properties. This technique, especially the result of Johnson and Lindenstrauss [1984] for the $l_2$ norm, has found numerous applications in theoretical computer science (cf. Indyk [2001]).

We observe that the aforementioned sketching results can be viewed as low-storage dimensionality reduction theorems. Indeed, the streams $S_b$ and $S_r$ can be viewed as points in $n$-dimensional space and $L_p(S_r, S_b)$ is just the $l_p$ distance between the points. Then, the sketch operator $C$ can be viewed as a mapping of $l_p^n$ into the sketch space (say $\mathcal{C}$), such that

- each point in $\mathcal{C}$ can be described using only $m$ numbers, where $m$ is small
- the value of $L_p(S_r, S_b)$ is approximately equal to $F(C(S_r), C(S_b))$

Unfortunately, our sketches (as well as the sketches of Alon et al. [1996]) have the undesirable property that the pair $(\mathcal{C}, F)$ is not a normed space. Specifically, the definition of $F$ involves the median operator; for example, for $l_1$ we have

$$F((x_1, \ldots, x_m), (y_1, \ldots, y_m)) = \mathrm{median}(|x_1 - y_1|, \ldots, |x_m - y_m|).$$
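As a small illustration of why the median-based $F$ does not behave like a norm, the following sketch evaluates $F$ for $l_1$ sketches and exhibits three vectors for which the triangle inequality fails. The function name `sketch_distance_l1` and the example vectors are illustrative assumptions, not part of the original article.

```python
from statistics import median

def sketch_distance_l1(x, y):
    """Evaluate F(x, y) = median(|x_1 - y_1|, ..., |x_m - y_m|)."""
    if len(x) != len(y):
        raise ValueError("sketches must have the same length")
    return median(abs(a - b) for a, b in zip(x, y))

# Hypothetical sketch vectors, chosen only to show that F is not a norm.
zero = [0, 0, 0]
u = [1, 0, 0]
v = [0, 1, 0]
w = [1, 1, 0]  # w = u + v

f_u = sketch_distance_l1(u, zero)  # median(1, 0, 0)
f_v = sketch_distance_l1(v, zero)  # median(0, 1, 0)
f_w = sketch_distance_l1(w, zero)  # median(1, 1, 0)

# A norm would satisfy F(u + v, 0) <= F(u, 0) + F(v, 0).
triangle_inequality_holds = f_w <= f_u + f_v
```

Here `f_u` and `f_v` are both 0 while `f_w` is 1, so `triangle_inequality_holds` is `False`.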

The fact that $F$ is not a norm significantly restricts the applications of the mapping $C$ as a dimensionality reduction technique. This is because it prohibits the usage of a large number of algorithms designed for normed spaces.

To overcome this obstacle we proceed as follows. For $l_2$, we observe that if we modify our algorithm by replacing the median by $\|\cdot\|_2$, then the accuracy of the estimation does not change (this follows by observing that the dimensionality reduction lemma of Johnson-Lindenstrauss requires few truly random bits). This gives a small-space/streaming version of the Johnson-Lindenstrauss lemma.

For $l_1$, the situation is more complicated, since for sketch points $(x_1, \ldots, x_m)$, $(y_1, \ldots, y_m)$ the expectation $E|x_i - y_i|$ is undefined (i.e., is equal to $\infty$). Thus, we cannot simply replace the median by $\|\cdot\|_1$. However, we are able to show that for any $\gamma > 0$ there exists a sketch function $C'$ which maps the points into $m = (\ln(1/\delta))^{1/(\epsilon - \gamma)}$-dimensional space $R^m$ such that for any pair of points $p, q$:

- $\|C'(p) - C'(q)\|_1 \ge (1 - \epsilon)\|p - q\|_1$ with probability at least $1 - \delta$ (i.e., $C'$ is almost noncontractive with high probability)
- $\|C'(p) - C'(q)\|_1 \le (1 + \epsilon)\|p - q\|_1$ with probability at least $1 - (1 + \gamma)/(1 + \epsilon)$ (i.e., $C'$ is almost non-expansive with a constant probability)

Note that this can be viewed as a one-sided analog of Johnson-Lindenstrauss dimensionality reduction for the $l_1$ norm. The two-sided analog is impossible by the result of Brinkman and Charikar [2003].

Although we cannot ensure that the mapping does not expand a fixed pair of points with high probability, the one-sided guarantee is good enough for several purposes. In particular, consider searching for the nearest neighbor (say of point $q$): if the distance from $q$ to its nearest neighbor $p$ does not expand much, and the distance to any other point $p'$ does not contract much, we are still guaranteed to return an approximate nearest neighbor of $q$ (note that we can ensure this happens with constant probability, which can be amplified by using multiple data structures).

1.3. DETERMINISTIC EMBEDDINGS OF $l_2$ INTO $l_1$.
The study of low-distortion embeddings between normed spaces is a rich area of mathematics. One of the major results in that area (e.g., see Figiel et al. [1977] and references therein) is that $l_2^n$ can be embedded into $l_1^{O(n)}$ with distortion $(1 + \epsilon)$ (the $O(\cdot)$ constant depends on $\epsilon$). Unfortunately, none of the many proofs of this theorem is constructive, since they use the probabilistic method to construct the embeddings. To our knowledge, the only constructive result of this type [Berger 1997; Linial et al.] embeds $l_2^n$ into $l_1^{O(n^2)}$ with $\sqrt{3}$ distortion.

We provide an explicit embedding of $l_2^n$ into $l_1^{n^{O(\log n)}}$ with distortion $(1 + 1/n^{\Theta(1)})$. By combining the result with the deterministic nearest neighbor algorithm of Indyk [2000] we obtain a $(3 + \epsilon)$-approximate deterministic algorithm for the nearest neighbor search in $l_2^n$. The algorithm uses preprocessing and storage that is polynomial in the number $N$ of input points and $n^{\log n}$, and has query time that is polynomial in $n^{\log n}$ and $\log N$.

2. Preliminaries

2.1. STABLE DISTRIBUTION. A distribution $D$ over $R$ is called $p$-stable, if there exists $p \ge 0$ such that for any $n$ real numbers $a_1, \ldots, a_n$ and i.i.d. variables $X_1, \ldots, X_n$

with distribution $D$, the random variable $\sum_i a_i X_i$ has the same distribution as the variable $(\sum_i |a_i|^p)^{1/p} X$, where $X$ is a random variable with distribution $D$.

It is known [Zolotarev 1986] that stable distributions exist for any $p \in (0, 2]$. In particular:

- a Cauchy distribution $D_C$, defined by the density function $c(x) = \frac{1}{\pi} \frac{1}{1+x^2}$, is 1-stable
- a Gaussian (normal) distribution $D_G$, defined by the density function $g(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, is 2-stable

In general, a random variable $X$ from a $p$-stable distribution can be generated [Chambers et al.] by taking:

$$X = \frac{\sin(p\theta)}{\cos^{1/p}\theta} \left( \frac{\cos(\theta(1-p))}{-\ln r} \right)^{(1-p)/p},$$

where $\theta$ is uniform on $[-\pi/2, \pi/2]$ and $r$ is uniform on $[0, 1]$.

2.2. PSEUDORANDOM GENERATORS (PRGS). To reduce the randomness needed to generate a vector of random variables from a stable distribution, we use pseudorandom generators for bounded space computation [Nisan 1990]. The intuition is that our algorithms perform only a dot product of the random vector with the vector $V(S)$, and the dot product can be computed in small space.

As in Nisan [1990], we consider PRGs which fool any Finite State Machine (FSM) which uses at most $O(S)$ bits of space (or $2^{O(S)}$ states). Assume that an FSM $Q \in \mathrm{space}(S)$ uses at most $k$ chunks of random bits, where each chunk is of length $b$. The generator $G : \{0, 1\}^m \to (\{0, 1\}^b)^k$ expands a small number $m$ of truly random bits into $kb$ bits which look random for $Q$. Formally, it is defined as follows. Let $D_t$ be the uniform distribution over $\{0, 1\}^t$. For any (discrete) random variable $X$ let $D[X]$ be the distribution of $X$, interpreted as a vector of probabilities. Let $Q(x)$ denote the state of $Q$ after using the random bit sequence $x$. Then we say that $G$ is a PRG with parameter $\epsilon > 0$ for a class $\mathcal{C}$ of FSMs, if for every $Q \in \mathcal{C}$

$$\big\| D[Q_{x \in D_{bk}}(x)] - D[Q_{x \in D_m}(G(x))] \big\|_1 \le \epsilon.$$

FACT 1.
[Nisan 1990] There exists a PRG $G$ for $\mathrm{space}(S)$ with parameter $\epsilon = 2^{-O(S)}$ such that:

- $G$ expands $O(S \log R)$ bits into $O(R)$ bits
- $G$ requires only $O(S)$ bits of storage (in addition to its random input)
- any length-$O(S)$ chunk of $G(x)$ can be computed using $O(\log R)$ arithmetic operations on $O(S)$-bit words

2.3. OTHER ASSUMPTIONS AND NOTATION. To simplify expressions we assume that $M \ge n$, and that the number of pairs in the stream $S$ is polynomial in $n$. Also, we will assume that the processor can operate on $\log M$-bit words in unit cost. One can easily modify our upper bounds for the case when either of these assumptions is not true.
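The generation formula of Section 2.1 can be sketched in a few lines. This is a minimal floating-point illustration, not the bounded-precision procedure analyzed later; the function name `sample_p_stable` and the choice to pass an explicit `random.Random` instance are assumptions made here for the example.

```python
import math
import random

def sample_p_stable(p, rng):
    """Draw one sample from a p-stable distribution, 0 < p <= 2,
    using the formula quoted in Section 2.1 (theta, r as defined there)."""
    if not 0 < p <= 2:
        raise ValueError("p must lie in (0, 2]")
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    r = rng.random()              # uniform on [0, 1)
    while r == 0.0:               # avoid log(0); r == 1 never occurs
        r = rng.random()
    left = math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
    right = (math.cos(theta * (1.0 - p)) / -math.log(r)) ** ((1.0 - p) / p)
    # Note: for very small p the powers above can underflow in floating
    # point; this sketch does not guard against that.
    return left * right
```

For $p = 1$ the second factor equals 1 and the sample is $\tan\theta$, a Cauchy variable. For $p = 2$ the expression simplifies to $2 \sin\theta \sqrt{-\ln r}$, which is a normal variable with variance 2 rather than 1; the scale does not affect stability.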

3. Approximation of the $L_p$ Norm of Data Streams

Let $S$ be the data stream sequence containing pairs $(i, a)$, for $i \in [n]$ and $a \in \{-M, \ldots, M\}$. We present the algorithm for calculating $L_1(S)$; the extension to $p \ne 1$ is discussed at the end. For simplicity, we focus exclusively on the problem of estimating $L_p(S)$; the algorithms automatically generate the sketches of $S$ as well.

We present our algorithm in three steps. In the first step, we present an algorithm which approximates $L_1(S)$, but suffers from two drawbacks:

(1) It assumes infinite precision of the calculations (i.e., it uses arithmetic operations on real numbers)
(2) Although it uses only $O(1/\epsilon^2)$ words for storage, it performs random (and multiple) access to as many as $\Omega(n)$ random numbers. Thus a natural implementation of the algorithm would require $\Omega(n)$ storage.

Despite these limitations, the algorithm will serve well as an illustration of our main ideas. In the next two steps, we will remove the limitations.

3.1. AN IDEAL ALGORITHM. Let $l = c/\epsilon^2 \cdot \log 1/\delta$ for a constant $c > 1$ specified later. The algorithm works as follows.

(1) Initialize $nl$ independent random variables $X_i^j$, $i \in [n]$, $j \in [l]$, drawn from the Cauchy distribution; set $S^j = 0$, for $j \in [l]$
(2) For each new pair $(i, a)$: perform $S^j = S^j + a X_i^j$ for all $j \in [l]$
(3) Return $\mathrm{median}(|S^0|, \ldots, |S^{l-1}|)$

Let $c_i = \sum_{(i,a) \in S} a$; if there is no $(i, a) \in S$, we define $c_i = 0$. Thus $L_1(S) = C = \sum_i |c_i|$. The following claim justifies the correctness of the algorithm.

CLAIM 1. Each $S^j$ has the same distribution as $CX$ where $X$ has Cauchy distribution.

PROOF. Follows from the 1-stability of the Cauchy distribution.

Therefore, it is sufficient to estimate $C$ from independent samples of $CX$, that is, from $S^0, \ldots, S^{l-1}$. To this end, we use the following lemmas.

LEMMA 1. If $X$ has Cauchy distribution, then $\mathrm{median}(|X|) = 1$. Therefore, $\mathrm{median}(a|X|) = a$, for any $a > 0$.

PROOF. If $X$ has Cauchy distribution, then the density function of $|X|$ is $f(x) = \frac{2}{\pi} \frac{1}{1+x^2}$. Therefore, the distribution function of $|X|$ is equal to

$$F(z) = \int_0^z f(x)\,dx = \frac{2}{\pi} \arctan(z).$$

Since $\tan(\pi/4) = 1$, we have $F(1) = 1/2$. Thus $\mathrm{median}(|X|) = 1$.

CLAIM 2.
For any distribution $D$ on $R$ with the distribution function $F$, take $l = c/\epsilon^2 \cdot \log 1/\delta$ independent samples $X_0, \ldots, X_{l-1}$ of $D$; also, let $X = \mathrm{median}(X_0, \ldots, X_{l-1})$. Then, for a suitable constant $c$, we have

$$\Pr[F(X) \in [1/2 - \epsilon, 1/2 + \epsilon]] > 1 - \delta.$$

PROOF. An easy application of the Chernoff bound.
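The three steps of the ideal algorithm of Section 3.1, together with the median estimator, can be sketched as follows. This is an illustration under the same idealizations the text names (floating-point "real" arithmetic and all $nl$ Cauchy variables stored explicitly); the class name `IdealL1Sketch` and its parameters are assumptions made for the example.

```python
import math
import random
from statistics import median

class IdealL1Sketch:
    """Ideal (storage-hungry) L1 estimator: keeps all n*l Cauchy variables."""

    def __init__(self, n, l, seed=0):
        rng = random.Random(seed)
        # Step (1): X[j][i] is Cauchy, generated as tan of a uniform angle.
        self.X = [[math.tan(rng.uniform(-math.pi / 2, math.pi / 2))
                   for _ in range(n)] for _ in range(l)]
        self.S = [0.0] * l

    def update(self, i, a):
        # Step (2): S^j = S^j + a * X_i^j for every j.
        for j in range(len(self.S)):
            self.S[j] += a * self.X[j][i]

    def estimate(self):
        # Step (3): median(|S^0|, ..., |S^{l-1}|).
        return median(abs(s) for s in self.S)
```

For example, feeding the pairs (3, 5), (7, -2), (3, -1), (10, 4) gives $c_3 = 4$, $c_7 = -2$, $c_{10} = 4$, so $L_1(S) = 10$, and `estimate()` should return a value close to 10 when `l` is in the thousands.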

LEMMA 2. Let $F$ be the distribution function of $|X|$ where $X$ has Cauchy distribution, and let $z > 0$ be such that $F(z) \in [1/2 - \epsilon, 1/2 + \epsilon]$. Then, if $\epsilon$ is small enough, we have $z \in [1 - 4\epsilon, 1 + 4\epsilon]$.

PROOF. Follows from the fact that $F^{-1}(x) = \tan(x\pi/2)$ has bounded derivative around the point $1/2$. In particular, $(F^{-1})'(1/2) = \pi$.

Therefore, for a suitable constant $c$, we have the following theorem.

THEOREM 1. The ideal algorithm correctly estimates $L_1(S)$ up to the factor $(1 \pm \epsilon)$ with probability at least $1 - \delta$.

3.2. BOUNDED PRECISION. Now we show how to remove the assumption that the numbers have infinite precision. Since the numbers in the data stream are integers, we only need to take care of the random variables $X_i^j$. Specifically, we need to show that it is sufficient to assume that the random variables can be represented using $O(\log(n + M))$ bits. First, we state the following.

CLAIM 3. Let $f : [0, 1]^d \to R$ be a function computed by an algorithm that, given an input $x \in [0, 1]^d$ where each coordinate $x_i$ is represented using $b$ bits of precision, computes $f(x)$ with an additive error $\beta > 0$, using $O(d(b + \log(1/\beta)))$ bits of space. In addition, assume that there exists $P > 0$, such that for all $x \in [P, 1 - P]^d$ the absolute values of the first order partial derivatives of $f$ at $x$ are at most $B$. Define a random variable $X = f(U)$, where $U$ is chosen from the uniform distribution over $[0, 1]^d$. There is an algorithm $A$ that for any $\alpha > 0$ generates a random variable $\tilde{X}$, such that there is a joint probability space of $X$ and $\tilde{X}$ so that:

- $\Pr[|X - \tilde{X}| > \alpha] \le 2d(P + \frac{\alpha}{Bd})$
- $A$ uses only $O(d[\log(1/\alpha + B + d) + b])$ random and storage bits

PROOF. Follows from a standard discretization argument. Let $s = \frac{\alpha}{Bd}$. Impose a cubic grid on $[0, 1]^d$, where each cell has side length $s$. Note that the total volume of cells fully contained in $[P, 1 - P]^d$ is at least $1 - 2d(P + s)$. We define $\tilde{X} = f(\tilde{U})$, where each coordinate of $\tilde{U}$ is chosen uniformly at random from $\{0, 1/2^b, \ldots, 1 - 1/2^b\}$.
This corresponds to choosing a random $U$ from $[0, 1]^d$, and rounding each coordinate down to the nearest multiple of $1/2^b$. If the grid cell containing $U$ is fully contained in $[P, 1 - P]^d$, then $|f(U) - f(\tilde{U})| \le Bds = \alpha$.

In our case, $U \in [0, 1]$, and we have $X = f(U) = \tan(\pi U/2)$. One can observe that the derivative of $f$ is bounded by $O(1/P^2)$ in the interval $[P, 1 - P]$. Thus, for each $i, j$, we can generate an approximation $\tilde{X}_i^j$ to each $X_i^j$ using only $O(\log(1/P + 1/\alpha))$ bits, such that for each of them $|X_i^j - \tilde{X}_i^j| \le \alpha$ with probability $1 - P$. It follows that, with probability at least $1 - nlP$, for all $j$ we have

$$\tilde{S}^j = \sum_{(i,a) \in S} a \tilde{X}_i^j = \sum_i c_i \tilde{X}_i^j = \sum_i \left( c_i X_i^j \pm \alpha |c_i| \right) = S^j \pm \alpha \sum_i |c_i|.$$

Since $\mathrm{median}(|S^j|) = \sum_i |c_i|$, we can set $\alpha = \epsilon$ to ensure that the estimation of $L_1(S)$ is within a factor of $(1 \pm 2\epsilon)$ from the true value. We also set $P = \frac{\delta}{nl}$ to ensure that the probability of correct estimation is at least $1 - 2\delta$. Finally, we need only $O(\log(n + 1/\delta + 1/\epsilon))$ random bits to generate each random variable.

3.3. RANDOMNESS REDUCTION. Consider a fixed $S^j$. From the above, it follows that the value of $S^j$ can be represented using a small number of bits; also, we need only a small number of bits to generate each $X_i^j$. Unfortunately, we still need $O(n)$ memory words to make sure that if we access a specific $X_i^j$ several times, its value is always the same. We avoid this problem in the following way.

LEMMA 3. Consider an algorithm $A$ that, given a stream $S$ of pairs $(i, a)$, and a function $f : [n] \times \{0, 1\}^R \times \{-M, \ldots, M\} \to \{-M^{O(1)}, \ldots, M^{O(1)}\}$, does the following:

- Set $O = 0$; initialize length-$R$ chunks $R_0, \ldots, R_{n-1}$ of independent random bits
- For each new pair $(i, a)$: perform $O = O + f(i, R_i, a)$
- Output $A(S) = O$

Assume that the function $f(\cdot, \cdot, \cdot)$ is computed by an algorithm using $O(C + R)$ space and $O(T)$ time. Then there is an algorithm $A'$ producing output $A'(S)$, that uses only $O(C + R + \log(Mn))$ bits of storage and $O([C + R + \log(Mn)] \log(nR))$ random bits, such that

$$\Pr[A(S) \ne A'(S)] \le 1/n$$

over some joint probability space of randomness of $A$ and $A'$. Moreover, the algorithm $A'$ uses $O(T + \log(nR))$ arithmetic operations per pair $(i, a)$.

PROOF. Consider a stream $\mathrm{sort}(S)$ in which the pairs $(i, a) \in S$ appear in increasing order of the $i$'s. In this case we do not have to store the chunks $R_i$, since we can generate them on the fly. Thus, the algorithm uses only $O(\log(nM) + C + R)$ storage and $O(nR)$ bits of randomness. Therefore, by Fact 1, there exists a PRG which, given a random seed of size $O([C + R + \log(Mn)] \log(nR))$, expands it to a sequence $R'_0, \ldots, R'_{n-1}$, such that using the $R'_i$'s instead of the $R_i$'s results in negligible probability of error. That is, if $A'$ is the algorithm using variables $R'_i$, then

$$\Pr[A(\mathrm{sort}(S)) \ne A'(\mathrm{sort}(S))] \le 1/n.$$

However, since addition is commutative, we have $A(\mathrm{sort}(S)) = A(S)$ and $A'(\mathrm{sort}(S)) = A'(S)$.
The lemma follows.

The theorem stating the correctness of the final algorithm for estimating $L_1(S)$ is deferred to the next section.

3.4. COMPUTING $L_2(S)$. In this section we describe the modifications for the case of $p = 2$. Note that the algorithms given in this section use more space than the earlier algorithm of Alon et al. [1996]. However, the second algorithm has the following appealing property: the sketch of the stream $S$ is computed by taking $y = AV(S)$ where $A$ is an (implicitly defined) matrix, and $L_2(S)$ is estimated by taking $\|y\|_2$. In other words, the algorithm provides a streaming version of the dimensionality reduction theorem by Johnson and Lindenstrauss [1984], which has the benefits stated in the introduction.
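The linear-sketch view $y = AV(S)$ can be illustrated with an explicitly stored Gaussian matrix. Note the substitutions: this sketch uses truly random entries from Python's `random.gauss` rather than an implicit matrix derived from Nisan's generator, and it divides by $\sqrt{k}$ so that the estimate is on the same scale as $\|x\|_2$; both choices, and all function names, are assumptions made for the example.

```python
import math
import random

def gaussian_sketch_matrix(k, n, seed=0):
    """Explicit k x n matrix of independent N(0, 1) entries (stand-in for A)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]

def sketch(A, x):
    """Compute y = A x, one dot product per row."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def estimate_l2(y):
    """Estimate ||x||_2 as ||y||_2 / sqrt(k); the scaling is an assumption."""
    return math.sqrt(sum(v * v for v in y) / len(y))
```

Because the map is linear, `sketch(A, x)` minus `sketch(A, z)` equals `sketch(A, x - z)` up to rounding, so the distance between two vectors can be estimated from their sketches alone.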

The first algorithm is obtained by replacing the Cauchy distribution by the Gaussian distribution. As before, the final estimator is a median of $|S^0|, \ldots, |S^{l-1}|$. The distribution function of the normal distribution is differentiable and has non-zero derivative around its median, so the analog of Lemma 2 still holds. Moreover, a random variable having normal distribution can be generated from a pair chosen uniformly at random from $[0, 1]^2$ using the formula given in Preliminaries. Then, using elementary analysis, one can verify that Claim 3 holds with $B = 1/P^{O(1)}$.

THEOREM 2. There is an algorithm which for any $0 < \epsilon, \delta < 1$ estimates $L_1(S)$ or $L_2(S)$ up to a factor $(1 \pm \epsilon)$ with probability $1 - \delta - 1/n$ and uses

- $O(\log(Mn/(\delta\epsilon)) \log(1/\delta)/\epsilon^2)$ bits of random access storage
- $O(\log(Mn/(\delta\epsilon)) \log(n/(\delta\epsilon)) \log(1/\delta)/\epsilon^2)$ random bits (which can be stored in a random access storage)
- $O(\log(n/(\delta\epsilon)) \log(1/\delta)/\epsilon^2)$ arithmetic operations per pair $(i, a)$

However, a more elegant approach to estimating $L_2(S)$ is to replace the median operator in the algorithm by $\|\cdot\|_2$. Specifically, the modified algorithm returns $\|(S^0, \ldots, S^{l-1})\|_2$ as the estimation of $L_2(S)$. The correctness of the algorithm follows by a combination of two facts:

- If we use truly independent normal variables $X_i^j$, then the algorithm is correct [Indyk and Motwani 1998].
- If the random variables are instead created using Nisan's generator, the resulting difference in the probability of correctness is negligible. This can be shown in the same way as for the median-based algorithm.

This gives us the following streaming version of the Johnson-Lindenstrauss lemma:

THEOREM 3. There is an algorithm that for any $0 < \epsilon, \delta < 1$ constructs an implicit representation of a $k \times n$ matrix $A$, $k = O(\log(1/\delta)/\epsilon^2)$, such that:

- Given any $i = 1, \ldots, k$ and $j = 1, \ldots, n$, the algorithm returns $A[i, j]$ after performing $O(\log n)$ arithmetic operations.
- The algorithm uses $O(\log(Mn/(\delta\epsilon)) \log(n/(\epsilon\delta)) \log(1/\delta)/\epsilon^2)$ bits of space.
- Each entry of $A$ can be represented using $O(\log(n/(\delta\epsilon)))$ bits.
- For any fixed vector $x \in R^n$, we have $\Pr[\,|\|Ax\|_2 - \|x\|_2| > \epsilon \|x\|_2\,] \le \delta$.

3.5. COMPUTING $L_p(S)$.
For general $p \in (0, 2]$, the algorithm and analysis become more involved, mainly due to the fact that no exact formulas are known for densities and/or distribution functions of general $p$-stable distributions. However, one can generate $p$-stable random variables as in Preliminaries. Therefore, the algorithm from the earlier section can be implemented for general $p$.

As far as the analysis is concerned, it seems that an analog of Lemma 2 does hold for any $p \in (0, 2)$. Unfortunately, we are not aware of any proof of this fact. Instead, we show the following lemma.

LEMMA 4. Let $F$ be the c.d.f. of a random variable $|Z|$, where $Z$ is drawn from a $p$-stable distribution. There exist constants $c_1, c_2, c_3 > 0$, such that for any $p$ and $\epsilon$, there exists $t \in [c_1, c_2]$ such that

$$|F^{-1}(t - \epsilon/c_3) - F^{-1}(t + \epsilon/c_3)| \le \epsilon.$$

PROOF. Let $b_1, b_2 > 0$ be constants such that $\Pr[|Z| \ge b_1] \le 1 - b_2$. Consider $v > 0$ such that $F(v) = b_2$. Clearly, we have $v \le b_1$. Also, let $u > 0$ be such that $F(u) = b_2/2$.

Decompose the interval $[b_2/2, b_2]$ into $b_1/\epsilon$ disjoint intervals of the form $[t - \epsilon/(4b_1/b_2), t + \epsilon/(4b_1/b_2)]$. Assume the lemma does not hold with constants $c_1 = b_2/2$, $c_2 = b_2$, $c_3 = 4b_1/b_2$. This implies that $F^{-1}$ increases on each of the intervals by more than $\epsilon$. But this would imply that $F^{-1}(b_2) > b_1/\epsilon \cdot \epsilon = b_1$, which yields a contradiction.

Given the lemma, we can estimate the value of $L_p(S)$ by taking the $t$-quantile (instead of the median) of the variables $|S^0|, \ldots, |S^{l-1}|$. Note that, unlike for $p = 1, 2$, the value of $t$ (and therefore the algorithm) depends on $\epsilon > 0$. Moreover, we do not specify a method to compute $t$ given $p$ and $\epsilon$, although presumably this task can be accomplished by using numerical approximations of the densities of $p$-stable distributions. This means that our algorithm is not uniform.

Other issues are taken care of as for $p = 2$. We only observe that for general $p$, the derivative of $F^{-1}$ depends on $p$. However, since we consider $p$ to be a constant, we suppress the dependence on $p$ in the $O(\cdot)$ notation below.

THEOREM 4. For any $p \in (0, 2)$ and any $0 < \epsilon, \delta < 1$, there is a non-uniform algorithm that estimates $L_p(S)$ up to a factor $(1 \pm \epsilon)$ with probability $1 - \delta$ and uses

- $O(\log(Mn/(\epsilon\delta)) \log(1/\delta)/\epsilon^2)$ bits of random access storage
- $O(\log(Mn/(\epsilon\delta)) \log(n/(\epsilon\delta)) \log(1/\delta)/\epsilon^2)$ random bits (which can be stored in a random access storage)
- $O(\log(n/(\delta\epsilon)))$ arithmetic operations per pair $(i, a)$

The $O(\cdot)$ notation subsumes constants depending on $p$.

4. Dimensionality Reduction for $L_1$

In this section, we show how to obtain the sketch function $C'$ that maps the points into a normed space $l_1^m$. We will describe the mapping in terms of dimensionality reduction of $l_1^n$; the adaptation to the stream model can be done as in the previous section.

THEOREM 5.
For any $1/2 \ge \epsilon, \delta > 0$, and $\epsilon > \gamma > 0$, there is a probability space over linear mappings $f : l_1^n \to l_1^k$, where $k = (\ln(1/\delta))^{1/(\epsilon - \gamma)}/c(\gamma)$, for a function $c(\gamma) > 0$ depending only on $\gamma$, such that for any pair of points $p, q \in l_1^n$:

- the probability that $\|f(p) - f(q)\|_1 \le (1 - \epsilon)\|p - q\|_1$ is smaller than $\delta$
- the probability that $\|f(p) - f(q)\|_1 \ge (1 + \epsilon)\|p - q\|_1$ is smaller than $\frac{1+\gamma}{1+\epsilon}$

Note that the embedding is randomized but asymmetric: the probability that the expansion is small is only about $\epsilon$, while the probability that the contraction is small is $1 - \delta$. Also, note that the term $c(\gamma)$ in the definition of $k$ enables us to assume that $k$ is large compared to any function of $\gamma$.

PROOF. We define the random mapping $f$ such that, for $j = 1, \ldots, k$, the $j$th coordinate of $f(q)$ for $q = (q_1, \ldots, q_n)$ is equal to $Y_j = \sum_i X_i^j q_i$, where the $X_i^j$ are i.i.d. random variables having Cauchy distribution. Since $f$ is linear, it is sufficient to show the above for $p = 0$ and $q$ such that $\|q\|_1 = 1$. In this case

$$\|f(p) - f(q)\|_1 = \sum_j \Big| \sum_i X_i^j q_i \Big| = \sum_j |Y_j|.$$

Since the Cauchy distribution is 1-stable, each $Y_j$ has

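a Cauchy distribution. Thus it is sufficient to prove the following fact. For any sequence $Y_1, \ldots, Y_k$ of i.i.d. variables with Cauchy distribution, let $Y = \sum_j |Y_j|$. Show that there exists a threshold $T = T(k, \gamma, \epsilon)$, such that:

- $\Pr[Y < (1 - \epsilon)T] \le \delta$
- $\Pr[Y > (1 + \epsilon)T] \le \frac{1+\gamma}{1+\epsilon}$

Our approach is to consider a truncated version of the variables $|Y_j|$. In particular, for $B > 0$ and any $j = 1, \ldots, k$, let $Z_j^B$ be equal to $|Y_j|$ if $|Y_j| \le B$; we will set $Z_j^B = B$ otherwise.

CLAIM 4. Let $P = \Pr[Z_j^B = B]$. Then $P \le b/B$ for some constant $b > 0$.

LEMMA 5. For any $B > 0$

$$k \ln(B^2 + 1)/\pi \le E\Big[\sum_j Z_j^B\Big] \le k[\ln(B^2 + 1)/\pi + b].$$

PROOF. We have

$$E\Big[\sum_j Z_j^B\Big] = k\,E[Z_j^B] = k\Big[(1 - P)\frac{2}{\pi}\int_0^B \frac{x}{1+x^2}\,dx + PB\Big] = k\big[(1 - P)/\pi \cdot \ln(B^2 + 1) + PB\big].$$

The inequalities follow from Claim 4.

LEMMA 6. For any $B > 0$

$$E[(Z_j^B)^2] \le 4/\pi \cdot B.$$

PROOF.

$$E[(Z_j^B)^2] \le \frac{2}{\pi}\int_0^B \frac{x^2}{1+x^2}\,dx + PB^2 \le 2/\pi\,[B + B^2/B] = 4/\pi \cdot B.$$

We will first establish $T$, which satisfies the second condition. Let $U = \frac{1+\epsilon}{\gamma} bk$. By the union bound, we have $\Pr[\exists_j |Y_j| \ge U] \le kb/U = \frac{\gamma}{1+\epsilon}$. We define $T = E[\sum_j Z_j^U]$. Then

$$\Pr[Y \ge (1 + \epsilon)T] \le \Pr[\exists_j |Y_j| > U] + \Pr[Y \ge (1 + \epsilon)T : \forall_j |Y_j| \le U] \le \frac{\gamma}{1+\epsilon} + \Pr\Big[\sum_j Z_j^U \ge (1 + \epsilon)T\Big] \le \frac{\gamma}{1+\epsilon} + \frac{E[\sum_j Z_j^U]}{(1 + \epsilon)T} = \frac{1+\gamma}{1+\epsilon}.$$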

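Now we focus on the first condition. Define $\alpha = \frac{1-\epsilon}{1-\gamma}$, $L = U^\alpha$. Observe that $1/2 \le \alpha < 1$. Observe that

$$E\Big[\sum_j Z_j^L\Big] \ge k/\pi \cdot \ln(L^2 + 1) = k/\pi \cdot \ln(U^{2\alpha} + 1) \ge \alpha k/\pi \cdot \ln(U^2 + 1) \ge \alpha(T - bk),$$

where the last inequality follows from Lemma 5 and the definition of $T$. Thus, $T \le E[\sum_j Z_j^L]/\alpha + bk$.

Set $\gamma' = \gamma/2$, and assume that $k$ is large enough with respect to $\gamma$. We have

$$\Pr[Y \le (1 - \epsilon)T] \le \Pr\Big[\sum_j Z_j^L \le (1 - \epsilon)T\Big] \le \Pr\Big[\sum_j Z_j^L \le (1 - \epsilon)\Big(E\Big[\sum_j Z_j^L\Big]/\alpha + bk\Big)\Big]$$
$$= \Pr\Big[\sum_j Z_j^L \le (1 - \gamma)E\Big[\sum_j Z_j^L\Big] + (1 - \epsilon)bk\Big]$$
$$= \Pr\Big[E\Big[\sum_j Z_j^L\Big] - \sum_j Z_j^L \ge \gamma' E\Big[\sum_j Z_j^L\Big] + \Big(\gamma' E\Big[\sum_j Z_j^L\Big] - (1 - \epsilon)bk\Big)\Big].$$

Observe that, by Lemma 5:

$$\gamma' E\Big[\sum_j Z_j^L\Big] - (1 - \epsilon)bk \ge k[\gamma'/\pi \cdot \ln(L^2 + 1) - b] \ge k[\gamma'/\pi \cdot \ln(U^{2\alpha}) - b] \ge k[\gamma'/\pi \cdot \ln(bk) - b],$$

which is positive for $k$ large enough. Therefore, we have

$$\Pr[Y \le (1 - \epsilon)T] \le \Pr\Big[E\Big[\sum_j Z_j^L\Big] - \sum_j Z_j^L \ge \gamma' E\Big[\sum_j Z_j^L\Big]\Big].$$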

By using the inequality of Maurer [2003] we get

$$\Pr\Big[E\Big[\sum_j Z_j^L\Big] - \sum_j Z_j^L \ge \gamma' E\Big[\sum_j Z_j^L\Big]\Big] \le \exp\left(-\frac{\gamma'^2 E^2[\sum_j Z_j^L]}{2k\,E[(Z_j^L)^2]}\right) \le \exp\left(-\frac{\gamma'^2 (\alpha k/\pi \cdot \ln(U^2 + 1))^2}{2k \cdot 4/\pi \cdot L}\right)$$
$$\le \exp\left(-\frac{\gamma'^2 (\alpha k/\pi)^2}{2k[(1 + \epsilon)bk/\gamma]^\alpha}\right) \le \exp(-c(\gamma)k^{1-\alpha}) \le \exp(-c(\gamma)k^{\epsilon-\gamma}),$$

where $c(\gamma)$ is a constant dependent on $\gamma$. Thus, setting $k = \ln(1/\delta)^{1/(\epsilon-\gamma)}/c(\gamma)$ for a proper function $c(\gamma)$ ensures that the first condition of the theorem holds as well.

5. Explicit Embedding of $l_2^n$ into $l_1^{n^{O(\log n)}}$ with $(1 + 1/n^{O(1)})$ Distortion

We start by illustrating the embedding with an intuitive explicit embedding of $l_2^n$ into $l_1$ with large dimension. To this end, notice that if $X_1, \ldots, X_n$ is a sequence of i.i.d. random variables with Gaussian distribution, then there exists a constant $C > 0$ such that for any $q = (q_1, \ldots, q_n) \in l_2^n$, we have

$$E\Big[\Big|\sum_i q_i X_i\Big|\Big] = C\|q\|_2$$

(this easily follows from 2-stability of the Gaussian distribution and properties of a norm). This is approximately true even if the Gaussian variables are discretized to be representable using $b = O(\log n)$ bits; the details are as in Section 3. Thus, if we create a matrix $A$ with $n$ columns and $(2^b)^n$ rows, one for each configuration of $(X_1, \ldots, X_n)$, then $\|Aq\|_1/(2^b)^n \approx C\|q\|_2$, which is what we need.

To reduce the dimension of the host space, we proceed essentially as in Section 3. The only difference is that this time we are dealing with the expectation instead of low probability of error (i.e., we have to exclude the case that a small probability event has a significant contribution to the expectation). To this end, we proceed as follows. Let $\tilde{X}_i$ be i.i.d. variables having the truncated Gaussian distribution, that is, such that:

- if $|X_i| \le t$, then $\tilde{X}_i = X_i$
- if $|X_i| > t$, then $\tilde{X}_i = 0$

We use $t = 2c \log n$, so $\Pr[|X_i| > t] \le a/n^c$, for some $a > 0$. We will relate $E[|\sum_i X_i q_i|]$ and $E[|\sum_i \tilde{X}_i q_i|]$ as follows. Let $P = \Pr[\exists_i : |X_i| > t]$; notice that

$P \le a/n^{c-1}$, that is, $P$ is small. Then we can write

$$E = E\Big[\Big|\sum_i X_i q_i\Big|\Big] = (1 - P)\,E\Big[\Big|\sum_i X_i q_i\Big| : \forall_i |X_i| \le t\Big] + P\,E\Big[\Big|\sum_i X_i q_i\Big| : \exists_i |X_i| > t\Big] = (1 - P)E_1 + PE_2$$

and

$$\tilde{E} = E\Big[\Big|\sum_i \tilde{X}_i q_i\Big|\Big] = (1 - P)\,E\Big[\Big|\sum_i \tilde{X}_i q_i\Big| : \forall_i \tilde{X}_i \ne 0\Big] + P\,E\Big[\Big|\sum_i \tilde{X}_i q_i\Big| : \exists_i \tilde{X}_i = 0\Big] = (1 - P)\tilde{E}_1 + P\tilde{E}_2.$$

Notice that $E_1 = \tilde{E}_1$. Moreover, it is easy to see that $E_2 = O(nt)$ and $\tilde{E}_2 = O(nt)$. Thus, $E$ and $\tilde{E}$ differ only by a factor of $(1 + 1/n^{\Theta(1)})$.

The bounded precision issues are essentially the same as in Section 3, so we skip the details.

THEOREM 6. For any $n > 0$, there exists an explicitly constructible embedding of $l_2^n$ into $l_1^{n^{O(\log n)}}$ with distortion $(1 + 1/n^{O(1)})$.

6. Extensions, Discussion and Open Problems

6.1. APPROXIMATE RESULT CHECKING. The technique of using a random linear mapping to estimate the norm of a vector has its computer science roots in (approximate) checking of computation. Consider the following problem: for a fixed linear mapping $A : R^n \to R^m$, construct a checker that, given $x \in R^n$ and $y \in R^m$, checks if $Ax = y$. The check should preferably be done in time $O(n)$, so that the overhead of checking is low compared to the computation time. The latter is typically $\omega(n)$; for example, it is $\Theta(n \log n)$ for the Fourier Transform.

A solution to this problem [Freivalds 1979; Wasserman and Blum 1997] can be obtained as follows. First, observe that for any $r \in R^m$ we have $r^T(Ax) = (r^T A)x = s^T x$. Moreover, if $Ax \ne y$, then for $r$ chosen uniformly at random from $\{0, 1\}^m$ we have $\Pr[s^T x = r^T y] = \Pr[r^T(Ax - y) = 0] \le \frac{1}{2}$. These two observations give us a probabilistic checker for $Ax = y$ that runs in $O(n)$ time, provided we generate the pair $(r, s)$ in advance.

A more refined approximate checker was proposed in Ar et al. [1993], and Wasserman and Blum [1997]. It not only verifies if $Ax = y$, but also makes it possible to estimate the norm of the difference vector $Ax - y$. In particular, Wasserman and Blum [1997] observe that $(s^T x - r^T y)^2$ provides an unbiased estimator of $\|Ax - y\|_2^2$. In the context of the aforementioned research, one can easily see that our sketching algorithms can be directly translated into approximate checkers that work for any $l_p$ norm, $p \in (0, 2]$.
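The exact checker described above (random $r \in \{0,1\}^m$, precomputed $s^T = r^T A$) can be sketched as follows. Integer inputs are assumed so that equality tests are exact; the function names, the small example matrix, and the use of several independent pairs to drive the one-sided error below $2^{-t}$ are illustrative choices rather than part of the original article.

```python
import random

def make_check_pair(A, rng):
    """Precompute (r, s) with r uniform in {0,1}^m and s^T = r^T A."""
    m, n = len(A), len(A[0])
    r = [rng.randint(0, 1) for _ in range(m)]
    s = [sum(r[i] * A[i][j] for i in range(m)) for j in range(n)]
    return r, s

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def passes_check(pair, x, y):
    """One round: accept iff s^T x == r^T y."""
    r, s = pair
    return dot(s, x) == dot(r, y)

def probably_equal(pairs, x, y):
    """Accept iff every precomputed pair accepts.
    If Ax != y, each round accepts with probability at most 1/2."""
    return all(passes_check(pair, x, y) for pair in pairs)

# Hypothetical 3 x 2 example.
A = [[1, 2], [3, 4], [5, 6]]
rng = random.Random(7)
pairs = [make_check_pair(A, rng) for _ in range(40)]
x = [1, 1]
good_y = [3, 7, 11]   # equals A x
bad_y = [3, 7, 12]    # differs from A x in the last coordinate
```

With these inputs `probably_equal(pairs, x, good_y)` always accepts, while `probably_equal(pairs, x, bad_y)` accepts only if all 40 vectors $r$ have a zero in the last coordinate.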

6.2. FURTHER DEVELOPMENTS. Since the earlier version of this article was presented at FOCS '00, the techniques introduced in this article have been used in several other articles. In this section, we briefly discuss those results.

Our algorithms for estimating the $l_p$ norms, as well as the use of Nisan's generator to reduce the storage needed for the random bits, have become a standard tool in the area of streaming algorithms (cf. Gilbert et al. [2002], Datar et al. [2002], Cormode et al. [2002a], Thaper et al. [2002], Cormode and Muthukrishnan [2003], Indyk [2004], and Indyk and Woodruff [2005]; see also the surveys [Cormode 2003; Muthukrishnan 2003]). In Cormode et al. [2002b], the authors use the algorithms in a nonstreaming setting to reduce the dimensionality of the data and the running time needed to compute distances between the vectors.

Stable distributions have found use in other algorithmic settings as well. In Datar et al. [2004], they are used to construct a Locality-Sensitive Hashing scheme that works directly in $l_p$ norms; the earlier scheme of Indyk and Motwani [1998] works only for Hamming space. In Feigenbaum et al. [2001a, 2001b] (Appendix B.2), it is shown how to augment our $l_2$ estimation algorithm to construct sketches that are cryptographically secure. Specifically, the authors use the memoryless property of p-stable distributions: a dot product of any vector $x \in \mathbb{R}^n$ with a vector of $n$ independent p-stable random variables is a random variable that depends only on $\|x\|_p$ and not on any other properties of $x$.

The observation that Nisan's generator can be used to reduce the randomness needed for dot product computation has been used in Engebretsen et al. [2002] to give an efficient derandomization of an approximation algorithm based on semidefinite programming. Their algorithm was fairly complex and involved the method of conditional probabilities in addition to the use of a pseudorandom generator.
Independently, Sivakumar [2002] showed that a similar result (as well as many others) can be obtained directly, by using a different version of Nisan's generator [Nisan].

6.3. OPEN PROBLEMS. There are several interesting problems left open by this article. In particular, we do not know if the use of Nisan's generator is really necessary for our purpose. It is plausible that one could use O(1)-wise independent families of random variables (as in Alon et al. [1996]) to generate random variables that have sufficient stable law properties. If so, then one would be able to reduce the space used by our algorithm by a logarithmic factor. Even better, this might give an explicit construction of an embedding of $l_2^d$ into $l_1^{d^{O(1)}}$ with distortion arbitrarily close to 1. In general, closing the gap between probabilistic and explicit constructions of such embeddings remains an important open problem (cf. Matoušek [2004, Problem 2.2]).

ACKNOWLEDGMENTS. The author would like to thank Martin Strauss, Joan Feigenbaum, Graham Cormode, Anastasios Sidiropoulos and the anonymous referees for helpful comments and discussions.

REFERENCES

ALON, N., MATIAS, Y., AND SZEGEDY, M. The space complexity of approximating the frequency moments. In Proceedings of the ACM Symposium on Theory of Computing. ACM, New York.
AR, S., BLUM, M., CODENOTTI, B., AND GEMMELL, P. Checking approximate computation over the reals. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.

BERGER, B. The fourth moment method. SIAM J. Comput. 26.
BRINKMAN, B., AND CHARIKAR, M. On the impossibility of dimension reduction in l1. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.
BRODER, A. Filtering near-duplicate documents. In Proceedings of FUN.
BRODER, A., GLASSMAN, S., MANASSE, M., AND ZWEIG, G. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference.
CHAMBERS, J. M., MALLOWS, C. L., AND STUCK, B. W. A method for simulating stable random variables. J. Amer. Statist. Assoc. 71.
COHEN, E., DATAR, M., FUJIWARA, S., GIONIS, A., INDYK, P., MOTWANI, R., ULLMAN, J., AND YANG, C. Finding interesting associations without support pruning. In Proceedings of the 16th International Conference on Data Engineering (ICDE).
CORMODE, G. Stable distributions for stream computations: It's as easy as 0,1,2. In Proceedings of the Workshop on Management and Processing of Data Streams.
CORMODE, G., DATAR, M., INDYK, P., AND MUTHUKRISHNAN, S. 2002a. Comparing data streams using hamming norms. In Proceedings of the International Conference on Very Large Databases (VLDB).
CORMODE, G., INDYK, P., KOUDAS, N., AND MUTHUKRISHNAN, S. 2002b. Fast mining of massive tabular data via approximate distance computations. In Proceedings of the 18th International Conference on Data Engineering (ICDE).
CORMODE, G., AND MUTHUKRISHNAN, S. Estimating dominance norms of multiple data streams. In Proceedings of the European Symposium on Algorithms.
DATAR, M., GIONIS, A., INDYK, P., AND MOTWANI, R. Maintaining stream statistics over sliding windows. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. ACM, New York.
DATAR, M., IMMORLICA, N., INDYK, P., AND MIRROKNI, V. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry. ACM, New York.
ENGEBRETSEN, L., INDYK, P., AND O'DONNELL, R. Deterministic dimensionality reduction with applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. ACM, New York.
FEIGENBAUM, J., ISHAI, Y., MALKIN, T., NISSIM, K., STRAUSS, M. J., AND WRIGHT, R. N. 2001a. Secure multiparty computation of approximations. Lecture Notes in Computer Science, vol. 2076, Springer-Verlag, New York, 927.
FEIGENBAUM, J., ISHAI, Y., MALKIN, T., NISSIM, K., STRAUSS, M. J., AND WRIGHT, R. N. 2001b. Secure multiparty computation of approximations.
FEIGENBAUM, J., KANNAN, S., STRAUSS, M., AND VISWANATHAN, M. An approximate l1-difference algorithm for massive data streams. In Proceedings of the Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.
FIGIEL, T., LINDENSTRAUSS, J., AND MILMAN, V. D. The dimension of almost spherical sections of convex bodies. Acta Math. 139.
FREIVALDS, R. Fast probabilistic algorithms. In Proceedings of the Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 74, Springer-Verlag, New York.
GILBERT, A., GUHA, S., KOTIDIS, Y., INDYK, P., MUTHUKRISHNAN, M., AND STRAUSS, M. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.
HENZINGER, M., RAGHAVAN, P., AND RAJAGOPALAN, S. Computing on data streams. Technical Note, Digital Systems Research Center, Palo Alto, CA.
INDYK, P. Dimensionality reduction techniques for proximity problems. In Proceedings of the Ninth ACM-SIAM Symposium on Discrete Algorithms. ACM, New York.
INDYK, P. Tutorial: Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.
INDYK, P. Algorithms for dynamic geometric problems over data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.
INDYK, P., KOUDAS, N., AND MUTHUKRISHNAN, S. Identifying representative trends in massive time series datasets using sketches. In Proceedings of the 26th International Conference on Very Large Databases (VLDB).
INDYK, P., AND MOTWANI, R. Approximate nearest neighbor: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing. ACM, New York.
INDYK, P., AND WOODRUFF, D. Optimal approximations of the frequency moments of data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.
JOHNSON, W., AND LINDENSTRAUSS, J. Extensions of Lipschitz mapping into Hilbert space. Contemp. Math. 26.
JOHNSON, W., AND SCHECHTMAN, G. Embedding $l_p^m$ into $l_1^n$. Acta Math. 149.
LINDENSTRAUSS, J., AND MILMAN, V. D. The local theory of normed spaces and its applications to convexity. In Handbook of Convex Geometry, P. M. Gruber and J. M. Wills, eds. Elsevier, Amsterdam, The Netherlands.
LINIAL, N., LONDON, E., AND RABINOVICH, Y. The geometry of graphs and some of its algorithmic applications. In Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.
MATOUŠEK, J. Collection of open problems on low-distortion embeddings of finite metric spaces. Available at
MAURER, A. A bound on the deviation probability for sums of non-negative random variables. J. Ineq. Pure Applied Math. 4, 1, Art. 15.
MUTHUKRISHNAN, S. Data streams: Algorithms and applications (invited talk at SODA 03). Available at muthu/stream-1-1.ps.
NISAN, N. Pseudorandom generators for space-bounded computation. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.
NISAN, N. RL ⊆ SC. In Proceedings of the Annual ACM Symposium on Theory of Computing.
SIVAKUMAR, D. Algorithmic derandomization via complexity theory. In Proceedings of the Annual ACM Symposium on Theory of Computing. ACM, New York.
THAPER, N., GUHA, S., INDYK, P., AND KOUDAS, N. Dynamic multidimensional histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York.
WASSERMAN, H., AND BLUM, M. Software reliability via run-time result checking. J. ACM.
ZOLOTAREV, V. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society.
RECEIVED APRIL 2004; REVISED MAY 2005 AND SEPTEMBER 2005; ACCEPTED SEPTEMBER 2005 Journal of the ACM, Vol. 53, No. 3, May 2006.