Database-friendly random projections: Johnson-Lindenstrauss with binary coins

Transcription

1 Journal of Computer an System Sciences 66 (2003) Database-frienly ranom projections: Johnson-Linenstrauss with binary coins Dimitris Achlioptas Microsoft Research, One Microsoft Way, Remon, WA 98052, USA Receive 28 August 2001; revise 19 July 2002 Abstract A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional Eucliean space where k is logarithmic in n an inepenent of so that all pairwise istances are maintaine within an arbitrarily small factor. All known constructions of such embeings involve projecting the n points onto a spherically ranom k-imensional hyperplane through the origin. We give two constructions of such embeings with the property that all elements of the projection matrix belong in f 1; 0; þ1g: Such constructions are particularly well suite for atabase environments, as the computation of the embeing reuces to evaluating a single aggregate over k ranom partitions of the attributes. r 2003 Elsevier Science (USA). All rights reserve. 1. Introuction Consier projecting the points of your favorite sculpture first onto the plane an then onto a single line. The result amply emonstrates the power of imensionality. In general, given a high-imensional pointset it is natural to ask if it coul be embee into a lower imensional space without suffering great istortion. In this paper, we consier this question for finite sets of points in Eucliean space. It will be convenient to think of n points in R as an n matrix A; each point represente as a row (vector). Given such a matrix representation, one of the most commonly use embeings is the one suggeste by the singular value ecomposition of A: That is, in orer to embe the n points into R k we project them onto the k-imensional space spanne by the singular vectors corresponing to the k largest singular values of A: If one rewrites the result of this projection as a (rank k) n matrix A k ; we are guarantee that any other k-imensional pointset (represente as an n aress: optas@microsoft.com /03/$ - see front matter r 2003 Elsevier Science (USA). All rights reserve. oi: /s (03)

2 672 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) matrix D) satisfies ja A k j F pja Dj F ; where, for any matrix Q; jqj 2 F ¼ P Q 2 i;j : To interpret this result observe that if moving a point by z takes energy proportional to z 2 ; A k represents the k-imensional configuration reachable from A by expening least energy. In fact, A k is an optimal rank k approximation of A uner many matrix norms. In particular, it is well-known that for any rank k matrix D an for any rotationally invariant norm ja A k jpja Dj: At the same time, this optimality implies no guarantees regaring local properties of the resulting embeing. For example, it is not har to evise examples where the new istance between a pair of points is arbitrarily smaller than their original istance. For a number of problems where imensionality reuction is clearly esirable, the absence of such local guarantees can make it har to exploit embeings algorithmically. In a seminal paper, Linial et al. [12] were the first to consier algorithmic applications of embeings that respect local properties. By now, embeings of this type have become an important tool in algorithmic esign. A real gem in this area has been the following result of Johnson an Linenstrauss [9]. Lemma 1.1 (Johnson an Linenstrauss [9]). Given e40 an an integer n; let k be a positive integer such that kxk 0 ¼ Oðe 2 log nþ: For every set P of n points in R there exists f : R -R k such that for all u; vap ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 : We will refer to embeings proviing a guarantee akin to that of Lemma 1.1 as JLembeings. In the last few years, such embeings have been use in solving a variety of problems. The iea is as follows. By proviing a low-imensional representation of the ata, JL-embeings spee up certain algorithms ramatically, in particular algorithms whose run-time epens exponentially in the imension of the working space. (For a number of practical problems the best-known algorithms inee have such behavior.) At the same time, the provie guarantee regaring pairwise istances often allows one to establish that the solution foun by working in the low-imensional space is a goo approximation to the solution in the original space. We give a few examples below. Papaimitriou et al. [13], prove that embeing the points of A in a low-imensional space can significantly spee up the computation of a low-rank approximation to A; without significantly affecting its quality. In [8], Inyk an Motwani showe that JL-embeings are useful in solving the e-approximate nearest-neighbor problem, where (after some preprocessing of the pointset P) one is to answer queries of the following type: Given an arbitrary point x; fin a point yap; such that for every point zap; jjx zjjxð1 eþjjx yjj: In a ifferent vein, Schulman [14] use JL-embeings as part of an approximation algorithm for the version of clustering where we seek to minimize the sum of the squares of intracluster istances. Recently, Inyk [7] showe that

3 JL-embeings can also be use in the context of ata-stream computation, where one has limite memory an is allowe only a single pass over the ata (stream) Our contribution D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Over the years, the probabilistic metho has allowe for the original proof of Johnson an Linenstrauss to be greatly simplifie an sharpene, while at the same time giving conceptually simple ranomize algorithms for constructing the embeing [5,6,8]. Roughly speaking, all such algorithms project the input points onto a spherically ranom hyperplane through the origin. While this is conceptually simple, in practical terms it amounts to multiplying the input matrix A with a ense matrix of real numbers. This can be a non-trivial task in many practical computational environments. At the same time, investigating the role of spherical symmetry in the choice of hyperplane is mathematically interesting in itself. Our main result, below, asserts that one can replace projections onto spherically ranom hyperplanes with much simpler an faster operations. In particular, these operations can be implemente efficiently in a atabase environment using stanar SQL primitives. Somewhat surprisingly, we prove that this comes without any sacrifice in the quality of the embeing. In fact, we will see that for every fixe value of we get a slightly better boun than all current methos. We state our result below as Theorem 1.1. Similarly to Lemma 1.1, the parameter e controls the esire accuracy in istance preservation, while now b controls the projection s probability of success. Theorem 1.1. Let P be an arbitrary set of n points in R ; represente as an n matrix A: Given e; b40 let k 0 ¼ 4 þ 2b e 2 =2 e 3 log n: =3 For integer kxk 0 ; let R be a k ranom matrix with Rði; jþ ¼r ij ; where fr ij g are inepenent ranom variables from either one of the following two probability istributions: ( þ1 with probability 1=2; r ij ¼ ð1þ 1 with probability 1=2; 8 p r ij ¼ ffiffi >< þ1 with probability 1=6; 3 0 with probability 2=3; ð2þ >: 1 with probability 1=6: Let E ¼ p 1 ffiffiffi AR k an let f : R -R k map the ith row of A to the ith row of E: With probability at least 1 n b ; for all u; vap ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 :

4 674 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) We see that to construct a JL-embeing via Theorem 1.1 we only nee very simple probability istributions to generate the projection matrix, while the computation of the projection itself reuces to aggregate evaluation, i.e., summations an subtractions (but no multiplications). While it seems har to imagine a probability istribution much simpler than (1), probability istribution (2) gives an aitional threefol speeup as we only nee to process a thir of all attributes for each of the k coorinates. We note that the construction base on probability istribution (1) was, inepenently, propose by Arriaga an Vempala in [2], but without an analysis of its performance Projecting onto ranom lines Looking a bit more closely into the computation of the embeing we see that each row (vector) of A is projecte onto k ranom vectors whose coorinates fr ij g are inepenent ranom variables with mean 0 an variance 1. If the fr ij g were inepenent normal ranom variables with mean 0 an variance 1, it is well-known that each resulting vector woul point to uniformly ranom irection in space. Projections onto such vectors have been consiere in a number of settings, incluing the work of Kleinberg [10] an Kushilevitz et al. [11] on approximate nearest neighbors an of Vempala on learning intersections of halfspaces [16]. More recently, such projections have also been use in learning mixture of Gaussians moels, starting with the work of Dasgupta [4] an later with the work of Arora an Kannan [3]. Our proof implies that for any fixe vector a; the behavior of its projection onto a ranom vector c is manate by the even moments of the ranom variable jja cjj: In fact, our result follows by showing that for every vector a; uner our istributions for fr ij g; these moments are ominate by the corresponing moments for the case where c is spherically symmetric. As a result, projecting onto vectors whose entries are istribute like the columns of matrix R is computationally simpler an results in projections that are at least as nicely behave Ranomization A naive, perhaps, attempt at constructing JL-embeings woul be to pick k of the original coorinates in -imensional space as the new coorinates. Naturally, as two points can be very far apart while only iffering along a single imension, this approach is oome. Yet, if we knew that for every pair of points all coorinates contribute roughly equally to the corresponing pairwise, istance, this naive scheme woul make perfect sense. With this in min, it is very natural to try an remove pathologies like those mentione above by first applying a ranom rotation to the original pointset in R : Observe now that picking, say, the first k coorinates after applying a ranom rotation is exactly the same as projecting onto a spherically ranom k-imensional hyperplane! Thus, we see that ranomization in JL-projections only serves as insurance against axis-alignment, analogous to the application of a ranom permutation before running Quicksort Deranomization Theorem 1.1 allows one to use significantly fewer ranom bits than all previous methos for constructing JL-embeings. Inee, since the appearance of an earlier version of this work [1],

5 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) the construction base on probability istribution (1) has been use by Sivakumar [15] to give a very simple eranomization of the Johnson Linenstrauss lemma. 2. Previous work As we will see, in all methos for proucing JL-embeings, incluing ours, the heart of the matter is showing that for any vector, the square length of its projection is sharply concentrate aroun its expecte value. The original proof of Johnson an Linenstrauss [9] uses quite heavy geometric approximation machinery to yiel such a concentration boun. That proof was greatly simplifie an sharpene by Frankl an Meahara [6] who explicitly consiere a projection onto k ranom orthonormal vectors (as oppose to viewing such vectors as the basis of a ranom hyperplane), yieling the following result. Theorem 2.1 (Frankl an Meahara [6]). For any eað0; 1=2Þ; any sufficiently large set PAR ; an kxk 0 ¼ J9ðe 2 2e 3 =3Þ 1 log jpjn þ 1; there exists a map f : P-R k such that for all u; vap; ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 : The next great simplification of the proof of Lemma 1.1 was given, inepenently, by Inyk an Motwani [8] an Dasgupta an Gupta [5], the latter also giving a slight sharpening of the boun for k 0 : By combining the analysis of [5] with the viewpoint of [8] it is in fact not har to show that Theorem 1.1 hols if for all i; j; r ij ¼ D Nð0; 1Þ (this was also observe in [2]). Below we state our renition of how each of these two latest simplifications [8,5] were achieve, as they prepare the groun for our own work. Let us write X ¼ D y to enote that X is istribute as Y an recall that Nð0; 1Þ enotes the stanar Normal istribution with mean 0 an variance 1. Inyk an Motwani [8]: Assume that we try to implement the scheme of Frankl an Maehara [6] but we are lazy about enforcing either normality (unit length) or orthogonality among our k vectors. Instea, we just pick k inepenent, spherically symmetric ranom vectors, by taking as the coorinates of each vector ki:i:: Nð0; 1=Þ ranom variables (so that the expecte length of each vector is 1). An immeiate gain of this approach is that now, for any fixe vector a; the length of its projection onto each of our vectors is also a normal ranom variable. This is ue to a powerful an eep fact, namely the 2-stability of the Gaussian istribution: for any real numbers a 1 ; a 2 ; y; a ; if fz i g i¼1 is a family of inepenent normal ranom variables an X ¼ P i¼1 a iz i ; then X ¼ D cnð0; 1Þ; where c ¼ða 2 1 þ? þ a2 Þ1=2 : As a result, if we take these k projection lengths to be the coorinates of the embee vector in R k ; then the square length of the embee vector follows the chi-square istribution for which strong concentration bouns are reaily available. Remarkably, very little is lost ue to this laziness. Although, we i not explicitly enforce either orthogonality, or normality, the resulting k vectors, with high probability, will come very close to having both of these properties. In particular, the length of each of the k vectors is sharply

6 676 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) concentrate (aroun 1) as the sum of inepenent ranom variables. Moreover, since the k vectors point in uniformly ranom irections in R ; as grows they rapily get closer to being orthogonal. Dasgupta an Gupta [5]: Here we will exploit spherical symmetry without appealing irectly to the 2-stability of the Gaussian istribution. Instea observe that, by symmetry, the projection of any unit vector a on a ranom hyperplane through the origin is istribute exactly like the projection of a ranom point from the surface of the -imensional sphere onto a fixe subspace of imension k: Such a projection can be stuie reaily since each coorinate is a scale normal ranom variable. With a somewhat tighter analysis than [8], this approach gave the best known boun, namely kxk 0 ¼ð4 þ 2bÞðe 2 =2 e 3 =3Þ 1 log n; which is exactly the same as the boun in Theorem Some intuition Our contribution begins with the realization that spherical symmetry, while making life extremely comfortable, is not essential. What is essential is concentration. So, at least in principle, we are free to consier other caniate istributions for the fr ij g; if perhaps at the expense of comfort. As we saw earlier, each column of our projection matrix R will yiel one coorinate of the projection in R k : Thus, the square length of the projection is merely the sum of the squares of these coorinates. Therefore, the projection can be thought of as follows: each column of R acts as an inepenent estimator of the original vector s length (its estimate being the inner prouct with it); to reach consensus we take the sum of the k estimators. Seen from this angle, requiring the k vectors to be orthonormal has the pleasant statistical overtone of maximizing mutual information (since all estimators have equal weight an are orthogonal). Nonetheless, even if we only require that each column simply gives an unbiase, boune variance estimator, the Central Limit Theorem implies that if we take sufficiently many columns, we can get an arbitrarily goo estimate of the original length. Naturally, the number of columns neee epens on the variance of the estimators. From the above we see that the key issue is the concentration of the projection of an arbitrary fixe vector a onto a single ranom vector. The main technical ifficulty that results from giving up spherical symmetry is that this concentration can epen on a: Our technical contribution lies in etermining probability istributions for the fr ij g uner which, for all vectors, this concentration is at least as goo as in the spherically symmetric case. In fact, it will turn out that for every fixe value of ; we can get a (minuscule) improvement in concentration. Thus, for every fixe ; we can actually get a strictly better boun for k than by taking spherically ranom vectors. The reaer might be wonering how can it be that perfect spherical symmetry oes not buy us anything (an is in fact slightly worse for each fixe )?. The following intuitive argument hints at why giving up spherical symmetry is (at least) not catastrophic. Say, for example, that fr ij gaf 1; þ1g so that the projection length of certain vectors is more variable than that of others, an assume that an aversary is trying to pick a worst-case such vector w; i.e., one whose projection length is most variable. Our problem can be rephrase

7 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) as How much are we empowering an aversary by committing to picking our column vectors among lattice points rather than arbitrary points in R?. As we will see, an this is the heart of our proof, the worst-case vectors are p 1 ffiffi ð71; y; 71Þ: So, in some sense, the worst-case vectors are typical, unlike, say, ð1; 0; y; 0Þ: From this it is a small leap to believe that the aversary woul not fare much worse by giving us a spherically ranom vector. But in that case, by symmetry, our commitment to lattice points is irrelevant! To get a more precise answer it seems like one has to elve into the proof. In particular, both for the spherically ranom case an for our istributions, the boun on k is manate by the probability of overestimating the projecte length. Thus, the ba events amount to the spanning vectors being too well-aligne with a: Now, in the spherically symmetric setting it is possible to have alignment that is arbitrarily close to perfect, albeit with corresponingly smaller probability. In our case, if we o not have perfect alignment then we are guarantee a certain, boune amount of misalignment. It is precisely this traeoff between the probability an the extent of alignment that rives the proof. Consier, for example, p the case when ¼ 2 with r ij Af 1; þ1g: As we sai above, the worstcase vector is w ¼ð1= ffiffiffi 2 Þð1; 1Þ: So, with probability 1=2 we have perfect alignment (when our ranom vector is 7w) an with probability 1=2 we have orthogonality. On the other han, for the spherically symmetric case, we have to consier the integral over all points on the plane, weighte by their probability uner the two-imensional Gaussian istribution. It is a rather instructive exercise to visually explore this traeoff by plotting the corresponing functions; moreover, it might offer the intereste reaer some intuition for the general case. 4. Preliminaries an the spherically symmetric case 4.1. Preliminaries Let x y enote the inner prouct of p vectors x; y: To simplify notation in the calculations we will work with a matrix R scale by 1= ffiffiffi pffiffiffiffiffiffiffiffi p : As a result, to get E we nee to scale A R by =k rather than 1= ffiffiffi p k : So, R will be a ranom k matrix with Rði; jþ ¼rij = ffiffiffi ; where the frij g are istribute as in Theorem 1.1. Therefore, if c j enotes the jth column of p R; then fc j g k j¼1 is a family of k i.i.. ranom unit vectors in R an for all aar ; f ðaþ ¼ ffiffiffiffiffiffiffiffi =k ða c1 ; y; a c Þ: Naturally, such scaling can be postpone until after the matrix multiplication (projection) has been performe, so that we maintain the avantage of only having f 1; 0; þ1g in the projection matrix. Let us first compute Eðjj f ðaþjj 2 Þ for an arbitrary vector aar : For j ¼ 1; y; k let Q j ðaþ ¼a c j ; where sometimes we will omit the epenence of Q j ðaþ on a an refer simply to Q j : Then EðQ j Þ¼E p 1 ffiffiffi X i¼1! a i r ij ¼ p 1 ffiffiffi X i¼1 a i Eðr ij Þ¼0 ð3þ

8 678 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) an 0 EðQ 2 j ¼ 1 E ¼ 1 X i¼1 1 p ffiffiffi X i¼1 X i¼1! 1 2 a i r ij A ða i r ij Þ 2 þ X a 2 i Eðr2 ij Þþ1 l¼1 X l¼1 X m¼1 X m¼1 2a l a m r lj r mj! 2a l a m Eðr lj ÞEðr mj Þ ¼ 1 jjajj2 : ð4þ Note that to get (3) an (4) we only use that the fr ij g are inepenent with zero mean an unit variance. From (4) we see that pffiffiffiffiffiffiffiffi Eðjj f ðaþjj 2 Þ¼Eððjj =k ða c1 ; y; a c ÞjjÞ 2 Þ¼ X k k j¼1 EðQ 2 j Þ¼jjajj2 : That is for any inepenent family of fr ij g with Eðr ij Þ¼0 an Varðr ij Þ¼1 we get an unbiase estimator, i.e., Eðjj f ðaþjj 2 Þ¼jjajj 2 : In orer to have a JL-embeing we nee that for each of the ð n 2Þ pairs u; vap; the square norm of the vector u v; is maintaine within a factor of 17e: Therefore, if for some family fr ij g as above we can prove that for some b40 an any fixe vector aar ; Pr½ð1 eþjjajj 2 pjj f ðaþjj 2 pð1 þ eþjjajj 2 ŠX1 2 n 2þb; ð5þ then the probability of not getting a JL-embeing is boune by ð n 2 Þ2=n2þb o1=n b : From the above iscussion we see that our entire task has been reuce to etermining a zero mean, unit variance istribution for the fr ij g such that (5) hols for any fixe vector a: In fact, since for any fixe projection matrix, jj f ðaþjj 2 is proportional to jjajj 2 ; it suffices to prove that (5) hols for arbitrary unit vectors. Moreover, since Eðjj f ðaþjj 2 Þ¼jjajj 2 ; inequality (5) merely asserts that the ranom variable jj f ðaþjj 2 is concentrate aroun its expectation The spherically symmetric case As a warmup we first work out the spherically ranom case below. Our results follows from exactly the same proof after replacing Lemma 4.1 below with a corresponing lemma for our choices of fr ij g: Getting a concentration inequality for jj f ðaþjj 2 when r ij ¼ D Nð0; 1Þ is straightforwar. Due to the 2-stability of the normal istribution, for every unit vector a; we have jj f ðaþjj 2 ¼ D w 2 ðkþ=k; where w 2 ðkþ enotes the chi-square istribution with k egrees of freeom. The fact that we get the same istribution for every vector a correspons to the obvious intuition that all vectors are the same

9 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) with respect to projection onto a spherically ranom vector. Stanar tail-bouns for the chi-square istribution reaily yiel the following. Lemma 4.1. Let r ij ¼ D Nð0; 1Þ for all i; j: Then, for any e40 an any unit-vector aar ; Pr½jj f ðaþjj 2 41 þ ešoexp k 2 ðe2 =2 e 3 =3Þ ; Pr½jj f ðaþjj 2 o1 ešoexp k 2 ðe2 =2 e 3 =3Þ : Thus, to get a JL-embeing we nee only require 2 exp k 2 ðe2 =2 e 3 =3Þ p 2 n 2þb; which hols for kx 4 þ 2b e 2 =2 e 3 log n: =3 Let us note that the boun on the upper tail of jj f ðaþjj 2 above is tight (up to lower orer terms). As a result, as long as the union boun is use, one cannot hope for a better boun on k while using spherically ranom vectors. To prove our result we will use the exact same approach, arguing that for every unit vector aar ; the ranom variable jj f ðaþjj 2 is sharply concentrate aroun its expectation. In the next section we state a lemma analogous to Lemma 4.1 above an show how it follows from bouns on certain moments of Q 2 1 : We then prove those bouns in Section Tail bouns To simplify notation let us efine for an arbitrary vector a; S ¼ SðaÞ ¼ Xk j¼1 ða c j Þ 2 ¼ Xk j¼1 Q 2 j ðaþ; where c j is the jth column of R; so that jj f ðaþjj 2 ¼ S =k: Lemma 5.1. Let r ij have any one of the two istributions in Theorem 1.1. Then, for any e40 an any unit vector aar ; Pr½SðaÞ4ð1 þ eþk=šoexp k 2 ðe2 =2 e 3 =3Þ ; Pr½SðaÞoð1 eþk=šoexp k 2 ðe2 =2 e 3 =3Þ :

10 680 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) In proving Lemma 5.1 we will generally omit the epenence of probabilities on a; making it explicit only when it affects our calculations. We will use the stanar technique of applying Markov s inequality to the moment generating function of S; thus reucing the proof of the lemma to bouning certain moments of Q 1 : In particular, we will nee the following lemma which will be prove in Section 6. Lemma 5.2. For all ha½0; =2Þ; all X1 an all unit vectors a; EðexpðhQ 1 ðaþ 2 1 ÞÞpp ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi; ð6þ 1 2h= EðQ 1 ðaþ 4 Þp 3 2: ð7þ Proof of Lemma 5.1. We start with the upper tail. For arbitrary h40 let us write Pr S4ð1 þ eþ k ¼ Pr expðhsþ4exp hð1 þ eþ k o EðexpðhSÞÞ exp hð1 þ eþ k : Since fq j g k j¼1 are i.i.. we have EðexpðhSÞÞ ¼ E Yk j¼1 expðhq 2 j Þ! ð8þ ¼ Yk j¼1 EðexpðhQ 2 j ÞÞ ð9þ ¼ðEðexpðhQ 2 1 ÞÞÞk ; ð10þ where passing from (8) to (9) uses that the fq j g k j¼1 are inepenent, while passing from (9) to (10) uses that they are ientically istribute. Thus, for any e40 Pr S4ð1 þ eþ k oðeðexpðhq 2 1 ÞÞÞk exp hð1 þ eþ k : ð11þ Substituting (6) in (11) we get (12). To optimize the boun we set the erivative in (12) with respect to h to 0. This gives h ¼ e 2 1þe o 2 : Substituting this value of h we get (13) an series expansion yiels (14). Pr S4ð1 þ eþ k! k 1 o pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi exp hð1 þ eþ k ð12þ 1 2h= ¼ðð1 þ eþ expð eþþ k=2 ð13þ oexp k 2 ðe2 =2 e 3 =3Þ : ð14þ

11 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Similarly, but now consiering expð hsþ for arbitrary h40; we get that for any e40 Pr Soð1 eþ k oðeðexpð hq 2 1 ÞÞÞk exp hð1 eþ k : ð15þ Rather than bouning Eðexpð hq 2 1 ÞÞ irectly, let us expan expð hq2 1Þ to get!! Pr Soð1 eþ k k o E 1 hq 21 þ ð hq2 1 Þ2 exp hð1 eþ k 2! ¼ 1 h k þ h2 2 EðQ4 1 Þ exp hð1 eþ k ; ð16þ where EðQ 2 1Þ was given by (4). Now, substituting (7) in (16) we get (17). This time taking h ¼ e 2 1þe is not optimal but is still goo enough, giving (18). Again, series expansion yiels (19). Pr Soð1 eþ k p 1 h þ 3! h 2 k exp hð1 eþ k ð17þ 2! k e ¼ 1 2ð1 þ eþ þ 3e2 eð1 eþk 8ð1 þ eþ 2 exp ð18þ 2ð1 þ eþ oexp k 2 ðe2 =2 e 3 =3Þ : & ð19þ 6. Moment bouns To simplify notation in this section we will rop the subscript an refer to Q 1 as Q: It shoul be clear that the istribution of Q epens on a; i.e., Q ¼ QðaÞ: This is precisely what we give up by not projecting onto spherically symmetric vectors. Our strategy for giving bouns on the moments of Q will be to etermine a worst-case unit vector w an boun the moments of QðwÞ: We claim the following. Lemma 6.1. Let w ¼ p 1 ffiffiffi ð1; y; 1Þ: For every unit vector aar ; an for all k ¼ 0; 1; y EðQðaÞ 2k ÞpEðQðwÞ 2k Þ: ð20þ Moreover, we will prove that the even moments of QðwÞ are ominate by the corresponing moments from the spherically symmetric case. That is,

12 682 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Lemma 6.2. Let T ¼ D Nð0; 1=Þ: For all X1 an all k ¼ 0; 1; y EðQðwÞ 2k ÞpEðT 2k Þ: ð21þ Using Lemmata 6.1 an 6.2 we can prove Lemma 5.2 as follows. Proof of Lemma 5.2. To prove (7) we observe that for any unit vector a; by (20) an (21), EðQðaÞ 4 ÞpEðQðwÞ 4 ÞpEðT 4 Þ; while Z þn EðT 4 1 Þ¼ p ffiffiffiffiffi expð l 2 =2Þ l4 N 2p 2 l ¼ 3 2: To prove (6) we first observe that for any real-value ranom variable U an for all h such that EðexpðhU 2 ÞÞ is boune, the Monotone Convergence Theorem (MCT) allows us to swap the expectation with the sum an get EðexpðhU 2 ÞÞ ¼ E XN k¼0! ðhu 2 Þ k k! ¼ XN k¼0 h k k! EðU 2k Þ: So, below, we procee as follows. Taking ha½0; =2Þ makes the integral in (22) converge, giving us (23). Thus, for such h; we can apply the MCT to get (24). Now, applying (20) an (21) (24) gives (25). Applying the MCT once more gives (26). EðexpðhT 2 ÞÞ ¼ Z þn N 1 p ffiffiffiffiffi 2p expð l 2 =2Þ exp h l2 l ð22þ 1 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 2h= ð23þ ¼ XN k¼0 h k k! EðT 2k Þ ð24þ X XN k¼0 h k k! EðQðaÞ2k Þ ð25þ ¼ EðexpðhQðaÞ 2 ÞÞ: p Thus, EðexpðhQ 2 ÞÞp1= ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 2h= for ha½0; =2Þ; as esire. & ð26þ Before proving Lemma 6.1 we will nee to prove the following lemma.

13 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Lemma 6.3. Let r 1 ; r 2 be i.i.. ranom variables having one of the two probability istributions given by Eqs. (1) an (2) in Theorem p 1.1. For any a; bar let c ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ða 2 þ b 2 Þ=2: Then for any MAR an all k ¼ 0; 1; y EððM þ ar 1 þ br 2 Þ 2k ÞpEððM þ cr 1 þ cr 2 Þ 2k Þ: Proof. We first consier the case where r i Af 1; þ1g: If a 2 ¼ b 2 then a ¼ c an the lemma hols with equality. Otherwise, observe that EððM þ cr 1 þ cr 2 Þ 2k Þ EððM þ ar 1 þ br 2 Þ 2k Þ¼ S k 4 ; where S k ¼ðM þ 2cÞ 2k þ 2M 2k þðm 2cÞ 2k ðm þ a þ bþ 2k ðm þ a bþ 2k ðm a þ bþ 2k ðm a bþ 2k : We will show that S k X0 for all kx0: Since a 2 ab 2 we can use the binomial theorem to expan every term other than 2M 2k in S k an get! S k ¼ 2M 2k þ X2k 2k M 2k i D i ; i where i¼0 D i ¼ð2cÞ i þð 2cÞ i ða þ bþ i ða bþ i ð a þ bþ i ð a bþ i : Observe now that for o i; D i ¼ 0: Moreover, we claim that D 2j X0 for all jx1: To see this claim observe that ð2a 2 þ 2b 2 Þ¼ða þ bþ 2 þða bþ 2 an that for all jx1 an x; yx0; ðx þ yþ j Xx j þ y j : Thus, ð2cþ 2j ¼ð2a 2 þ 2b 2 Þ j ¼½ða þ bþ 2 þða bþ 2 Š j Xða þ bþ 2j þða bþ 2j implying!! S k ¼ 2M 2k þ Xk 2k M 2ðk jþ D 2j ¼ Xk 2k M 2ðk jþ D 2j X0: j¼0 2j j¼1 2j p The proof for the case where r i Af ffiffi pffiffi 3 ; 0; þ 3 g is just a more cumbersome version of the proof above, so we omit it. That proof, though, brings forwar an interesting point. If one tries to take r i ¼ 0 with probability greater than 2=3; while maintaining that r i has a range of size 3 an variance 1, the lemma fails. In other wors, 2=3 is tight in terms of how much probability mass we can put to r i ¼ 0 an still have the current lemma hol. & Proof of Lemma 6.1. Recall that for any vector a; QðaÞ ¼Q 1 ðaþ ¼a c 1 where c 1 ¼ p 1 ffiffiffi ðr 11 ; y; r 1 Þ: If a ¼ða 1 ; y; a Þ is such that a 2 i ¼ a 2 j for all i; j; then by symmetry, QðaÞ an QðwÞ are ientically istribute an the lemma hols trivially. Otherwise, we can assume without loss of generality,

14 684 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) qthat ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi a 2 1 aa2 2 an consier the more balance unit vector y ¼ðc; c; a 3; y; a Þ; where c ¼ ða 2 1 þ a2 2 Þ=2 : We will prove that EðQðaÞ 2k ÞpEðQðyÞ 2k Þ: ð27þ Applying this argument repeately yiels the lemma, as y eventually becomes w: To prove (27), below we first express EðQðaÞ 2k Þ as a sum of averages over r 11 ; r 21 an then apply Lemma 6.3 to get that each term (average) in the sum, is boune by the corresponing average for vector y: More precisely, EðQðaÞ 2k Þ¼ 1 k X M p 1 k X M ¼ EðQðyÞ 2k Þ: " # EððM þ a 1 r 11 þ a 2 r 21 Þ 2k ÞPr a i r i1 ¼ p M ffiffiffi i¼3 " # EððM þ cr 11 þ cr 21 Þ 2k ÞPr & X i¼3 X a i r i1 ¼ p M ffiffiffi Proof of Lemma 6.2. Recall that T ¼ D Nð0; 1=Þ: We will first express T as the scale sum of inepenent stanar Normal ranom variables. This will allow for a irect comparison of the terms in each of the two expectations. Specifically, let ft i g i¼1 be a family of i.i.. stanar Normal ranom variables. Then P i¼1 T i is a Normal ranom variable with variance : Therefore, T ¼ D 1 X i¼1 T i : Recall also that QðwÞ ¼Q 1 ðwþ ¼w c 1 where c 1 ¼ p 1 ffiffiffi ðr 11 ; y; r 1 Þ: To simplify notation let us write r i1 ¼ Y i an let us also rop the epenence of Q on w: Thus, Q ¼ 1 X i¼1 Y i ; where fy i g i¼1 are i.i.. r.v. having one of the two istributions in Theorem 1.1. We are now reay to compare EðQ 2k Þ with EðT 2k Þ: We first observe that for every k ¼ 0; 1; y EðT 2k Þ¼ 1 2k X i 1 ¼1? X i 2k ¼1 EðT i1?t i2k Þ

15 an EðQ 2k Þ¼ 1 2k X i 1 ¼1? X i 2k ¼1 EðY i1?y i2k Þ: To prove the lemma we will show that for every value assignment to the inices i 1 ; y; i 2k ; EðY i1?y i2k ÞpEðT i1?t i2k Þ: ð28þ Let V ¼ /v 1 ; v 2 ; y; v 2k S be the value assignment consiere. For iaf1; y; g; let c V ðiþ be the number of times that i appears in V: Observe that if for some i; c V ðiþ is o then both expectations appearing in (28) are 0, since both fy i g i¼1 an ft ig i¼1 are inepenent families an EðY i Þ¼EðT i Þ¼0 for all i: Thus, we can assume that there exists a set f j 1 ; j 2 ; y; j p g of inices an corresponing values c 1 ; c 2 ; y; c p such that EðY i1?y i2k Þ¼EðY 2c 1 j 1 Y 2c 2 j 2?Y 2c p j p Þ an EðT i1?t i2k Þ¼EðT 2c 1 j 1 T 2c 2 j 2?T 2c p j p Þ: Note now that since the inices j 1 ; j 2 ; y; j p are istinct, fy jt g p t¼1 an ft j t g p t¼1 are families of i.i.. r.v. Therefore, EðY i1?y i2k Þ¼EðY 2c 1 j 1 Þ? EðY 2c p j p Þ an EðT i1?t i2k Þ¼EðT 2c 1 j 1 Þ? EðT 2c p j p Þ: So, without loss of generality, in orer to prove (28) it suffices to prove that for every c ¼ 0; 1; y EðY 2c 1 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) c ÞpEðT Þ: ð29þ 1 This, though, is completely trivial. First recall the well-known fact that the ð2cþth moment of Nð0; 1Þ is ð2c 1Þ!! ¼ð2cÞ!=ðc!2 c ÞX1: Now: * If Y1 Af 1; p þ1g then EðY1 2c Þ¼1; for all cx0: * If Y1 Af ffiffi pffiffi 3 ; 0; þ 3 g then EðY 2c 1 Þ¼3c 1 pð2cþ!=ðc!2 c Þ; where the last inequality follows by an easy inuction. It is worth pointing out that, along with Lemma 6.3, these are the only two points were we use any properties of the istributions for the r ij (here calle Y i ) other than them having zero mean an unit variance. & Finally, we note that * Since EðY 2c 2c 1 ÞoEðT1 Þ for certain l; we see that for each fixe ; both inequalities in Lemma 5.2 are actually strict, yieling slightly better tails bouns for S an a corresponingly better boun for k 0 :

16 686 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) * By using Jensen s inequality one can get a irect boun for EðQ 2k Þ when Y i Af 1; þ1g; i.e., without comparing it to EðT 2k Þ: That simplifies the proof for that case an shows that, in fact, taking Y i Af 1; þ1g is the minimizer of EðexpðhQ 2 ÞÞ for all h: 7. Discussion 7.1. Database-frienliness As we mentione earlier, all previously known constructions of JL-embeings require the multiplication of the input matrix with a ense, ranom matrix. Unfortunately, such general matrix multiplication can be very inefficient (an cumbersome) to carry out in a relational atabase. Our constructions, on the other han, replace the inner prouct operations of matrix multiplication with view selection an aggregation (aition). Using, say, istribution (2) of Theorem 1.1 the k new coorinates are generate by inepenently performing the following ranom experiment k times: throw away 2=3 of the original attributes at ranom; partition the remaining attributes ranomly into two parts; for each part, prouce a new attribute equal to the sum of all its attributes; take the ifference of the two sum attributes Further work It is well-known that Oðe 2 log n=logð1=eþþ imensions are necessary for embeing arbitrary sets of n points with istortion 17e: At the same time, all currently known embeings amount to (ranom) projections requiring Oðe 2 log nþ imensions. As we saw, the current analysis of such projections is tight, except for using the union boun to boun the total probability of the n 2 potential ba events. Exploring the possibility of savings on this point appears interesting, especially since in practice we often can make some assumptions about the input points. For example, what happens if the points are guarantee to alreay lie (approximately) in some lowimensional space? Finally, we note that with effort (an using a ifferent proof) one can increase the probability of 0 in istribution (2) slightly above 2=3: Yet, it seems har to get a significant increase without incurring a penalty in the imensionality. Exploring the suggeste traeoff might also be interesting, at least for practice purposes. Acknowlegments I am grateful to Marek Biskup for his help with the proof of Lemma 6.2 an to Jeong Han Kim for suggesting the approach of Eq. (16). Many thanks also to Paul Braley, Aris Gionis, Anna Karlin, Elias Koutsoupias an Piotr Inyk for comments on earlier versions of the paper an useful iscussions.

17 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) References [1] D. Achlioptas, Database-frienly ranom projections, 20th Annual Symposium on Principles of Database Systems, Santa Barbara, CA, 2001, pp [2] R.I. Arriaga, S. Vempala, An algorithmic theory of learning: robust concepts an ranom projection, 40th Annual Symposium on Founations of Computer Science, New York, NY, 1999, IEEE Computer Society Press, Los Alamitos, CA, 1999, pp [3] S. Arora, R. Kannan, Learning mixtures of arbitrary Gaussians, 33r Annual ACM Symposium on Theory of Computing, Creete, Greece, ACM, New York, 2001, pp [4] S. Dasgupta, Learning mixtures of Gaussians, 40th Annual Symposium on Founations of Computer Science, New York, NY, 1999, IEEE Computer Society Press, Los Alamitos, CA, 1999, pp [5] S. Dasgupta, A. Gupta, An elementary proof of the Johnson Linenstrauss lemma. Technical Report , UC Berkeley, March [6] P. Frankl, H. Maehara, The Johnson-Linenstrauss lemma an the sphericity of some graphs, J. Combin. Theory Ser. B 44 (3) (1988) [7] P. Inyk, Stable istributions, pseuoranom generators, embeings an ata stream computation, 41st Annual Symposium on Founations of Computer Science, Reono Beach, CA, 2000, IEEE Computer Society Press, Los Alamitos, CA, 2000, pp [8] P. Inyk, R. Motwani, Approximate nearest neighbors: towars removing the curse of imensionality, 30th Annual ACM Symposium on Theory of Computing, Dallas, TX, ACM, New York, 1998, pp [9] W.B. Johnson, J. Linenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Conference in moern analysis an probability, New Haven, CI, 1982, Amer. Math. Soc., Provience, RI, 1984, pp [10] J. Kleinberg, Two algorithms for nearest-neighbor search in high imensions, 29th Annual ACM Symposium on Theory of Computing, El Paso, TX, 1997, ACM, New York, 1997, pp [11] E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient search for approximate nearest neighbor in high imensional spaces, SIAM J. Comput. 30 (2) (2000) [12] N. Linial, E. Lonon, Y. Rabinovich, The geometry of graphs an some of its algorithmic applications, Combinatorica 15 (2) (1995) [13] C.H. Papaimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic inexing: a probabilistic analysis, 17th Annual Symposium on Principles of Database Systems, Seattle, WA, 1998, pp [14] L.J. Schulman, Clustering for ege-cost minimization, 32n Annual ACM Symposium on Theory of Computing, Portlan, OR, 2000, ACM, New York, 2000, pp [15] D. Sivakumar, Algorithmic eranomization via complexity theory, 34th Annual ACM Symposium on Theory of Computing, Montreal, QC, 2002, ACM, New York, 2002, pp [16] S. Vempala, A ranom sampling base algorithm for learning the intersection of half-spaces, 38th Annual Symposium on Founations of Computer Science, Miami, FL, 1997, IEEE Computer Society Press, Los Alamitos, CA, 1997, pp