Databasefriendly random projections: JohnsonLindenstrauss with binary coins


 Belinda Flynn
 1 years ago
 Views:
Transcription
1 Journal of Computer an System Sciences 66 (2003) Databasefrienly ranom projections: JohnsonLinenstrauss with binary coins Dimitris Achlioptas Microsoft Research, One Microsoft Way, Remon, WA 98052, USA Receive 28 August 2001; revise 19 July 2002 Abstract A classic result of Johnson an Linenstrauss asserts that any set of n points in imensional Eucliean space can be embee into kimensional Eucliean space where k is logarithmic in n an inepenent of so that all pairwise istances are maintaine within an arbitrarily small factor. All known constructions of such embeings involve projecting the n points onto a spherically ranom kimensional hyperplane through the origin. We give two constructions of such embeings with the property that all elements of the projection matrix belong in f 1; 0; þ1g: Such constructions are particularly well suite for atabase environments, as the computation of the embeing reuces to evaluating a single aggregate over k ranom partitions of the attributes. r 2003 Elsevier Science (USA). All rights reserve. 1. Introuction Consier projecting the points of your favorite sculpture first onto the plane an then onto a single line. The result amply emonstrates the power of imensionality. In general, given a highimensional pointset it is natural to ask if it coul be embee into a lower imensional space without suffering great istortion. In this paper, we consier this question for finite sets of points in Eucliean space. It will be convenient to think of n points in R as an n matrix A; each point represente as a row (vector). Given such a matrix representation, one of the most commonly use embeings is the one suggeste by the singular value ecomposition of A: That is, in orer to embe the n points into R k we project them onto the kimensional space spanne by the singular vectors corresponing to the k largest singular values of A: If one rewrites the result of this projection as a (rank k) n matrix A k ; we are guarantee that any other kimensional pointset (represente as an n aress: /03/$  see front matter r 2003 Elsevier Science (USA). All rights reserve. oi: /s (03)
2 672 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) matrix D) satisfies ja A k j F pja Dj F ; where, for any matrix Q; jqj 2 F ¼ P Q 2 i;j : To interpret this result observe that if moving a point by z takes energy proportional to z 2 ; A k represents the kimensional configuration reachable from A by expening least energy. In fact, A k is an optimal rank k approximation of A uner many matrix norms. In particular, it is wellknown that for any rank k matrix D an for any rotationally invariant norm ja A k jpja Dj: At the same time, this optimality implies no guarantees regaring local properties of the resulting embeing. For example, it is not har to evise examples where the new istance between a pair of points is arbitrarily smaller than their original istance. For a number of problems where imensionality reuction is clearly esirable, the absence of such local guarantees can make it har to exploit embeings algorithmically. In a seminal paper, Linial et al. [12] were the first to consier algorithmic applications of embeings that respect local properties. By now, embeings of this type have become an important tool in algorithmic esign. A real gem in this area has been the following result of Johnson an Linenstrauss [9]. Lemma 1.1 (Johnson an Linenstrauss [9]). Given e40 an an integer n; let k be a positive integer such that kxk 0 ¼ Oðe 2 log nþ: For every set P of n points in R there exists f : R R k such that for all u; vap ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 : We will refer to embeings proviing a guarantee akin to that of Lemma 1.1 as JLembeings. In the last few years, such embeings have been use in solving a variety of problems. The iea is as follows. By proviing a lowimensional representation of the ata, JLembeings spee up certain algorithms ramatically, in particular algorithms whose runtime epens exponentially in the imension of the working space. (For a number of practical problems the bestknown algorithms inee have such behavior.) At the same time, the provie guarantee regaring pairwise istances often allows one to establish that the solution foun by working in the lowimensional space is a goo approximation to the solution in the original space. We give a few examples below. Papaimitriou et al. [13], prove that embeing the points of A in a lowimensional space can significantly spee up the computation of a lowrank approximation to A; without significantly affecting its quality. In [8], Inyk an Motwani showe that JLembeings are useful in solving the eapproximate nearestneighbor problem, where (after some preprocessing of the pointset P) one is to answer queries of the following type: Given an arbitrary point x; fin a point yap; such that for every point zap; jjx zjjxð1 eþjjx yjj: In a ifferent vein, Schulman [14] use JLembeings as part of an approximation algorithm for the version of clustering where we seek to minimize the sum of the squares of intracluster istances. Recently, Inyk [7] showe that
3 JLembeings can also be use in the context of atastream computation, where one has limite memory an is allowe only a single pass over the ata (stream) Our contribution D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Over the years, the probabilistic metho has allowe for the original proof of Johnson an Linenstrauss to be greatly simplifie an sharpene, while at the same time giving conceptually simple ranomize algorithms for constructing the embeing [5,6,8]. Roughly speaking, all such algorithms project the input points onto a spherically ranom hyperplane through the origin. While this is conceptually simple, in practical terms it amounts to multiplying the input matrix A with a ense matrix of real numbers. This can be a nontrivial task in many practical computational environments. At the same time, investigating the role of spherical symmetry in the choice of hyperplane is mathematically interesting in itself. Our main result, below, asserts that one can replace projections onto spherically ranom hyperplanes with much simpler an faster operations. In particular, these operations can be implemente efficiently in a atabase environment using stanar SQL primitives. Somewhat surprisingly, we prove that this comes without any sacrifice in the quality of the embeing. In fact, we will see that for every fixe value of we get a slightly better boun than all current methos. We state our result below as Theorem 1.1. Similarly to Lemma 1.1, the parameter e controls the esire accuracy in istance preservation, while now b controls the projection s probability of success. Theorem 1.1. Let P be an arbitrary set of n points in R ; represente as an n matrix A: Given e; b40 let k 0 ¼ 4 þ 2b e 2 =2 e 3 log n: =3 For integer kxk 0 ; let R be a k ranom matrix with Rði; jþ ¼r ij ; where fr ij g are inepenent ranom variables from either one of the following two probability istributions: ( þ1 with probability 1=2; r ij ¼ ð1þ 1 with probability 1=2; 8 p r ij ¼ ffiffi >< þ1 with probability 1=6; 3 0 with probability 2=3; ð2þ >: 1 with probability 1=6: Let E ¼ p 1 ffiffiffi AR k an let f : R R k map the ith row of A to the ith row of E: With probability at least 1 n b ; for all u; vap ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 :
4 674 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) We see that to construct a JLembeing via Theorem 1.1 we only nee very simple probability istributions to generate the projection matrix, while the computation of the projection itself reuces to aggregate evaluation, i.e., summations an subtractions (but no multiplications). While it seems har to imagine a probability istribution much simpler than (1), probability istribution (2) gives an aitional threefol speeup as we only nee to process a thir of all attributes for each of the k coorinates. We note that the construction base on probability istribution (1) was, inepenently, propose by Arriaga an Vempala in [2], but without an analysis of its performance Projecting onto ranom lines Looking a bit more closely into the computation of the embeing we see that each row (vector) of A is projecte onto k ranom vectors whose coorinates fr ij g are inepenent ranom variables with mean 0 an variance 1. If the fr ij g were inepenent normal ranom variables with mean 0 an variance 1, it is wellknown that each resulting vector woul point to uniformly ranom irection in space. Projections onto such vectors have been consiere in a number of settings, incluing the work of Kleinberg [10] an Kushilevitz et al. [11] on approximate nearest neighbors an of Vempala on learning intersections of halfspaces [16]. More recently, such projections have also been use in learning mixture of Gaussians moels, starting with the work of Dasgupta [4] an later with the work of Arora an Kannan [3]. Our proof implies that for any fixe vector a; the behavior of its projection onto a ranom vector c is manate by the even moments of the ranom variable jja cjj: In fact, our result follows by showing that for every vector a; uner our istributions for fr ij g; these moments are ominate by the corresponing moments for the case where c is spherically symmetric. As a result, projecting onto vectors whose entries are istribute like the columns of matrix R is computationally simpler an results in projections that are at least as nicely behave Ranomization A naive, perhaps, attempt at constructing JLembeings woul be to pick k of the original coorinates in imensional space as the new coorinates. Naturally, as two points can be very far apart while only iffering along a single imension, this approach is oome. Yet, if we knew that for every pair of points all coorinates contribute roughly equally to the corresponing pairwise, istance, this naive scheme woul make perfect sense. With this in min, it is very natural to try an remove pathologies like those mentione above by first applying a ranom rotation to the original pointset in R : Observe now that picking, say, the first k coorinates after applying a ranom rotation is exactly the same as projecting onto a spherically ranom kimensional hyperplane! Thus, we see that ranomization in JLprojections only serves as insurance against axisalignment, analogous to the application of a ranom permutation before running Quicksort Deranomization Theorem 1.1 allows one to use significantly fewer ranom bits than all previous methos for constructing JLembeings. Inee, since the appearance of an earlier version of this work [1],
5 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) the construction base on probability istribution (1) has been use by Sivakumar [15] to give a very simple eranomization of the Johnson Linenstrauss lemma. 2. Previous work As we will see, in all methos for proucing JLembeings, incluing ours, the heart of the matter is showing that for any vector, the square length of its projection is sharply concentrate aroun its expecte value. The original proof of Johnson an Linenstrauss [9] uses quite heavy geometric approximation machinery to yiel such a concentration boun. That proof was greatly simplifie an sharpene by Frankl an Meahara [6] who explicitly consiere a projection onto k ranom orthonormal vectors (as oppose to viewing such vectors as the basis of a ranom hyperplane), yieling the following result. Theorem 2.1 (Frankl an Meahara [6]). For any eað0; 1=2Þ; any sufficiently large set PAR ; an kxk 0 ¼ J9ðe 2 2e 3 =3Þ 1 log jpjn þ 1; there exists a map f : PR k such that for all u; vap; ð1 eþjju vjj 2 pjj f ðuþ f ðvþjj 2 pð1 þ eþjju vjj 2 : The next great simplification of the proof of Lemma 1.1 was given, inepenently, by Inyk an Motwani [8] an Dasgupta an Gupta [5], the latter also giving a slight sharpening of the boun for k 0 : By combining the analysis of [5] with the viewpoint of [8] it is in fact not har to show that Theorem 1.1 hols if for all i; j; r ij ¼ D Nð0; 1Þ (this was also observe in [2]). Below we state our renition of how each of these two latest simplifications [8,5] were achieve, as they prepare the groun for our own work. Let us write X ¼ D y to enote that X is istribute as Y an recall that Nð0; 1Þ enotes the stanar Normal istribution with mean 0 an variance 1. Inyk an Motwani [8]: Assume that we try to implement the scheme of Frankl an Maehara [6] but we are lazy about enforcing either normality (unit length) or orthogonality among our k vectors. Instea, we just pick k inepenent, spherically symmetric ranom vectors, by taking as the coorinates of each vector ki:i:: Nð0; 1=Þ ranom variables (so that the expecte length of each vector is 1). An immeiate gain of this approach is that now, for any fixe vector a; the length of its projection onto each of our vectors is also a normal ranom variable. This is ue to a powerful an eep fact, namely the 2stability of the Gaussian istribution: for any real numbers a 1 ; a 2 ; y; a ; if fz i g i¼1 is a family of inepenent normal ranom variables an X ¼ P i¼1 a iz i ; then X ¼ D cnð0; 1Þ; where c ¼ða 2 1 þ? þ a2 Þ1=2 : As a result, if we take these k projection lengths to be the coorinates of the embee vector in R k ; then the square length of the embee vector follows the chisquare istribution for which strong concentration bouns are reaily available. Remarkably, very little is lost ue to this laziness. Although, we i not explicitly enforce either orthogonality, or normality, the resulting k vectors, with high probability, will come very close to having both of these properties. In particular, the length of each of the k vectors is sharply
6 676 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) concentrate (aroun 1) as the sum of inepenent ranom variables. Moreover, since the k vectors point in uniformly ranom irections in R ; as grows they rapily get closer to being orthogonal. Dasgupta an Gupta [5]: Here we will exploit spherical symmetry without appealing irectly to the 2stability of the Gaussian istribution. Instea observe that, by symmetry, the projection of any unit vector a on a ranom hyperplane through the origin is istribute exactly like the projection of a ranom point from the surface of the imensional sphere onto a fixe subspace of imension k: Such a projection can be stuie reaily since each coorinate is a scale normal ranom variable. With a somewhat tighter analysis than [8], this approach gave the best known boun, namely kxk 0 ¼ð4 þ 2bÞðe 2 =2 e 3 =3Þ 1 log n; which is exactly the same as the boun in Theorem Some intuition Our contribution begins with the realization that spherical symmetry, while making life extremely comfortable, is not essential. What is essential is concentration. So, at least in principle, we are free to consier other caniate istributions for the fr ij g; if perhaps at the expense of comfort. As we saw earlier, each column of our projection matrix R will yiel one coorinate of the projection in R k : Thus, the square length of the projection is merely the sum of the squares of these coorinates. Therefore, the projection can be thought of as follows: each column of R acts as an inepenent estimator of the original vector s length (its estimate being the inner prouct with it); to reach consensus we take the sum of the k estimators. Seen from this angle, requiring the k vectors to be orthonormal has the pleasant statistical overtone of maximizing mutual information (since all estimators have equal weight an are orthogonal). Nonetheless, even if we only require that each column simply gives an unbiase, boune variance estimator, the Central Limit Theorem implies that if we take sufficiently many columns, we can get an arbitrarily goo estimate of the original length. Naturally, the number of columns neee epens on the variance of the estimators. From the above we see that the key issue is the concentration of the projection of an arbitrary fixe vector a onto a single ranom vector. The main technical ifficulty that results from giving up spherical symmetry is that this concentration can epen on a: Our technical contribution lies in etermining probability istributions for the fr ij g uner which, for all vectors, this concentration is at least as goo as in the spherically symmetric case. In fact, it will turn out that for every fixe value of ; we can get a (minuscule) improvement in concentration. Thus, for every fixe ; we can actually get a strictly better boun for k than by taking spherically ranom vectors. The reaer might be wonering how can it be that perfect spherical symmetry oes not buy us anything (an is in fact slightly worse for each fixe )?. The following intuitive argument hints at why giving up spherical symmetry is (at least) not catastrophic. Say, for example, that fr ij gaf 1; þ1g so that the projection length of certain vectors is more variable than that of others, an assume that an aversary is trying to pick a worstcase such vector w; i.e., one whose projection length is most variable. Our problem can be rephrase
7 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) as How much are we empowering an aversary by committing to picking our column vectors among lattice points rather than arbitrary points in R?. As we will see, an this is the heart of our proof, the worstcase vectors are p 1 ffiffi ð71; y; 71Þ: So, in some sense, the worstcase vectors are typical, unlike, say, ð1; 0; y; 0Þ: From this it is a small leap to believe that the aversary woul not fare much worse by giving us a spherically ranom vector. But in that case, by symmetry, our commitment to lattice points is irrelevant! To get a more precise answer it seems like one has to elve into the proof. In particular, both for the spherically ranom case an for our istributions, the boun on k is manate by the probability of overestimating the projecte length. Thus, the ba events amount to the spanning vectors being too wellaligne with a: Now, in the spherically symmetric setting it is possible to have alignment that is arbitrarily close to perfect, albeit with corresponingly smaller probability. In our case, if we o not have perfect alignment then we are guarantee a certain, boune amount of misalignment. It is precisely this traeoff between the probability an the extent of alignment that rives the proof. Consier, for example, p the case when ¼ 2 with r ij Af 1; þ1g: As we sai above, the worstcase vector is w ¼ð1= ffiffiffi 2 Þð1; 1Þ: So, with probability 1=2 we have perfect alignment (when our ranom vector is 7w) an with probability 1=2 we have orthogonality. On the other han, for the spherically symmetric case, we have to consier the integral over all points on the plane, weighte by their probability uner the twoimensional Gaussian istribution. It is a rather instructive exercise to visually explore this traeoff by plotting the corresponing functions; moreover, it might offer the intereste reaer some intuition for the general case. 4. Preliminaries an the spherically symmetric case 4.1. Preliminaries Let x y enote the inner prouct of p vectors x; y: To simplify notation in the calculations we will work with a matrix R scale by 1= ffiffiffi pffiffiffiffiffiffiffiffi p : As a result, to get E we nee to scale A R by =k rather than 1= ffiffiffi p k : So, R will be a ranom k matrix with Rði; jþ ¼rij = ffiffiffi ; where the frij g are istribute as in Theorem 1.1. Therefore, if c j enotes the jth column of p R; then fc j g k j¼1 is a family of k i.i.. ranom unit vectors in R an for all aar ; f ðaþ ¼ ffiffiffiffiffiffiffiffi =k ða c1 ; y; a c Þ: Naturally, such scaling can be postpone until after the matrix multiplication (projection) has been performe, so that we maintain the avantage of only having f 1; 0; þ1g in the projection matrix. Let us first compute Eðjj f ðaþjj 2 Þ for an arbitrary vector aar : For j ¼ 1; y; k let Q j ðaþ ¼a c j ; where sometimes we will omit the epenence of Q j ðaþ on a an refer simply to Q j : Then EðQ j Þ¼E p 1 ffiffiffi X i¼1! a i r ij ¼ p 1 ffiffiffi X i¼1 a i Eðr ij Þ¼0 ð3þ
8 678 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) an 0 EðQ 2 j ¼ 1 E ¼ 1 X i¼1 1 p ffiffiffi X i¼1 X i¼1! 1 2 a i r ij A ða i r ij Þ 2 þ X a 2 i Eðr2 ij Þþ1 l¼1 X l¼1 X m¼1 X m¼1 2a l a m r lj r mj! 2a l a m Eðr lj ÞEðr mj Þ ¼ 1 jjajj2 : ð4þ Note that to get (3) an (4) we only use that the fr ij g are inepenent with zero mean an unit variance. From (4) we see that pffiffiffiffiffiffiffiffi Eðjj f ðaþjj 2 Þ¼Eððjj =k ða c1 ; y; a c ÞjjÞ 2 Þ¼ X k k j¼1 EðQ 2 j Þ¼jjajj2 : That is for any inepenent family of fr ij g with Eðr ij Þ¼0 an Varðr ij Þ¼1 we get an unbiase estimator, i.e., Eðjj f ðaþjj 2 Þ¼jjajj 2 : In orer to have a JLembeing we nee that for each of the ð n 2Þ pairs u; vap; the square norm of the vector u v; is maintaine within a factor of 17e: Therefore, if for some family fr ij g as above we can prove that for some b40 an any fixe vector aar ; Pr½ð1 eþjjajj 2 pjj f ðaþjj 2 pð1 þ eþjjajj 2 ŠX1 2 n 2þb; ð5þ then the probability of not getting a JLembeing is boune by ð n 2 Þ2=n2þb o1=n b : From the above iscussion we see that our entire task has been reuce to etermining a zero mean, unit variance istribution for the fr ij g such that (5) hols for any fixe vector a: In fact, since for any fixe projection matrix, jj f ðaþjj 2 is proportional to jjajj 2 ; it suffices to prove that (5) hols for arbitrary unit vectors. Moreover, since Eðjj f ðaþjj 2 Þ¼jjajj 2 ; inequality (5) merely asserts that the ranom variable jj f ðaþjj 2 is concentrate aroun its expectation The spherically symmetric case As a warmup we first work out the spherically ranom case below. Our results follows from exactly the same proof after replacing Lemma 4.1 below with a corresponing lemma for our choices of fr ij g: Getting a concentration inequality for jj f ðaþjj 2 when r ij ¼ D Nð0; 1Þ is straightforwar. Due to the 2stability of the normal istribution, for every unit vector a; we have jj f ðaþjj 2 ¼ D w 2 ðkþ=k; where w 2 ðkþ enotes the chisquare istribution with k egrees of freeom. The fact that we get the same istribution for every vector a correspons to the obvious intuition that all vectors are the same
9 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) with respect to projection onto a spherically ranom vector. Stanar tailbouns for the chisquare istribution reaily yiel the following. Lemma 4.1. Let r ij ¼ D Nð0; 1Þ for all i; j: Then, for any e40 an any unitvector aar ; Pr½jj f ðaþjj 2 41 þ ešoexp k 2 ðe2 =2 e 3 =3Þ ; Pr½jj f ðaþjj 2 o1 ešoexp k 2 ðe2 =2 e 3 =3Þ : Thus, to get a JLembeing we nee only require 2 exp k 2 ðe2 =2 e 3 =3Þ p 2 n 2þb; which hols for kx 4 þ 2b e 2 =2 e 3 log n: =3 Let us note that the boun on the upper tail of jj f ðaþjj 2 above is tight (up to lower orer terms). As a result, as long as the union boun is use, one cannot hope for a better boun on k while using spherically ranom vectors. To prove our result we will use the exact same approach, arguing that for every unit vector aar ; the ranom variable jj f ðaþjj 2 is sharply concentrate aroun its expectation. In the next section we state a lemma analogous to Lemma 4.1 above an show how it follows from bouns on certain moments of Q 2 1 : We then prove those bouns in Section Tail bouns To simplify notation let us efine for an arbitrary vector a; S ¼ SðaÞ ¼ Xk j¼1 ða c j Þ 2 ¼ Xk j¼1 Q 2 j ðaþ; where c j is the jth column of R; so that jj f ðaþjj 2 ¼ S =k: Lemma 5.1. Let r ij have any one of the two istributions in Theorem 1.1. Then, for any e40 an any unit vector aar ; Pr½SðaÞ4ð1 þ eþk=šoexp k 2 ðe2 =2 e 3 =3Þ ; Pr½SðaÞoð1 eþk=šoexp k 2 ðe2 =2 e 3 =3Þ :
10 680 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) In proving Lemma 5.1 we will generally omit the epenence of probabilities on a; making it explicit only when it affects our calculations. We will use the stanar technique of applying Markov s inequality to the moment generating function of S; thus reucing the proof of the lemma to bouning certain moments of Q 1 : In particular, we will nee the following lemma which will be prove in Section 6. Lemma 5.2. For all ha½0; =2Þ; all X1 an all unit vectors a; EðexpðhQ 1 ðaþ 2 1 ÞÞpp ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi; ð6þ 1 2h= EðQ 1 ðaþ 4 Þp 3 2: ð7þ Proof of Lemma 5.1. We start with the upper tail. For arbitrary h40 let us write Pr S4ð1 þ eþ k ¼ Pr expðhsþ4exp hð1 þ eþ k o EðexpðhSÞÞ exp hð1 þ eþ k : Since fq j g k j¼1 are i.i.. we have EðexpðhSÞÞ ¼ E Yk j¼1 expðhq 2 j Þ! ð8þ ¼ Yk j¼1 EðexpðhQ 2 j ÞÞ ð9þ ¼ðEðexpðhQ 2 1 ÞÞÞk ; ð10þ where passing from (8) to (9) uses that the fq j g k j¼1 are inepenent, while passing from (9) to (10) uses that they are ientically istribute. Thus, for any e40 Pr S4ð1 þ eþ k oðeðexpðhq 2 1 ÞÞÞk exp hð1 þ eþ k : ð11þ Substituting (6) in (11) we get (12). To optimize the boun we set the erivative in (12) with respect to h to 0. This gives h ¼ e 2 1þe o 2 : Substituting this value of h we get (13) an series expansion yiels (14). Pr S4ð1 þ eþ k! k 1 o pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi exp hð1 þ eþ k ð12þ 1 2h= ¼ðð1 þ eþ expð eþþ k=2 ð13þ oexp k 2 ðe2 =2 e 3 =3Þ : ð14þ
11 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Similarly, but now consiering expð hsþ for arbitrary h40; we get that for any e40 Pr Soð1 eþ k oðeðexpð hq 2 1 ÞÞÞk exp hð1 eþ k : ð15þ Rather than bouning Eðexpð hq 2 1 ÞÞ irectly, let us expan expð hq2 1Þ to get!! Pr Soð1 eþ k k o E 1 hq 21 þ ð hq2 1 Þ2 exp hð1 eþ k 2! ¼ 1 h k þ h2 2 EðQ4 1 Þ exp hð1 eþ k ; ð16þ where EðQ 2 1Þ was given by (4). Now, substituting (7) in (16) we get (17). This time taking h ¼ e 2 1þe is not optimal but is still goo enough, giving (18). Again, series expansion yiels (19). Pr Soð1 eþ k p 1 h þ 3! h 2 k exp hð1 eþ k ð17þ 2! k e ¼ 1 2ð1 þ eþ þ 3e2 eð1 eþk 8ð1 þ eþ 2 exp ð18þ 2ð1 þ eþ oexp k 2 ðe2 =2 e 3 =3Þ : & ð19þ 6. Moment bouns To simplify notation in this section we will rop the subscript an refer to Q 1 as Q: It shoul be clear that the istribution of Q epens on a; i.e., Q ¼ QðaÞ: This is precisely what we give up by not projecting onto spherically symmetric vectors. Our strategy for giving bouns on the moments of Q will be to etermine a worstcase unit vector w an boun the moments of QðwÞ: We claim the following. Lemma 6.1. Let w ¼ p 1 ffiffiffi ð1; y; 1Þ: For every unit vector aar ; an for all k ¼ 0; 1; y EðQðaÞ 2k ÞpEðQðwÞ 2k Þ: ð20þ Moreover, we will prove that the even moments of QðwÞ are ominate by the corresponing moments from the spherically symmetric case. That is,
12 682 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Lemma 6.2. Let T ¼ D Nð0; 1=Þ: For all X1 an all k ¼ 0; 1; y EðQðwÞ 2k ÞpEðT 2k Þ: ð21þ Using Lemmata 6.1 an 6.2 we can prove Lemma 5.2 as follows. Proof of Lemma 5.2. To prove (7) we observe that for any unit vector a; by (20) an (21), EðQðaÞ 4 ÞpEðQðwÞ 4 ÞpEðT 4 Þ; while Z þn EðT 4 1 Þ¼ p ffiffiffiffiffi expð l 2 =2Þ l4 N 2p 2 l ¼ 3 2: To prove (6) we first observe that for any realvalue ranom variable U an for all h such that EðexpðhU 2 ÞÞ is boune, the Monotone Convergence Theorem (MCT) allows us to swap the expectation with the sum an get EðexpðhU 2 ÞÞ ¼ E XN k¼0! ðhu 2 Þ k k! ¼ XN k¼0 h k k! EðU 2k Þ: So, below, we procee as follows. Taking ha½0; =2Þ makes the integral in (22) converge, giving us (23). Thus, for such h; we can apply the MCT to get (24). Now, applying (20) an (21) (24) gives (25). Applying the MCT once more gives (26). EðexpðhT 2 ÞÞ ¼ Z þn N 1 p ffiffiffiffiffi 2p expð l 2 =2Þ exp h l2 l ð22þ 1 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 2h= ð23þ ¼ XN k¼0 h k k! EðT 2k Þ ð24þ X XN k¼0 h k k! EðQðaÞ2k Þ ð25þ ¼ EðexpðhQðaÞ 2 ÞÞ: p Thus, EðexpðhQ 2 ÞÞp1= ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 2h= for ha½0; =2Þ; as esire. & ð26þ Before proving Lemma 6.1 we will nee to prove the following lemma.
13 D. Achlioptas / Journal of Computer an System Sciences 66 (2003) Lemma 6.3. Let r 1 ; r 2 be i.i.. ranom variables having one of the two probability istributions given by Eqs. (1) an (2) in Theorem p 1.1. For any a; bar let c ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ða 2 þ b 2 Þ=2: Then for any MAR an all k ¼ 0; 1; y EððM þ ar 1 þ br 2 Þ 2k ÞpEððM þ cr 1 þ cr 2 Þ 2k Þ: Proof. We first consier the case where r i Af 1; þ1g: If a 2 ¼ b 2 then a ¼ c an the lemma hols with equality. Otherwise, observe that EððM þ cr 1 þ cr 2 Þ 2k Þ EððM þ ar 1 þ br 2 Þ 2k Þ¼ S k 4 ; where S k ¼ðM þ 2cÞ 2k þ 2M 2k þðm 2cÞ 2k ðm þ a þ bþ 2k ðm þ a bþ 2k ðm a þ bþ 2k ðm a bþ 2k : We will show that S k X0 for all kx0: Since a 2 ab 2 we can use the binomial theorem to expan every term other than 2M 2k in S k an get! S k ¼ 2M 2k þ X2k 2k M 2k i D i ; i where i¼0 D i ¼ð2cÞ i þð 2cÞ i ða þ bþ i ða bþ i ð a þ bþ i ð a bþ i : Observe now that for o i; D i ¼ 0: Moreover, we claim that D 2j X0 for all jx1: To see this claim observe that ð2a 2 þ 2b 2 Þ¼ða þ bþ 2 þða bþ 2 an that for all jx1 an x; yx0; ðx þ yþ j Xx j þ y j : Thus, ð2cþ 2j ¼ð2a 2 þ 2b 2 Þ j ¼½ða þ bþ 2 þða bþ 2 Š j Xða þ bþ 2j þða bþ 2j implying!! S k ¼ 2M 2k þ Xk 2k M 2ðk jþ D 2j ¼ Xk 2k M 2ðk jþ D 2j X0: j¼0 2j j¼1 2j p The proof for the case where r i Af ffiffi pffiffi 3 ; 0; þ 3 g is just a more cumbersome version of the proof above, so we omit it. That proof, though, brings forwar an interesting point. If one tries to take r i ¼ 0 with probability greater than 2=3; while maintaining that r i has a range of size 3 an variance 1, the lemma fails. In other wors, 2=3 is tight in terms of how much probability mass we can put to r i ¼ 0 an still have the current lemma hol. & Proof of Lemma 6.1. Recall that for any vector a; QðaÞ ¼Q 1 ðaþ ¼a c 1 where c 1 ¼ p 1 ffiffiffi ðr 11 ; y; r 1 Þ: If a ¼ða 1 ; y; a Þ is such that a 2 i ¼ a 2 j for all i; j; then by symmetry, QðaÞ an QðwÞ are ientically istribute an the lemma hols trivially. Otherwise, we can assume without loss of generality,
Which Networks Are Least Susceptible to Cascading Failures?
Which Networks Are Least Susceptible to Cascaing Failures? Larry Blume Davi Easley Jon Kleinberg Robert Kleinberg Éva Taros July 011 Abstract. The resilience of networks to various types of failures is
More informationHow to Use Expert Advice
NICOLÒ CESABIANCHI Università di Milano, Milan, Italy YOAV FREUND AT&T Labs, Florham Park, New Jersey DAVID HAUSSLER AND DAVID P. HELMBOLD University of California, Santa Cruz, Santa Cruz, California
More informationSketching as a Tool for Numerical Linear Algebra
Foundations and Trends R in Theoretical Computer Science Vol. 10, No. 12 (2014) 1 157 c 2014 D. P. Woodruff DOI: 10.1561/0400000060 Sketching as a Tool for Numerical Linear Algebra David P. Woodruff IBM
More informationCOSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES
COSAMP: ITERATIVE SIGNAL RECOVERY FROM INCOMPLETE AND INACCURATE SAMPLES D NEEDELL AND J A TROPP Abstract Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect
More informationDecoding by Linear Programming
Decoding by Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles, CA 90095
More informationFoundations of Data Science 1
Foundations of Data Science John Hopcroft Ravindran Kannan Version /4/204 These notes are a first draft of a book being written by Hopcroft and Kannan and in many places are incomplete. However, the notes
More informationON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION. A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT
ON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT A numerical study of the distribution of spacings between zeros
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationA Tutorial on Spectral Clustering
A Tutorial on Spectral Clustering Ulrike von Luxburg Max Planck Institute for Biological Cybernetics Spemannstr. 38, 7276 Tübingen, Germany ulrike.luxburg@tuebingen.mpg.de This article appears in Statistics
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289. Compressed Sensing. David L. Donoho, Member, IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289 Compressed Sensing David L. Donoho, Member, IEEE Abstract Suppose is an unknown vector in (a digital image or signal); we plan to
More informationSteering User Behavior with Badges
Steering User Behavior with Badges Ashton Anderson Daniel Huttenlocher Jon Kleinberg Jure Leskovec Stanford University Cornell University Cornell University Stanford University ashton@cs.stanford.edu {dph,
More informationHighRate Codes That Are Linear in Space and Time
1804 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 7, JULY 2002 HighRate Codes That Are Linear in Space and Time Babak Hassibi and Bertrand M Hochwald Abstract Multipleantenna systems that operate
More informationGeneralized compact knapsacks, cyclic lattices, and efficient oneway functions
Generalized compact knapsacks, cyclic lattices, and efficient oneway functions Daniele Micciancio University of California, San Diego 9500 Gilman Drive La Jolla, CA 920930404, USA daniele@cs.ucsd.edu
More informationIndexing by Latent Semantic Analysis. Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637
Indexing by Latent Semantic Analysis Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637 Susan T. Dumais George W. Furnas Thomas K. Landauer Bell Communications Research 435
More informationErgodicity and Energy Distributions for Some Boundary Driven Integrable Hamiltonian Chains
Ergodicity and Energy Distributions for Some Boundary Driven Integrable Hamiltonian Chains Peter Balint 1, Kevin K. Lin 2, and LaiSang Young 3 Abstract. We consider systems of moving particles in 1dimensional
More informationNew Trade Models, New Welfare Implications
New Trae Moels, New Welfare Implications Marc J. Melitz Harvar University, NBER an CEPR Stephen J. Reing Princeton University, NBER an CEPR August 13, 2014 Abstract We show that enogenous firm selection
More informationThe SmallWorld Phenomenon: An Algorithmic Perspective
The SmallWorld Phenomenon: An Algorithmic Perspective Jon Kleinberg Abstract Long a matter of folklore, the smallworld phenomenon the principle that we are all linked by short chains of acquaintances
More informationFrom Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
SIAM REVIEW Vol. 51,No. 1,pp. 34 81 c 2009 Society for Industrial and Applied Mathematics From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images Alfred M. Bruckstein David
More informationOrthogonal Bases and the QR Algorithm
Orthogonal Bases and the QR Algorithm Orthogonal Bases by Peter J Olver University of Minnesota Throughout, we work in the Euclidean vector space V = R n, the space of column vectors with n real entries
More informationHow Bad is Forming Your Own Opinion?
How Bad is Forming Your Own Opinion? David Bindel Jon Kleinberg Sigal Oren August, 0 Abstract A longstanding line of work in economic theory has studied models by which a group of people in a social network,
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationWHICH SCORING RULE MAXIMIZES CONDORCET EFFICIENCY? 1. Introduction
WHICH SCORING RULE MAXIMIZES CONDORCET EFFICIENCY? DAVIDE P. CERVONE, WILLIAM V. GEHRLEIN, AND WILLIAM S. ZWICKER Abstract. Consider an election in which each of the n voters casts a vote consisting of
More informationEfficient algorithms for online decision problems
Journal of Computer and System Sciences 71 (2005) 291 307 www.elsevier.com/locate/jcss Efficient algorithms for online decision problems Adam Kalai a,, Santosh Vempala b a Department of Computer Science,
More informationCalibrating Noise to Sensitivity in Private Data Analysis
Calibrating Noise to Sensitivity in Private Data Analysis Cynthia Dwork 1, Frank McSherry 1, Kobbi Nissim 2, and Adam Smith 3 1 Microsoft Research, Silicon Valley. {dwork,mcsherry}@microsoft.com 2 BenGurion
More informationRECENTLY, there has been a great deal of interest in
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 1, JANUARY 1999 187 An Affine Scaling Methodology for Best Basis Selection Bhaskar D. Rao, Senior Member, IEEE, Kenneth KreutzDelgado, Senior Member,
More informationHow many numbers there are?
How many numbers there are? RADEK HONZIK Radek Honzik: Charles University, Department of Logic, Celetná 20, Praha 1, 116 42, Czech Republic radek.honzik@ff.cuni.cz Contents 1 What are numbers 2 1.1 Natural
More informationAn Elementary Introduction to Modern Convex Geometry
Flavors of Geometry MSRI Publications Volume 3, 997 An Elementary Introduction to Modern Convex Geometry KEITH BALL Contents Preface Lecture. Basic Notions 2 Lecture 2. Spherical Sections of the Cube 8
More informationOptimal aggregation algorithms for middleware $
Journal of Computer and System Sciences 66 (2003) 614 656 http://www.elsevier.com/locate/jcss Optimal aggregation algorithms for middleware $ Ronald Fagin, a, Amnon Lotem, b and Moni Naor c,1 a IBM Almaden
More informationFrom Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose. Abstract
To Appear in the IEEE Trans. on Pattern Analysis and Machine Intelligence From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose Athinodoros S. Georghiades Peter
More informationA Modern Course on Curves and Surfaces. Richard S. Palais
A Modern Course on Curves and Surfaces Richard S. Palais Contents Lecture 1. Introduction 1 Lecture 2. What is Geometry 4 Lecture 3. Geometry of InnerProduct Spaces 7 Lecture 4. Linear Maps and the Euclidean
More information