Lecture 3. 1 Largest singular value The Behavior of Algorithms in Practice 2/14/2

18.409 The Behavior of Algorithms in Practice, 2/14/2
Lecturer: Dan Spielman
Lecture 3
Scribe: Arvind Sankar

1 Largest singular value

In order to bound the condition number, we need an upper bound on the largest singular value in addition to the lower bound on the smallest that we derived last class. Since the largest singular value of $A + G$ can be bounded by

$$\sigma_n(A + G) = \|A + G\| \le \|A\| + \|G\|$$

and we can't really do much about $\|A\|$, the important thing to do is bound $\|G\|$. To start off with a weak but easy bound, we use the following simple lemma.

Lemma 1. If $a_1, \dots, a_d$ denote the columns of the matrix $A$, then

$$\max_i \|a_i\| \le \|A\| \le \sqrt{d}\, \max_i \|a_i\|.$$

Proof. If $e_i$ denotes the vector with 1 in the $i$-th component and 0's everywhere else, then $A e_i = a_i$. Hence the left-hand inequality is clear. For the other inequality, let $x$ be a unit vector and write

$$Ax = A\Big(\sum_i x_i e_i\Big) = \sum_i x_i a_i.$$

Therefore

$$\|Ax\| \le \sum_i |x_i|\, \|a_i\|.$$

Applying Cauchy-Schwarz and using the fact that $\|x\| = 1$, we get

$$\|Ax\| \le \sqrt{\sum_i x_i^2}\, \sqrt{\sum_i \|a_i\|^2} \le \sqrt{d}\, \max_i \|a_i\|,$$

which is what we want.

If $g$ is a vector of $d$ independent Gaussian random variables with mean 0 and variance 1, then $\|g\|^2$ is distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, which has density function

$$\frac{1}{2^{d/2}\, \Gamma(d/2)}\, x^{d/2 - 1} e^{-x/2}, \qquad x \ge 0.$$

We need the following bound on how large a $\chi^2$ random variable can be.
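Before stating that bound, here is a minimal numerical check of Lemma 1, assuming NumPy is available. The helper name `column_norm_bounds` and the 50 x 50 test size are illustrative choices, not part of the lecture. It compares the spectral norm of a Gaussian matrix with the largest column norm and with $\sqrt{d}$ times that norm.

```python
import numpy as np

def column_norm_bounds(A):
    """Return (max column norm, spectral norm, sqrt(d) * max column norm)."""
    col_norms = np.linalg.norm(A, axis=0)      # Euclidean norm of each column
    lower = col_norms.max()                    # left-hand side of Lemma 1
    upper = np.sqrt(A.shape[1]) * lower        # right-hand side of Lemma 1
    spectral = np.linalg.norm(A, 2)            # largest singular value of A
    return lower, spectral, upper

rng = np.random.default_rng(0)                 # fixed seed, arbitrary choice
G = rng.standard_normal((50, 50))              # matrix of standard normals
lower, spectral, upper = column_norm_bounds(G)
print(f"{lower:.3f} <= {spectral:.3f} <= {upper:.3f}")
```

For Gaussian matrices the spectral norm typically sits much closer to the lower end of this range than to the upper end, which is why the upper bound from Lemma 1 is a weak one.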

Lemma 2. If $X$ is a random variable distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, then for any $k \ge 1$,

$$\Pr\{X \ge kd\} \le k^{d/2 - 1} e^{-d(k-1)/2}.$$

Let $g_i$ denote the columns of $G$. By Lemma 1, $\|G\| \ge kd$ implies $\max_i \|g_i\| \ge k\sqrt{d}$, that is, $\|g_i\|^2 \ge k^2 d$ for some $i$. Hence, using Lemma 2 (with $k^2$ in place of $k$) and the union bound, we get

$$\Pr\{\|G\| \ge kd\} \le d\, k^{d-2} e^{-d(k^2-1)/2}.$$

2 A sharper bound using nets

The bound above is unsatisfying: for any fixed unit vector $x$, the vector $Gx$ is a Gaussian random vector, and so its length should be about $\sqrt{d}$ on average. This section shows how to use this idea to get a bound on $\|G\|$ that grows as $\sqrt{d}$ rather than as $d$.

Let $S^{d-1}$ denote the $(d-1)$-dimensional unit sphere (the boundary of the unit ball in $d$ dimensions).

Definition 1. A $\lambda$-net on $S^{d-1}$ is a collection of points $\{x_1, x_2, \dots, x_n\}$ such that for any $x \in S^{d-1}$,

$$\min_i \|x - x_i\| \le \lambda.$$

We will use only 1-nets, and the following lemma claims that they need not be too large.

Lemma 3. For $d \ge 2$, there exists a 1-net with at most $2^d (d-1)$ points.

Using this lemma, we can prove the following bound on $\|G\|$:

Lemma 4. If $G$ is a matrix of standard normal variables, then

$$\Pr\{\|G\| \ge 2k\sqrt{d}\} \le 2^d (d-1)\, k^{d-2} e^{-d(k^2-1)/2}.$$

(This lemma appears with a slightly different bound as lemma 2.8 on pg. 907 of [Sza90].)

Proof. Let $N$ be the 1-net given by Lemma 3. Let $G = U \Sigma V^T$ be the singular value decomposition of $G$, and let $u_i$ and $v_i$ be the columns of $U$ and $V$ respectively. By definition of the net, there exists a vector $x \in N$ such that

$$\|v_n - x\| \le 1.$$

Since $v_n$ and $x$ are unit vectors, $\|v_n - x\|^2 = 2 - 2\, v_n^T x$, so this is equivalent to

$$v_n^T x \ge \frac{1}{2}.$$

Expanding $x$ in the basis $\{v_i\}$, we obtain

$$x = \sum_i x_i v_i$$

with $x_n \ge 1/2$. Hence

$$\|Gx\| = \Big\|\sum_i x_i G v_i\Big\| = \Big\|\sum_i x_i \sigma_i u_i\Big\| \ge x_n \sigma_n \ge \|G\|/2.$$

Hence $\|G\| \ge 2k\sqrt{d}$ implies that there exists $x \in N$ such that $\|Gx\| \ge k\sqrt{d}$. For each fixed unit vector $x$, the vector $Gx$ has independent standard normal entries, so $\|Gx\|^2$ is $\chi^2$ distributed with $d$ degrees of freedom. By the union bound and Lemma 2, we obtain

$$\Pr\{\|G\| \ge 2k\sqrt{d}\} \le |N|\, k^{d-2} e^{-d(k^2-1)/2},$$

which is the stated result.

3 Gaussian elimination

In the next couple of lectures, we will use the results we have proved to analyze Gaussian elimination. Briefly, Gaussian elimination solves a system $Ax = b$ by performing row and column operations on $A$ to reduce it to an upper triangular matrix, which can then be easily solved. Theoretically, one can view this process as factoring $A$ into a product of a lower triangular matrix representing the row operations performed (actually, their inverses), and an upper triangular matrix representing the result of these operations. This is called the LU-factorization of $A$.

There are three pivoting strategies one can use while performing this algorithm (pivoting is the process of permuting rows and/or columns before doing the elimination).

1. No pivoting: Just what it says. This can be done only if we never run into zeros on the diagonal. This is easy to analyze.

2. Partial pivoting: Here only row permutations are permitted. The strategy is to bring the largest entry (in absolute value) in the column we are considering onto the diagonal. The LU-factorization now actually has to be written as $LU = PA$, where $P$ is a permutation matrix representing the row permutations performed. Partial pivoting guarantees that no entry in $L$ can exceed 1 in absolute value.

3. Complete pivoting: Here both row and column permutations are permitted, and the strategy is to move the largest entry in the part of the matrix that we have not yet processed to the diagonal. The factorization now looks like $LU = PAQ$, where $P$ and $Q$ are permutation matrices.
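The partial pivoting strategy described in item 2 can be sketched as follows. This is a minimal illustration assuming NumPy, not the routine any particular library uses. The function name `lu_partial_pivoting` and the test matrices are hypothetical choices made for this example. The second matrix `W` (ones on the diagonal and in the last column, $-1$ below the diagonal) is a standard example on which partial pivoting performs no row swaps and the last column of $U$ doubles at every step.

```python
import numpy as np

def lu_partial_pivoting(A):
    """Return P, L, U with P @ A = L @ U, using row swaps only."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        # Bring the largest entry (in absolute value) of column k,
        # on or below the diagonal, onto the diagonal.
        p = k + int(np.argmax(np.abs(U[k:, k])))
        if U[p, k] == 0.0:
            continue  # nothing to eliminate in this column
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]   # swap multipliers already stored
        # Multipliers have absolute value at most 1 because of the pivot choice.
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
        U[k + 1:, k] = 0.0                  # clear rounding residue
    return P, L, U

# Check the factorization on a random matrix (seed chosen arbitrarily).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
P, L, U = lu_partial_pivoting(A)
print(np.allclose(P @ A, L @ U), np.abs(L).max())

# A matrix on which the entries of U grow like 2^(d-1).
d = 10
W = np.eye(d) - np.tril(np.ones((d, d)), -1)
W[:, -1] = 1.0
_, _, Uw = lu_partial_pivoting(W)
print(np.abs(Uw).max() / np.abs(W).max())   # growth of the largest entry
```

The sketch stores `P` as a full matrix for readability. A practical implementation would keep a permutation vector instead.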

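Wilkinson showed that if $\hat{L}$, $\hat{U}$ and $\hat{x}$ represent the computed values of $L$, $U$ and $x$ in floating point to an accuracy of $\epsilon$, then

$$(A + \delta A)\hat{x} = b$$

with $\delta A$ such that

$$\|\delta A\| \le d\epsilon\,(3\|A\| + 5\|L\|\,\|U\|).$$

Matlab uses partial pivoting, and it can be shown that there exist matrices $A$ for which partial pivoting fails, in the sense that $U$ becomes exponentially large (in $d$). This leads to a total loss of precision unless at least $d$ bits are used to store intermediate results. Wilkinson also showed that for complete pivoting,

$$\frac{\|U\|}{\|A\|} \le d^{\frac{1}{2} \lg d},$$

which means that the number of bits required is only $\lg^2 d$ in the worst case. However, complete pivoting is much more expensive in floating point than partial pivoting, which seems to work quite well in practice. One of the goals of this class is to understand why. In the next couple of lectures, we will show in fact that no pivoting does well most of the time.

4 Proof of technical lemmas

For completeness, we give the proofs of lemmas 2 and 3.

Proof of lemma 2. We have

$$\Pr\{X \ge kd\} = \frac{1}{2^{d/2}\Gamma(d/2)} \int_{kd}^{\infty} x^{d/2-1} e^{-x/2}\, dx = \frac{1}{2^{d/2}\Gamma(d/2)} \int_{d}^{\infty} (x + (k-1)d)^{d/2-1} e^{-(k-1)d/2 - x/2}\, dx.$$

Using $x + (k-1)d \le kx$, which holds for $x \ge d$ and $k \ge 1$ (and taking $d \ge 2$ so that the exponent $d/2 - 1$ is nonnegative), this is at most

$$k^{d/2-1} e^{-(k-1)d/2} \cdot \frac{1}{2^{d/2}\Gamma(d/2)} \int_{d}^{\infty} x^{d/2-1} e^{-x/2}\, dx \le k^{d/2-1} e^{-(k-1)d/2},$$

and we are done.

Proof of lemma 3. Let $N$ be a maximal set of points on the unit sphere such that the great-circle distance between any two points in $N$ is at least $\pi/3$. Then $N$ will be a 1-net, because if $u$ were a unit vector such that no vector in $N$ is within distance 1 of $u$, then there would be no point of $N$ within great-circle distance $\pi/3$ of $u$ (a chord of length 1 subtends an angle of exactly $\pi/3$), so $u$ could be added to $N$, contradicting maximality.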
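The maximality argument above can be illustrated numerically. The following is a rough sketch assuming NumPy. It only approximates a maximal set by greedily scanning a finite pool of random candidate points, so it is an illustration under that assumption rather than a construction of the net used in the proof. The names `greedy_separated_set` and `net_size_bound`, the dimension $d = 3$, and the pool size are all choices made for this example. Points are kept when their Euclidean distance to every kept point is at least 1, which matches great-circle distance at least $\pi/3$.

```python
import numpy as np

def greedy_separated_set(points, min_dist=1.0):
    """Greedily keep points at Euclidean distance >= min_dist from all kept points."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)

def net_size_bound(d):
    """Size bound 2^d (d - 1) claimed in Lemma 3."""
    return 2 ** d * (d - 1)

rng = np.random.default_rng(0)                  # fixed seed, arbitrary choice
d = 3
candidates = rng.standard_normal((5000, d))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)  # project to sphere

net = greedy_separated_set(candidates)

# Distance from every candidate to its nearest net point.
diffs = candidates[:, None, :] - net[None, :, :]
nearest = np.linalg.norm(diffs, axis=2).min(axis=1)

print(len(net), net_size_bound(d), nearest.max())
```

Every rejected candidate lies within distance 1 of some kept point by construction, so the kept set acts as a 1-net for the candidate pool, and its size stays below the bound of Lemma 3.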

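To see that $|N| \le (d-1) 2^d$, observe that the sets

$$B(x, \pi/6) = \{u \in S^{d-1} : d(u, x) < \pi/6\}, \qquad x \in N,$$

are disjoint, where $d(u, x)$ denotes great-circle distance. A lower bound on the $(d-1)$-dimensional volume of each $B(x, \pi/6)$ is given by the volume of the $(d-1)$-dimensional ball of radius $\sin(\pi/6) = 1/2$. If $|S^{d-1}|$ denotes the volume of $S^{d-1}$ and $V_d$ the volume of the unit ball in $d$ dimensions, then

$$V_d = \frac{2\pi^{d/2}}{d\, \Gamma(d/2)} \qquad \text{and} \qquad |S^{d-1}| = \frac{2\pi^{d/2}}{\Gamma(d/2)}.$$

Hence

$$|N| \le 2^{d-1}\, \frac{|S^{d-1}|}{V_{d-1}} = 2^{d-1} (d-1) \sqrt{\pi}\, \frac{\Gamma((d-1)/2)}{\Gamma(d/2)} \le 2^d (d-1).$$

The last inequality holds with equality at $d = 3$, and the ratio of Gamma functions decreases with $d$, so it holds for all $d \ge 3$. For $d = 2$, four equally spaced points on the circle form a 1-net directly.

A somewhat tighter bound can be obtained by using the fact that

$$\lim_{d \to \infty} \sqrt{d}\, \frac{\Gamma((d-1)/2)}{\Gamma(d/2)} = \sqrt{2}.$$

References

[Sza90] Stanislaw J. Szarek, Spaces with large distance to $\ell_\infty^n$ and random matrices, American Journal of Mathematics 112 (1990), no. 6, 899-942.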